1.3 Normal statistical method

In a normal distribution, about 99.7% of the data lies within three standard deviations of the mean (mean ± 3σ, where σ is the standard deviation). Data points outside this 3σ range can therefore be treated as outliers.

About 68% of the data lies within one standard deviation of the mean.

About 95% of the data lies within two standard deviations of the mean.

About 99.7% of the data lies within three standard deviations of the mean.
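These percentages come from the normal cumulative distribution: the probability that a value lies within n standard deviations of the mean equals erf(n/√2). A minimal Python check using only the standard library (the helper name `coverage` is illustrative):

```python
import math

def coverage(n: float) -> float:
    """Fraction of a normal distribution lying within n standard deviations of the mean."""
    # P(|X - mean| <= n * sigma) = erf(n / sqrt(2)) for a normally distributed X
    return math.erf(n / math.sqrt(2))

for n in (1, 2, 3):
    print(f"{n} sigma: {coverage(n):.4f}")
# 1 sigma: 0.6827
# 2 sigma: 0.9545
# 3 sigma: 0.9973
```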

Because of these properties, data points lying more than n standard deviations from the mean are rare and can be treated as anomalies.

Use the following threshold-mode anomaly function:

TA[tu,td](x)=max(x-tu, td-x,0)/(tu-td)
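As a sketch, the function can be written directly in Python. The score is 0 anywhere inside [td, tu] and grows linearly with the distance beyond whichever threshold is crossed, normalized by the band width tu - td. The function name and the check that tu > td are additions for illustration (the formula divides by tu - td, so the check guards against a zero or negative width).

```python
def threshold_anomaly(x: float, tu: float, td: float) -> float:
    """Threshold-mode anomaly score TA[tu,td](x) = max(x - tu, td - x, 0) / (tu - td)."""
    if tu <= td:
        # Added guard (not in the source): the band must have positive width
        raise ValueError("upper threshold tu must exceed lower threshold td")
    # Distance beyond the crossed threshold; 0 when td <= x <= tu
    excess = max(x - tu, td - x, 0.0)
    # Normalize by the width of the normal band
    return excess / (tu - td)

print(threshold_anomaly(5, tu=10, td=0))   # 0.0  (inside the band)
print(threshold_anomaly(15, tu=10, td=0))  # 0.5  (5 above tu, band width 10)
print(threshold_anomaly(-2, tu=10, td=0))  # 0.2  (2 below td)
```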

The thresholds tu and td are calculated from the learning data X[-k]i, the k values preceding xi, as follows:

a=avg(X[-k]i)
σ=std(X[-k]i)
tu=a+n*σ
td=a-n*σ

where a is the average of X[-k]i, σ is its standard deviation, and n is the multiple of the standard deviation. Changing n moves tu and td: a larger n widens the interval [td, tu], and a smaller n narrows it.
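A minimal Python sketch of this step, assuming the learning window is a plain list of numbers. It uses the sample standard deviation (`statistics.stdev`), which is how we read `sqrt(var@s(...))` in the SPL routine below; `statistics.pstdev` would give the population version instead. The function name `thresholds` is hypothetical.

```python
import statistics

def thresholds(learning, n=3.0):
    """Return (td, tu) computed from the learning window X[-k]i."""
    a = statistics.mean(learning)       # average a
    sigma = statistics.stdev(learning)  # sample standard deviation (assumed to match var@s)
    td = a - n * sigma                  # lower limit td
    tu = a + n * sigma                  # upper limit tu
    return td, tu

td, tu = thresholds([10, 12, 14], n=3)
print(td, tu)  # 6.0 18.0
```

Here the mean is 12 and the sample standard deviation is 2, so the band is 12 ± 3·2.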

The anomaly score od is calculated as follows:

od=max(xi-tu, td-xi,0)/(tu-td)
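Because td and tu sit symmetrically around a, the denominator simplifies to 2nσ, which lets the score be read in units of σ (assuming σ > 0). Writing xi = a + mσ for a point above the band (m > n):

```latex
\begin{aligned}
t_u - t_d &= (a + n\sigma) - (a - n\sigma) = 2n\sigma \\
o_d &= \frac{\max(x_i - t_u,\; t_d - x_i,\; 0)}{2n\sigma} \\
x_i = a + m\sigma,\ m > n \;&\Rightarrow\; o_d = \frac{(m - n)\sigma}{2n\sigma} = \frac{m - n}{2n}
\end{aligned}
```

For example, with n = 3 a point 4σ above the mean scores (4 - 3)/6 = 1/6 ≈ 0.17; by symmetry, a point 4σ below the mean scores the same.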

SPL routine:

    A                                                B
1   =data=file("1Ddata.csv").import@tci().to(100)
2   =n=3                                             /Multiple of standard deviation
3   =ldata=data.m(:100)                              /Learning data X[-k]i (the first 100 values)
4   =xi=data(101)                                    /Value to test xi (the 101st value)
5   =a=A3.avg()                                      /Average a
6   =sigma=sqrt(var@s(A3))                           /Standard deviation σ
7   =td=a-n*sigma                                    /Lower limit td
8   =tu=a+n*sigma                                    /Upper limit tu
9   =od=max(xi-tu,td-xi,0)/(tu-td)                   /Anomaly score od
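For readers without esProc, the same steps can be sketched in Python. This is an illustrative port, not the author's code: `load_column` and `anomaly_score` are hypothetical names, the CSV is assumed to hold a header row and a single numeric column, and because Python indexes from 0, xi (the 101st value) is `data[k]` with k = 100. The demo uses synthetic data rather than 1Ddata.csv.

```python
import csv
import statistics

def load_column(path):
    """Read a single-column CSV with a header row into a list of floats (assumed file layout)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [float(row[0]) for row in reader if row]

def anomaly_score(data, k=100, n=3.0):
    """Score xi = data[k] against the k preceding values, as in cells A3-A9."""
    if len(data) <= k:
        raise ValueError("need at least k + 1 values")
    ldata = data[:k]                   # learning data X[-k]i
    xi = data[k]                       # value under test
    a = statistics.mean(ldata)         # average a
    sigma = statistics.stdev(ldata)    # sample standard deviation
    td, tu = a - n * sigma, a + n * sigma
    return max(xi - tu, td - xi, 0.0) / (tu - td)

# Synthetic stand-in for 1Ddata.csv: 100 values alternating 9 and 11
history = [9.0, 11.0] * 50
print(anomaly_score(history + [10.5]))  # 0.0 (inside [td, tu])
print(anomaly_score(history + [20.0]))  # positive: 20 lies well above tu
```

To score the real file, something like `anomaly_score(load_column("1Ddata.csv"))` would mirror A1 to A9, provided the file matches the assumed layout.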

In this routine n is set to 3 in A2, matching the 3σ rule; changing A2 tightens or loosens the limits td and tu.

Calculation result example:

Since xi lies between td and tu, its anomaly score od is 0.