"Regardless of which of the aforementioned methods is used, the upper threshold tu and the lower .."

mars RaqForum 25 No.
1 View • 1 Months ago

1.6 Percentage threshold adjustment

compilation of industrial mathematics algorithms(29)

Regardless of which of the aforementioned methods is used, the upper threshold tu and the lower threshold td always lie within the learning interval. This presents a problem: when xi exceeds the maximum or falls below the minimum value within the learning interval, xi will inevitably be considered an anomaly.

However, we may sometimes want to avoid regarding xi as an anomaly when it only slightly exceeds the limits. The solution is simple: expand tu and contract td. This is achieved by adjusting the thresholds based on a percentage of the difference between tu and td. Let dst represent the desired percentage of that difference. The adjusted upper threshold tu’ and the adjusted lower threshold td’ are calculated as follows:

tu’=tu+dst*(tu-td)

td’=td-dst*(tu-td)

Theoretically, the range of dst is [-0.5, ∞). However, we typically need to widen the range between tu and td, implying that dst>0. If dst is excessively large, no anomalies will be detected. Therefore, dst is usually kept below 1, with a commonly used range of [0,1].

The method of adjusting thresholds using a percentage of the difference between the upper and lower thresholds is called the Percentage Threshold Adjustment method.

SPL routine:

Time series X is shown below:

4cc9262032cc5905d3949be0fbde449f_1671153413664100png

Apply the distance method to X to detect anomalies and calculate the anomaly score for each data point.
SPL routine:

	A	B
1	=file(C1).import@tci()	/Time series X
2	100	/Learning interval k
3	2	/Radius multiplier
4	0	/dst
5	=A1.(if(#<=A2,,Threshold(~[-A2:-1],“up”,A3)))	/Upper threshold
6	=A1.(if(#<=A2,,Threshold(~[-A2:-1],“down”,A3)))	/Lower threshold
7	=to(A2+1,A1.len())	/Valid X indices
8	=A1(A7)	/Valid X
9	=A5(A7)	/tu
10	=A6(A7)	/td
11	=A9.((d=~-A10(#),~+A4*d))	/tu’
12	=A10.((d=A9(#)-~,~-A4*d))	/td’
13	=A8.((a=max(~-A11(#),A12(#)-~,0),b=(A11(#)-A12(#)),if(a==0,0,if(b==0,1,a/b))))	/Anomaly score Od

Parameter settings:

Learning interval k=100;

Radius multiplier for the distance method: n=2;

Threshold difference percentage dst=0.

a62d6d93605f32279f67825285840a08_1671153413789100png

In the figure, Value represents the X values, Value_up the tu sequence, Value_low the td sequence, and warn the anomaly score Od. Since the first 100 points comprise the learning interval, they have no associated anomaly scores. Consequently, the time series in the figure has a length of 1900.

The figure shows that there are too many anomalous time points. Intuitively, detecting so many anomalies seems unnecessary. Adjusting dst can prevent the detection of points with low anomaly scores.

Set dst=0.3

de483341589683c99bab87d00f945b5b_1671153413868100png