How to generate a list of fixed duration windows and perform statistics with esProc

The Time field of a certain database table is time, and the Value field is the value that needs to be counted.

Time

Value

10:10:05

3

10:11:06

4

10:13:13

5

10:13:19

9

10:13:32

8

10:14:35

2

Now we need to divide the data into windows every minute, fill in the missing windows, and calculate 4 values for each window: start_value, the last record value of the previous window; The last value in this window; The minimum value of this window; The maximum value of this window. Replace the start_value of the first window with the value of the first record; If the statistical value of a certain window is missing, use the start_value of this window instead.

The calculation result of the above example is as follows:

start

end

start_value

end_value

min

max

10:10:00

10:11:00

3

3

3

3

10:11:00

10:12:00

3

4

4

4

10:12:00

10:13:00

4

4

4

4

10:13:00

10:14:00

4

8

5

9

10:14:00

10:15:00

8

2

2

2

SQL is difficult to directly implement it when it comes to generating time series, aligning by time series and grouping, retaining grouped subsets, and retrieving records from subsets based on relative positions.

esProc provides rich computational functions, including generating time series, aligning by sequence, retaining grouped subsets, and accessing by position:

https://try.esproc.com/splx?55o


 A

1

$select * from main.txt order by Time

2

=A1.run(Time=time@m(Time))

3

=list=periods@s(A2.min(Time),A2.max(Time),60)

4

=A2.align@a(list,Time)

5

=A4.new(list(#):start, elapse@s(start,60):end, sv=ifn(end_value[-1],~.Value):start_value, ifn(~.m(-1).Value, sv):end_value, ifn(~.min(Value),sv):min, ifn(~.max(Value),sv):max)

A1 Load data.

A2Change the Time field to a full minute, such as 10:10:05->10:10:00. The function run is used to modify records in a loop, and the function time can format the time, @m represents the full minute.

A3 Generate a time series of consecutive minutes based on the start and end time. The periods function is used to generate a time series, @s indicates that the interval unit is second.

Picture1png
A4: Align the data in time series, with each group consisting of the data of a window. Some windows are empty sets without corresponding records, such as 10:12:00. The align function can align records with a specified sequence, with the default being the first record in each group, @a represents retrieving all records. After alignment, it does not aggregate immediately, but retain the grouping subsets for further calculation.

Picture2png

A5: Generate a new two-dimensional table using the new function, with each group of data (each window) in A4 corresponding to one new record.

Picture3png

start is the starting time of the current window, list(#) represents taking members from the list by position, and # is the sequence number of the current window.

end is the end time of the current window, elapse@s(start,60) represents the time after adding 60 seconds to the starting time, and the function elapse is used for time addition and subtraction, @s represents the unit in seconds.

start_value is the last entry of the previous window, and sv=ifn(end_value[-1],~.Value)means to take the end_value of the previous window (explained later). If the value is null, then the default value of this window (the first entry) is taken; Finally, assign the variable sv to start_value. The function ifn returns the first non-null parameter. [-1] represents the previous record/previous group/previous window. ~ indicates the current record/current group/current window.

end_value is the last entry of this window, and ifn(~.m(-1).Value, sv)represents taking the value of the last record of this window. If the value is null, then sv is taken. The function m can take members in absolute positions, where m (-1) represents the last member and m(n) is the nth member, which can be abbreviated as (n). The previous list (#) is used in this way.

min is the minimum value of this window, and ifn(~.min(Value),sv) represents taking the minimum value of this window. If the value is null, then sv is taken. max is the maximum value of this window, and the code is similar.