How to merge overlapping time intervals with esProc
A certain database table has multiple accounts, each with multiple time intervals that overlap.
account_id | start_date | end_date |
A | 2019-06-20 | 2019-06-29 |
A | 2019-06-25 | 2019-07-25 |
A | 2019-07-20 | 2019-08-26 |
A | 2019-12-25 | 2020-01-25 |
A | 2021-04-27 | 2021-07-27 |
A | 2021-06-25 | 2021-07-14 |
A | 2021-07-10 | 2021-08-14 |
A | 2021-09-10 | 2021-11-12 |
B | 2019-07-13 | 2020-07-14 |
B | 2019-06-25 | 2019-08-26 |
Now we need to merge the overlapping time intervals in each account to generate new intervals that do not overlap.
account_id | start_date | end_date |
A | 2019-06-20 | 2019-08-26 |
A | 2019-12-25 | 2020-01-25 |
A | 2021-04-27 | 2021-08-14 |
A | 2021-09-10 | 2021-11-12 |
B | 2019-06-25 | 2020-07-14 |
After SQL grouping, it is necessary to aggregate immediately, which makes it inconvenient to generate time series and perform set calculations between sequences. The indirect implementation code is very complex. SPL can retain grouped subsets for further calculation, providing functions for generating time series and calculating sets between sequences:
https://try.esproc.com/splx?3AG
A | |
1 | $select * from data.txt |
2 | =A1.group(account_id) |
3 | =A2.(~.(periods(start_date,end_date)).merge@u().group@i(~!=~[-1]+1) .new(A2.account_id,~1:start_date,~.m(-1):end_date)) |
4 | =A3.conj() |
A1: Load data.
A2: Group by account, but do not aggregate, each group is a set.
A3=A2.(~.(periods(start_date,end_date))…) Process each group of data in A2: first loop through each record in the current group, generate a sequence based on the start and end dates, and the result is a set of sequences. ~ represents the current group, and the period function can generate a time series. The following figure shows the sequence generated by the first record of the first group:
…merge@u() Continue processing: Combine the sets of sequences to generate a non-overlapping date sequence. The merge function is used for merging ordered data, where @u represents calculating the union during merging. The following figure shows the result of union of the first group, and it can be seen that the overlapping dates of the first two records have been merged.
...group@i(~!=~[-1]+1)Continue processing: Group ordered date sequences into new groups when the current member is not equal to the previous member plus 1 day, that is, grouping consecutive dates into the same group. Option @i represents conditional grouping of ordered data, and [-1] represents the previous member. After grouping the first group in an orderly manner, four groups were generated. The following figure shows the first two groups, and it can be seen that the dates between the groups are not consecutive.
…new(A2.account_id,~1:start_date,~.m(-1):end_date)Final processing: Generate a new two-dimensional table using groups, with each group corresponding to one record. The account_id is taken from the first record of the current group in A2, and the complete code is A2.~(1).account_id, abbreviated as A2.account_id; start_date is taken from the first member of the current group; end_date is taken from the last member of the current group. The function m can take members by position, where ~.m(1) represents the first member, abbreviated as ~(1) or ~1, and ~.m(-1) represents the last member. The following figure shows the two-dimensional table generated by the first group.
A4=A3.conj() Merge the members of each group in A3.
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/sxd59A8F2W
Youtube 👉 https://www.youtube.com/@esProc_SPL