"The first four fields of a certain database table can all be grouping fields. ID SPLIT CUST DATE .."

jinxing RaqForum 41 No.
38 View • 7 Months ago

How to perform secondary grouping based on conditions within the grouped subsets with esProc

The first four fields of a certain database table can all be grouping fields.

ID	SPLIT	CUST	DATE	AMOUNT
ID_1	SPLIT_YES	A	2024-05-01 00:00:00	100
ID_1	SPLIT_NO	A	2024-04-01 00:00:00	200
ID_1	SPLIT_YES	B	2024-03-01 00:00:00	50
ID_2	SPLIT_YES	A	2024-05-01 00:00:00	50
ID_2	SPLIT_NO	A	2024-04-01 00:00:00	300
ID_2	SPLIT_NO	B	2024-03-01 00:00:00	300
ID_3	SPLIT_YES	B	2024-04-01 00:00:00	90
ID_3	SPLIT_NO	B	2024-04-01 00:00:00	30
ID_3	SPLIT_NO	A	2024-04-01 00:00:00	10
ID_3	SPLIT_NO	A	2024-03-01 00:00:00	10

Now group by the first two fields ID and SPLIT, and then decide whether to continue grouping again based on the conditions: if there are no duplicate DATEs within the group, keep the record with the latest date in this group and replace AMOUNT with the sum of AMOUNTs in this group; If there are duplicate dates within the group, group the records of this group again by CUST, while retaining the record with the latest date in the current group, and also replace the AMOUNT with the sum of the AMOUNTs of the current group.

SQL grouping must aggregate immediately after grouping, and cannot retain the grouped subsets, resulting in complex code for indirect implementation. SPL can retain subsets for further calculation, and subsets can also continue to be grouped:

https://try.esproc.com/splx?44C

	A
1	$select * from test_table_mm.txt order by DATE desc
2	=A1.group(ID,SPLIT)
3	=A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4	=A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1： Load data, with records with recent dates listed first.

A2： Use the group function to group by the first two fields, but do not aggregate them. Each group is a set. The figure shows the first and fifth groups.

Picture1png

A3：(if(~.icount(DATE)==~.count(), [~], ~.group(CUST))), process each group of data, and if the dates in this group are not duplicated, return the current group, such as the first group; Otherwise, continue grouping the current group by CUST and return each group, which is the sets of a set, such as the fifth group. ~ indicates the current group. The function icount is used to count the number of unique members. The count function is used to count the number of members. [] is a set symbol, because ~ itself is a set, [~] is a set of sets. This processing can make the data structure consistent and facilitate unified processing. The figure shows the first and fifth groups.

Picture2png
conj(…), merge the processed groups of data, so that each group of data rises from a 2-layer set (set of sets) to a single-layer set. Among them, the original group 5, due to continued grouping, had two sets of sets, which were merged and raised to two sets:

Picture3png
A4： Unified processing of data for each group in A3: Take the first record of the current group and replace AMOUNT with the sum of AMOUNTs for the current group. (1) represents the first member of the set.