jinxing RaqForum 41 No.
23 View • 2 Months ago

From SQL to SPL: Conditional grouping

A database table has several fields that can serve as grouping keys.

ID	SPLIT	CUST	DATE	AMOUNT
ID_1	SPLIT_YES	A	2024-05-01 00:00:00	100
ID_1	SPLIT_NO	A	2024-04-01 00:00:00	200
ID_1	SPLIT_YES	B	2024-03-01 00:00:00	50
ID_2	SPLIT_YES	A	2024-05-01 00:00:00	50
ID_2	SPLIT_NO	A	2024-04-01 00:00:00	300
ID_2	SPLIT_NO	B	2024-03-01 00:00:00	300
ID_3	SPLIT_YES	B	2024-04-01 00:00:00	90
ID_3	SPLIT_NO	B	2024-04-01 00:00:00	30
ID_3	SPLIT_NO	A	2024-04-01 00:00:00	10
ID_3	SPLIT_NO	A	2024-03-01 00:00:00	10

We need to group conditionally. First group by the first two fields, ID and SPLIT. If no DATE is repeated within a group, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the group. If any DATE is repeated within a group, split that group again by CUST; in each subgroup, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the subgroup. The expected result:

ID	SPLIT	CUST	DATE	AMOUNT
ID_1	SPLIT_NO	A	2024-04-01 00:00:00	200
ID_1	SPLIT_YES	A	2024-05-01 00:00:00	150
ID_2	SPLIT_NO	A	2024-04-01 00:00:00	600
ID_2	SPLIT_YES	A	2024-05-01 00:00:00	50
ID_3	SPLIT_NO	A	2024-04-01 00:00:00	20
ID_3	SPLIT_NO	B	2024-04-01 00:00:00	30
ID_3	SPLIT_YES	B	2024-04-01 00:00:00	90
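
As a rough cross-check of the rule, the sketch below uses plain Python (not SPL or SQL) to find which (ID, SPLIT) groups contain a repeated DATE and therefore take the second branch. The tuple layout and the helper name `groups_with_duplicate_dates` are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows copied from the source table: (ID, SPLIT, CUST, DATE, AMOUNT)
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01 00:00:00", 100),
    ("ID_1", "SPLIT_NO", "A", "2024-04-01 00:00:00", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01 00:00:00", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01 00:00:00", 50),
    ("ID_2", "SPLIT_NO", "A", "2024-04-01 00:00:00", 300),
    ("ID_2", "SPLIT_NO", "B", "2024-03-01 00:00:00", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01 00:00:00", 90),
    ("ID_3", "SPLIT_NO", "B", "2024-04-01 00:00:00", 30),
    ("ID_3", "SPLIT_NO", "A", "2024-04-01 00:00:00", 10),
    ("ID_3", "SPLIT_NO", "A", "2024-03-01 00:00:00", 10),
]


def groups_with_duplicate_dates(rows):
    """Return the (ID, SPLIT) keys whose records share at least one DATE."""
    dates = defaultdict(list)
    for id_, split, _cust, date, _amount in rows:
        dates[(id_, split)].append(date)
    # A group has a repeated date when it has fewer distinct dates than records.
    return sorted(key for key, ds in dates.items() if len(set(ds)) < len(ds))


DUPLICATED = groups_with_duplicate_dates(ROWS)
print(DUPLICATED)
```

On the sample data only the ID_3 / SPLIT_NO group has a repeated date, which is why it is the only group that contributes two rows to the result.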

SQL:

SELECT id, split, cust, date_column,
       CASE num_cust WHEN 1 THEN total_amount ELSE total_cust_amount END AS amount
FROM   (
  SELECT t.*,
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM   (
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM   test_table_mm t
  ) t
  WHERE  rnk = 1
)
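
To make the query easier to follow, here is a plain-Python sketch of what it appears to compute, step by step. It is a reading of the SQL above, not something that runs against Oracle; the function name `emulate_sql` and the tuple layout are assumptions. Note that the query tests whether more than one customer appears on the latest date, rather than whether any date is repeated; on the sample data the two conditions give the same result.

```python
from collections import defaultdict

# Sample rows copied from the source table: (id, split, cust, date_column, amount)
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01 00:00:00", 100),
    ("ID_1", "SPLIT_NO", "A", "2024-04-01 00:00:00", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01 00:00:00", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01 00:00:00", 50),
    ("ID_2", "SPLIT_NO", "A", "2024-04-01 00:00:00", 300),
    ("ID_2", "SPLIT_NO", "B", "2024-03-01 00:00:00", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01 00:00:00", 90),
    ("ID_3", "SPLIT_NO", "B", "2024-04-01 00:00:00", 30),
    ("ID_3", "SPLIT_NO", "A", "2024-04-01 00:00:00", 10),
    ("ID_3", "SPLIT_NO", "A", "2024-03-01 00:00:00", 10),
]


def emulate_sql(rows):
    """Mimic the nested query: window sums, keep rnk = 1, then pick the amount."""
    partitions = defaultdict(list)  # PARTITION BY id, split
    for row in rows:
        partitions[(row[0], row[1])].append(row)

    out = []
    for (id_, split), part in partitions.items():
        # DENSE_RANK ... ORDER BY date_column DESC: rnk = 1 means the latest date.
        # The ISO-style date strings compare in chronological order.
        latest = max(r[3] for r in part)
        total_amount = sum(r[4] for r in part)
        total_cust_amount = defaultdict(int)  # PARTITION BY id, split, cust
        for r in part:
            total_cust_amount[r[2]] += r[4]

        top = [r for r in part if r[3] == latest]  # WHERE rnk = 1
        num_cust = len({r[2] for r in top})        # COUNT(DISTINCT cust)
        for r in top:
            amount = total_amount if num_cust == 1 else total_cust_amount[r[2]]
            out.append((id_, split, r[2], r[3], amount))
    return sorted(out)


SQL_RESULT = emulate_sql(ROWS)
for row in SQL_RESULT:
    print(row)
```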

SQL cannot retain grouped subsets, so it has to emulate them indirectly with nested subqueries and several window functions, which makes the code complex. In SPL, grouping does not force immediate aggregation: the subsets can be kept for further calculation and can themselves be grouped again:

https://try.esproc.com/splx?44C

	A
1	$select * from test_table_mm.txt order by DATE desc
2	=A1.group(ID,SPLIT)
3	=A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4	=A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1: Load the data, sorted by DATE in descending order.

A2: Group by the first two fields, ID and SPLIT, without aggregating.

A3: Process each group: if no date is duplicated (the distinct count of DATE equals the record count), return the group as is; otherwise, group it again by CUST and return each subgroup. conj merges the results into a single sequence of groups.

A4: Process each group from A3: take its first record, which is the latest one because of the sort in A1, and replace its AMOUNT with the sum of AMOUNT over the group.
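
For readers who do not know SPL, the four cells can be mirrored in plain Python. This is only an illustrative sketch under stated assumptions, not how esProc works internally: the helper `group_by` and the variable names `a1` to `a4` are made up here, and groups keep first-seen order instead of SPL's key order, so the result is sorted at the end for comparison with the expected table.

```python
# Sample rows copied from the source table: (ID, SPLIT, CUST, DATE, AMOUNT)
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01 00:00:00", 100),
    ("ID_1", "SPLIT_NO", "A", "2024-04-01 00:00:00", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01 00:00:00", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01 00:00:00", 50),
    ("ID_2", "SPLIT_NO", "A", "2024-04-01 00:00:00", 300),
    ("ID_2", "SPLIT_NO", "B", "2024-03-01 00:00:00", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01 00:00:00", 90),
    ("ID_3", "SPLIT_NO", "B", "2024-04-01 00:00:00", 30),
    ("ID_3", "SPLIT_NO", "A", "2024-04-01 00:00:00", 10),
    ("ID_3", "SPLIT_NO", "A", "2024-03-01 00:00:00", 10),
]


def group_by(records, key):
    """Split records into lists by key without aggregating (first-seen order)."""
    groups = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    return list(groups.values())


# A1: sort by DATE descending (stable sort; ISO-style strings sort chronologically).
a1 = sorted(ROWS, key=lambda r: r[3], reverse=True)

# A2: group by (ID, SPLIT) and keep the subsets.
a2 = group_by(a1, lambda r: (r[0], r[1]))

# A3: keep a group whole if its dates are all distinct, else re-group it by CUST.
a3 = []
for grp in a2:
    if len({r[3] for r in grp}) == len(grp):
        a3.append(grp)
    else:
        a3.extend(group_by(grp, lambda r: r[2]))

# A4: first record of each group (the latest date), AMOUNT replaced by the group sum.
a4 = [grp[0][:4] + (sum(r[4] for r in grp),) for grp in a3]

RESULT = sorted(a4)
for row in RESULT:
    print(row)
```

With the sample data, A2 holds six groups and A3 holds seven, because only the ID_3 / SPLIT_NO group is split further by CUST.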

Question source: https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by

SPL Official Website 👉 https://www.esproc.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.esproc.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/sxd59A8F2W

Youtube 👉 https://www.youtube.com/@esProc_SPL
