From SQL to SPL: Conditional grouping
A database table has several fields that can be used for grouping:
ID | SPLIT | CUST | DATE | AMOUNT
ID_1 | SPLIT_YES | A | 2024-05-01 00:00:00 | 100
ID_1 | SPLIT_NO | A | 2024-04-01 00:00:00 | 200
ID_1 | SPLIT_YES | B | 2024-03-01 00:00:00 | 50
ID_2 | SPLIT_YES | A | 2024-05-01 00:00:00 | 50
ID_2 | SPLIT_NO | A | 2024-04-01 00:00:00 | 300
ID_2 | SPLIT_NO | B | 2024-03-01 00:00:00 | 300
ID_3 | SPLIT_YES | B | 2024-04-01 00:00:00 | 90
ID_3 | SPLIT_NO | B | 2024-04-01 00:00:00 | 30
ID_3 | SPLIT_NO | A | 2024-04-01 00:00:00 | 10
ID_3 | SPLIT_NO | A | 2024-03-01 00:00:00 | 10
Now we need to group conditionally. First, group by the first two fields, ID and SPLIT. If no DATE is repeated within a group, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the group. If any DATE is repeated within a group, group that group's records again by CUST; in each of these subgroups, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the subgroup. The expected result:
ID | SPLIT | CUST | DATE | AMOUNT
ID_1 | SPLIT_NO | A | 2024-04-01 00:00:00 | 200
ID_1 | SPLIT_YES | A | 2024-05-01 00:00:00 | 150
ID_2 | SPLIT_NO | A | 2024-04-01 00:00:00 | 600
ID_2 | SPLIT_YES | A | 2024-05-01 00:00:00 | 50
ID_3 | SPLIT_NO | A | 2024-04-01 00:00:00 | 20
ID_3 | SPLIT_NO | B | 2024-04-01 00:00:00 | 30
ID_3 | SPLIT_YES | B | 2024-04-01 00:00:00 | 90
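To see which branch of the rule each group takes, the duplicate-date test can be checked on the sample data. The following is a minimal Python sketch, not part of the original solution: the `ROWS` list and the function name `groups_with_duplicate_dates` are illustrative assumptions, and the time part of DATE is dropped for brevity.

```python
from collections import defaultdict

# Sample rows copied from the source table: (ID, SPLIT, CUST, DATE, AMOUNT).
# Assumption: dates are kept as ISO strings with the time part omitted.
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01", 100),
    ("ID_1", "SPLIT_NO",  "A", "2024-04-01", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01", 50),
    ("ID_2", "SPLIT_NO",  "A", "2024-04-01", 300),
    ("ID_2", "SPLIT_NO",  "B", "2024-03-01", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01", 90),
    ("ID_3", "SPLIT_NO",  "B", "2024-04-01", 30),
    ("ID_3", "SPLIT_NO",  "A", "2024-04-01", 10),
    ("ID_3", "SPLIT_NO",  "A", "2024-03-01", 10),
]

def groups_with_duplicate_dates(rows):
    """Return the (ID, SPLIT) keys whose rows share at least one DATE."""
    dates = defaultdict(list)
    for rid, split, _cust, date, _amount in rows:
        dates[(rid, split)].append(date)
    # A group has a duplicate when it holds fewer distinct dates than rows.
    return sorted(k for k, v in dates.items() if len(set(v)) < len(v))

print(groups_with_duplicate_dates(ROWS))  # [('ID_3', 'SPLIT_NO')]
```

Only the (ID_3, SPLIT_NO) group has a repeated date, which is why it is the only group that appears as two rows (one per CUST) in the expected result.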
SQL:
SELECT id, split, cust, date_column,
       CASE num_cust WHEN 1 THEN total_amount ELSE total_cust_amount END AS amount
FROM (
  SELECT t.*,
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM (
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM test_table_mm t
  ) t
  WHERE rnk = 1
)
SQL cannot retain the subsets produced by grouping, so it has to reach the result indirectly through nested subqueries and several window functions, which makes the code complex. SPL does not have to aggregate immediately after grouping: the subsets can be kept for further calculation, and a subset can itself be grouped again:
https://try.esproc.com/splx?44C
  | A
1 | $select * from test_table_mm.txt order by DATE desc
2 | =A1.group(ID,SPLIT)
3 | =A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4 | =A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))
A1: Load data and sort it in reverse order by date.
A2: Group by the first two fields, but do not aggregate.
A3: Process each group: if no date is duplicated in the group (the number of distinct DATEs equals the number of records), return the group as a single subset; otherwise, group it by CUST and return each subgroup.
A4: Process each resulting subset: take its first record, which has the latest date because of the sort in A1, and replace its AMOUNT with the sum of AMOUNT over the subset.
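For readers who do not use SPL, the four steps can be mirrored in plain Python. This is a rough sketch under stated assumptions rather than a translation of the SPL engine: `ROWS` and `conditional_group` are illustrative names, dates are ISO strings with the time part omitted, and the final sort is added here only so the output order matches the expected result table.

```python
# Sample rows copied from the source table: (ID, SPLIT, CUST, DATE, AMOUNT).
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01", 100),
    ("ID_1", "SPLIT_NO",  "A", "2024-04-01", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01", 50),
    ("ID_2", "SPLIT_NO",  "A", "2024-04-01", 300),
    ("ID_2", "SPLIT_NO",  "B", "2024-03-01", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01", 90),
    ("ID_3", "SPLIT_NO",  "B", "2024-04-01", 30),
    ("ID_3", "SPLIT_NO",  "A", "2024-04-01", 10),
    ("ID_3", "SPLIT_NO",  "A", "2024-03-01", 10),
]

def conditional_group(rows):
    # A1: sort by DATE descending, so the first row of any subset is its latest.
    # ISO date strings sort chronologically; sorted() is stable for ties.
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)

    # A2: group by (ID, SPLIT) and keep the whole subsets, no aggregation yet.
    by_key = {}
    for r in ordered:
        by_key.setdefault((r[0], r[1]), []).append(r)

    # A3: keep a group whole if its dates are all distinct,
    # otherwise split it into one subset per CUST.
    subsets = []
    for group in by_key.values():
        dates = [r[3] for r in group]
        if len(set(dates)) == len(dates):
            subsets.append(group)
        else:
            by_cust = {}
            for r in group:
                by_cust.setdefault(r[2], []).append(r)
            subsets.extend(by_cust.values())

    # A4: take the first (latest) row of each subset and replace AMOUNT
    # with the sum of AMOUNT over that subset.
    result = [s[0][:4] + (sum(r[4] for r in s),) for s in subsets]

    # Assumption: sort only to reproduce the order of the expected table.
    return sorted(result)

for row in conditional_group(ROWS):
    print(row)
```

The shape of the logic follows the SPL script: grouping yields lists of rows rather than aggregates, so a group can be inspected, regrouped, and only then reduced to a single record.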
Question source: https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by