From SQL to SPL: Conditional grouping

A database table, test_table_mm, has several fields that can be used for grouping:

ID    SPLIT      CUST  DATE                 AMOUNT
ID_1  SPLIT_YES  A     2024-05-01 00:00:00  100
ID_1  SPLIT_NO   A     2024-04-01 00:00:00  200
ID_1  SPLIT_YES  B     2024-03-01 00:00:00  50
ID_2  SPLIT_YES  A     2024-05-01 00:00:00  50
ID_2  SPLIT_NO   A     2024-04-01 00:00:00  300
ID_2  SPLIT_NO   B     2024-03-01 00:00:00  300
ID_3  SPLIT_YES  B     2024-04-01 00:00:00  90
ID_3  SPLIT_NO   B     2024-04-01 00:00:00  30
ID_3  SPLIT_NO   A     2024-04-01 00:00:00  10
ID_3  SPLIT_NO   A     2024-03-01 00:00:00  10

Now we need to perform conditional grouping. First group by the first two fields, ID and SPLIT. If no DATE value is repeated within a group, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the whole group. If DATE values are repeated within a group, group that group's records again by CUST; for each of these subgroups, keep the record with the latest date and replace its AMOUNT with the sum of AMOUNT over the subgroup. The expected result is:

ID    SPLIT      CUST  DATE                 AMOUNT
ID_1  SPLIT_NO   A     2024-04-01 00:00:00  200
ID_1  SPLIT_YES  A     2024-05-01 00:00:00  150
ID_2  SPLIT_NO   A     2024-04-01 00:00:00  600
ID_2  SPLIT_YES  A     2024-05-01 00:00:00  50
ID_3  SPLIT_NO   A     2024-04-01 00:00:00  20
ID_3  SPLIT_NO   B     2024-04-01 00:00:00  30
ID_3  SPLIT_YES  B     2024-04-01 00:00:00  90

SQL:

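-- Keep only the latest-date rows of each (id, split) group, then choose the
-- group total or the per-customer total depending on how many customers
-- share that latest date. date_column corresponds to the DATE field above.
SELECT id, split, cust, date_column,
       CASE num_cust WHEN 1 THEN total_amount ELSE total_cust_amount END AS amount
FROM   (
  SELECT t.*,
         -- number of distinct customers among the latest-date rows
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM   (
    SELECT t.*,
           -- rnk = 1 marks the rows with the latest date in each group
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM   test_table_mm t
  ) t
  WHERE  rnk = 1
)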
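
The layers of this query can be hard to follow, so here is a minimal Python sketch that imitates each window function on the sample rows. It is an illustration only: the helper name sql_style, the tuple layout, and the shortened date strings are assumptions made for this example and are not part of the original solution.

```python
from collections import defaultdict

# Sample rows from the table above: (ID, SPLIT, CUST, DATE, AMOUNT).
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01", 100),
    ("ID_1", "SPLIT_NO",  "A", "2024-04-01", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01", 50),
    ("ID_2", "SPLIT_NO",  "A", "2024-04-01", 300),
    ("ID_2", "SPLIT_NO",  "B", "2024-03-01", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01", 90),
    ("ID_3", "SPLIT_NO",  "B", "2024-04-01", 30),
    ("ID_3", "SPLIT_NO",  "A", "2024-04-01", 10),
    ("ID_3", "SPLIT_NO",  "A", "2024-03-01", 10),
]

def sql_style(rows):
    """Hypothetical helper that mirrors the nested query layer by layer."""
    latest = {}                       # latest DATE per (ID, SPLIT), i.e. rnk = 1
    total_amount = defaultdict(int)   # SUM(amount) OVER (PARTITION BY id, split)
    total_cust = defaultdict(int)     # SUM(amount) OVER (PARTITION BY id, split, cust)
    for id_, split, cust, date, amount in rows:
        key = (id_, split)
        latest[key] = max(latest.get(key, date), date)  # ISO dates sort as text
        total_amount[key] += amount
        total_cust[key + (cust,)] += amount

    # WHERE rnk = 1: keep only the rows carrying the latest date of their group.
    top = [r for r in rows if r[3] == latest[(r[0], r[1])]]

    # COUNT(DISTINCT cust) OVER (PARTITION BY id, split) on the filtered rows.
    custs = defaultdict(set)
    for id_, split, cust, _, _ in top:
        custs[(id_, split)].add(cust)

    # CASE num_cust WHEN 1 THEN total_amount ELSE total_cust_amount END
    out = []
    for id_, split, cust, date, _ in top:
        key = (id_, split)
        if len(custs[key]) == 1:
            amount = total_amount[key]
        else:
            amount = total_cust[key + (cust,)]
        out.append((id_, split, cust, date, amount))
    return sorted(out)

RESULT = sql_style(ROWS)
```

Both totals are computed before the filter, just as the innermost subquery computes its SUM windows before WHERE rnk = 1 is applied, so the surviving rows still carry sums over the full group.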

SQL cannot retain the grouped subsets, so it has to work indirectly through nested queries and several window functions, which makes the code complex. SPL does not need to aggregate immediately after grouping: the subsets can be kept for further calculation, and they can themselves be grouped again:

https://try.esproc.com/splx?44C


   A
1  $select * from test_table_mm.txt order by DATE desc
2  =A1.group(ID,SPLIT)
3  =A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4  =A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1: Load the data and sort it by date in descending order.

A2: Group by the first two fields, but do not aggregate.

A3: Process each group: if no date is duplicated within the group, return the group as is; otherwise, group its records by CUST and return each subgroup.

A4: Process each resulting group: take its first record, which is the latest one because of the sort in A1, and replace AMOUNT with the sum of AMOUNT over that group.
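
For readers who do not know SPL, the four steps can be sketched in plain Python. This is a rough analogue under stated assumptions, not a translation of how SPL works internally: the function name conditional_group, the tuple layout, and the shortened date strings are all made up for this example.

```python
from itertools import groupby

# Sample rows from the table above: (ID, SPLIT, CUST, DATE, AMOUNT).
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01", 100),
    ("ID_1", "SPLIT_NO",  "A", "2024-04-01", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01", 50),
    ("ID_2", "SPLIT_NO",  "A", "2024-04-01", 300),
    ("ID_2", "SPLIT_NO",  "B", "2024-03-01", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01", 90),
    ("ID_3", "SPLIT_NO",  "B", "2024-04-01", 30),
    ("ID_3", "SPLIT_NO",  "A", "2024-04-01", 10),
    ("ID_3", "SPLIT_NO",  "A", "2024-03-01", 10),
]

def conditional_group(rows):
    """Hypothetical helper following steps A1 to A4."""
    # A1: sort by DATE descending, so the first record of any subset is its latest.
    by_date = sorted(rows, key=lambda r: r[3], reverse=True)

    # A2: group by (ID, SPLIT) and keep the subsets instead of aggregating.
    # sorted() is stable, so the date order inside each group is preserved.
    keyed = sorted(by_date, key=lambda r: (r[0], r[1]))
    groups = [list(g) for _, g in groupby(keyed, key=lambda r: (r[0], r[1]))]

    # A3: a group whose dates are all distinct stays whole;
    # otherwise it is split again by CUST.
    subsets = []
    for g in groups:
        if len({r[3] for r in g}) == len(g):
            subsets.append(g)
        else:
            by_cust = sorted(g, key=lambda r: r[2])
            subsets.extend(list(s) for _, s in groupby(by_cust, key=lambda r: r[2]))

    # A4: take the first (latest) record of each subset and give it the subset total.
    return [s[0][:4] + (sum(r[4] for r in s),) for s in subsets]
```

Calling conditional_group(ROWS) yields the seven rows of the expected result table. The point of the sketch is that the subsets survive as ordinary lists after grouping, so a second grouping and the final aggregation are just further operations on those lists.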

Question source: https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by