From SQL to SPL: Change duplicate contents to NULL
The first two columns of a certain database table have duplicate values, such as the 2nd-3rd records and the 1st record being duplicated below.
Column_A |
Column_B |
Column_C |
1 |
AB |
amount1 |
1 |
AB |
amount2 |
1 |
AB |
amount3 |
2 |
OA |
amount4 |
3 |
OE |
amount5 |
3 |
OE |
amount6 |
4 |
DB |
amount7 |
Now we need to change all duplicate values to null. In other words, after grouping by the first two fields (or one of them), only the first record in the group remains unchanged, and the first two fields of other records are changed to null.
Column_A |
Column_B |
Column_C |
1 |
AB |
amount1 |
amount2 |
||
amount3 |
||
2 |
OA |
amount4 |
3 |
OE |
amount5 |
amount6 |
||
4 |
DB |
amount7 |
SQL:
SELECT CASE a_rn WHEN 1 THEN column_a END AS column_a,
CASE b_rn WHEN 1 THEN column_b END AS column_b,
column_c
FROM (
SELECT column_a,
column_b,
column_c,
ROW_NUMBER() OVER (PARTITION BY column_a ORDER BY column_b, column_c)
AS a_rn,
ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY column_c)
AS b_rn
FROM table_name)
SQL does not have natural row numbers, nor does it have row numbers within a group, and can only generate them using window functions, making the code relatively cumbersome. SPL has built-in row numbers, including row numbers within a group:
https://try.esproc.com/splx?3Wp
A |
|
1 |
$select * from table_name.txt |
2 |
=A1.group(Column_A).run(~.(if(#!=1,Column_A=Column_B=null))).conj() |
A1: Load data.
A2: Group by Column_A, modify the data of each group, and when the row number within the group is not 1, change Column_A and Column_B to null; Finally, union the data from each group. # indicates the row number within the group.
If the data amount is not large, it is also possible not to group and union, and directly compare the Column_A of the current record with all previous Column_As. If the latter includes the former, then set Column_A and Column_B to null. The code is as follows:
=A1.run(if(Column_A[:-1].contain(Column_A), (Column_A=Column_B=null)))
[: -1] represents the set from the first record to the previous record of the current record.
Question source:https://stackoverflow.com/questions/78053152/how-to-display-if-one-column-data-is-duplicate-and-i-want-to-display-in-next-row
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL