"The first two columns of a certain database table have duplicate values, such as the 2nd-3rd rec .."

jinxing RaqForum 41 No.
19 View • 2 Months ago

From SQL to SPL: Change duplicate contents to NULL

The first two columns of a certain database table have duplicate values, such as the 2nd-3rd records and the 1st record being duplicated below.

Column_A	Column_B	Column_C
1	AB	amount1
1	AB	amount2
1	AB	amount3
2	OA	amount4
3	OE	amount5
3	OE	amount6
4	DB	amount7

Now we need to change all duplicate values to null. In other words, after grouping by the first two fields (or one of them), only the first record in the group remains unchanged, and the first two fields of other records are changed to null.

Column_A	Column_B	Column_C
1	AB	amount1
		amount2
		amount3
2	OA	amount4
3	OE	amount5
		amount6
4	DB	amount7

SQL：

SELECT CASE a_rn WHEN 1 THEN column_a END AS column_a,
       CASE b_rn WHEN 1 THEN column_b END AS column_b,
       column_c
FROM   (
  SELECT column_a,
         column_b,
         column_c,
         ROW_NUMBER() OVER (PARTITION BY column_a ORDER BY column_b, column_c)
           AS a_rn,
         ROW_NUMBER() OVER (PARTITION BY column_a, column_b ORDER BY column_c)
           AS b_rn
  FROM   table_name)

SQL does not have natural row numbers, nor does it have row numbers within a group, and can only generate them using window functions, making the code relatively cumbersome. SPL has built-in row numbers, including row numbers within a group:

https://try.esproc.com/splx?3Wp

	A
1	$select * from table_name.txt
2	=A1.group(Column_A).run(~.(if(#!=1,Column_A=Column_B=null))).conj()

A1： Load data.

A2： Group by Column_A, modify the data of each group, and when the row number within the group is not 1, change Column_A and Column_B to null; Finally, union the data from each group. # indicates the row number within the group.

If the data amount is not large, it is also possible not to group and union, and directly compare the Column_A of the current record with all previous Column_As. If the latter includes the former, then set Column_A and Column_B to null. The code is as follows:

=A1.run(if(Column_A[:-1].contain(Column_A), (Column_A=Column_B=null)))

[: -1] represents the set from the first record to the previous record of the current record.

Question source：https://stackoverflow.com/questions/78053152/how-to-display-if-one-column-data-is-duplicate-and-i-want-to-display-in-next-row

SPL Official Website 👉 https://www.esproc.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.esproc.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/sxd59A8F2W

Youtube 👉 https://www.youtube.com/@esProc_SPL

Application