2.7 Distinct count
A distinct count is a common step when summarizing the data in a table sequence. Here we use it to find the field best suited to serve as the primary key, that is, a field whose number of distinct values equals the number of records.
| PassengerId | Survived | Pclass | Name | Sex | Age |
|---|---|---|---|---|---|
| 1 | 0 | 3 | “Braund, Mr. Owen Harris” | male | 22 |
| 2 | 1 | 1 | “Cumings, Mrs. John Bradley” | female | 38 |
| 3 | 1 | 3 | “Heikkinen, Miss. Laina” | female | 26 |
| 4 | 1 | 1 | “Futrelle, Mrs. Jacques Heath” | female | 35 |
| 5 | 0 | 3 | “Allen, Mr. William Henry” | male | 35 |
| 6 | 0 | 3 | “Moran, Mr. James” | male | |
| 7 | 0 | 1 | “McCarthy, Mr. Timothy J” | male | 54 |
| … | … | … | … | … | … |
SPL script:
| | A |
|---|---|
| 1 | =T("titanic_train.xlsx") |
| 2 | =A1.fno().new(A1.fname(~):Name,A1.field(~).icount():DCount) |
| 3 | =A2.select(DCount==A1.len()) |
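A2 Loop over the field numbers and use the icount() function to count the distinct members of each field, recording the field name as Name and the count as DCount.
A3 Select the fields whose distinct count equals the number of records in the table. These fields contain no duplicate values, so they are the candidates for the primary key.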
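
For readers who want to cross-check the logic outside SPL, the same idea can be sketched in plain Python. This is only an illustrative sketch, not part of the SPL script: the names `FIELDS`, `ROWS`, `distinct_counts`, and `key_candidates` are hypothetical, the rows are the seven sample records shown in the table above rather than the full file, and skipping missing values in the count is an assumption about how nulls should be treated.

```python
# Seven sample rows copied from the table above; None marks the missing Age.
FIELDS = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age"]
ROWS = [
    (1, 0, 3, "Braund, Mr. Owen Harris", "male", 22),
    (2, 1, 1, "Cumings, Mrs. John Bradley", "female", 38),
    (3, 1, 3, "Heikkinen, Miss. Laina", "female", 26),
    (4, 1, 1, "Futrelle, Mrs. Jacques Heath", "female", 35),
    (5, 0, 3, "Allen, Mr. William Henry", "male", 35),
    (6, 0, 3, "Moran, Mr. James", "male", None),
    (7, 0, 1, "McCarthy, Mr. Timothy J", "male", 54),
]


def distinct_counts(fields, rows):
    """Map each field name to its number of distinct non-missing values."""
    counts = {}
    for i, name in enumerate(fields):
        # A set keeps one copy of each value; missing values are skipped
        # (an assumption made for this sketch).
        counts[name] = len({row[i] for row in rows if row[i] is not None})
    return counts


def key_candidates(fields, rows):
    """Return the fields whose distinct count equals the number of rows."""
    counts = distinct_counts(fields, rows)
    return [name for name in fields if counts[name] == len(rows)]


if __name__ == "__main__":
    print(distinct_counts(FIELDS, ROWS))
    print(key_candidates(FIELDS, ROWS))
```

On these seven rows only, `PassengerId` and `Name` pass the check. Whether a field remains unique across the whole file has to be verified against the full data, which is what the SPL script does.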