6.18 Order-based grouping: by the neighboring condition – big data
Evaluate a conditional expression against each record of a big data table, and start a new group whenever the result is true.
We have a large log file whose records are written in datetime order. The task is to find the date on which the longest run of consecutive ERROR-level logs occurs.
| Date | Time | Level | IP | … |
|---|---|---|---|---|
| 2020/1/1 | 0:00:01 | INFO | 166.253.153.234 | … |
| 2020/1/1 | 0:00:02 | INFO | 99.72.133.239 | … |
| 2020/1/1 | 0:00:04 | WARN | 99.11.105.39 | … |
| 2020/1/1 | 0:00:05 | INFO | 117.69.80.195 | … |
| 2020/1/1 | 0:00:11 | INFO | 79.195.137.228 | … |
| … | … | … | … | … |
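SPL provides the @i option for the cs.group() function to group a huge number of records read through a cursor. With @i, a new group starts whenever the grouping condition evaluates to true for a record compared with its neighbor; here, whenever Date or Level differs from the previous record.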
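To make the grouping rule concrete, here is a minimal Python sketch of the same idea. It is not SPL and not how esProc implements cs.group@i; the function name `group_on_condition` and the sample list are hypothetical and chosen only for illustration. The generator walks the records once and closes the current group whenever the condition on the previous and current record is true, so only one group is held in memory at a time.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_on_condition(
    records: Iterable[T],
    starts_new_group: Callable[[T, T], bool],
) -> Iterator[List[T]]:
    """Yield runs of adjacent records.

    A new run begins whenever starts_new_group(previous, current) is true.
    """
    group: List[T] = []
    for rec in records:
        # Compare the current record with its immediate predecessor only.
        if group and starts_new_group(group[-1], rec):
            yield group
            group = []
        group.append(rec)
    if group:
        yield group


# Hypothetical sample: a stream of log levels in arrival order.
levels = ["INFO", "INFO", "ERROR", "ERROR", "INFO"]
runs = list(group_on_condition(levels, lambda prev, cur: prev != cur))
print(runs)  # [['INFO', 'INFO'], ['ERROR', 'ERROR'], ['INFO']]
```

Note that the two INFO runs stay separate: grouping by a neighboring condition never merges records that are not adjacent, which is what distinguishes it from an ordinary equi-grouping.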
SPL script:
| | A |
|---|---|
| 1 | =file("ServerLog.txt").cursor@t() |
| 2 | =A1.group@i(Date[-1]!=Date\|\|Level[-1]!=Level;Date,Level,count(~):ErrorCount) |
| 3 | =A2.select(Level=="ERROR") |
| 4 | =A3.top(-1;ErrorCount) |
A1 Create a cursor for the log file.
A2 Use the @i option of cs.group() to start a new group whenever the condition is true, that is, whenever Date or Level differs from the previous record, and count the records in each group.
A3 Get the groups whose log level is ERROR.
A4 Get the group with the largest ErrorCount, that is, the longest run of consecutive ERROR-level logs.
Execution result:
| Date | ErrorCount |
|---|---|
| 2020/01/02 | 4 |
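For readers who want to check the logic outside SPL, the whole task can be sketched in Python as follows. This is an illustration under stated assumptions, not a translation of the esProc internals: the function name `longest_error_run` is hypothetical, the input is assumed to be tab-separated with a header row containing Date and Level, and the sample rows (including the IP addresses) are invented. `itertools.groupby` groups only adjacent records with equal keys, which matches the condition that Date or Level differs from the previous record, and the reader consumes the input line by line rather than loading it whole.

```python
import csv
import io
from itertools import groupby

# Invented sample rows; a real log would be opened with open(path, newline="").
SAMPLE = "\n".join([
    "Date\tTime\tLevel\tIP",
    "2020/1/1\t0:00:01\tINFO\t192.0.2.1",
    "2020/1/1\t0:00:02\tERROR\t192.0.2.2",
    "2020/1/1\t0:00:03\tERROR\t192.0.2.3",
    "2020/1/1\t0:00:04\tINFO\t192.0.2.4",
    "2020/1/1\t23:59:59\tERROR\t192.0.2.5",
    "2020/1/2\t0:00:01\tERROR\t192.0.2.6",
    "2020/1/2\t0:00:02\tERROR\t192.0.2.7",
    "2020/1/2\t0:00:03\tERROR\t192.0.2.8",
    "2020/1/2\t0:00:05\tINFO\t192.0.2.9",
]) + "\n"


def longest_error_run(lines):
    """Return (date, count) for the longest run of adjacent ERROR records.

    A run ends when either Date or Level changes between neighbors.
    Returns None if the input contains no ERROR record.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    best = None
    # groupby only merges neighbors with equal keys, so the input order matters.
    for (date, level), run in groupby(reader, key=lambda r: (r["Date"], r["Level"])):
        count = sum(1 for _ in run)  # count without storing the run
        if level == "ERROR" and (best is None or count > best[1]):
            best = (date, count)
    return best


result = longest_error_run(io.StringIO(SAMPLE))
print(result)  # ('2020/1/2', 3)
```

In the sample, the ERROR at 23:59:59 on 2020/1/1 is not counted with the three that follow it, because the change of date starts a new group even though the level stays the same.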