Process a large CSV file with parallel processing
A CSV file stores a large amount of order data.
OrderID,Client,SellerID,Amount,OrderDate
1,SPLI,219,9173,01/17/2022
2,HU,110,6192,10/01/2020
3,SPL,173,5659,04/23/2020
4,OFS,7,3811,02/05/2023
5,ARO,146,3752,08/27/2021
6,SRR,449,10752,05/27/2022
7,SJCH,326,11719,01/18/2022
8,JDR,3,11828,12/09/2021
Use Java to process this file: find orders whose amounts are between 3,000 and 5,000, group them by customer, and sum the order amounts and count the orders. The expected result:
| Client | amt | cnt |
| --- | --- | --- |
| ARO | 11948382 | 2972 |
| BDR | 11720848 | 2933 |
| BON | 11864952 | 2960 |
| BSF | 11947734 | 2980 |
| CHO | 11806401 | 2968 |
| CHOP | 11511201 | 2877 |
| D | 11491452 | 2876 |
| DSG | 11672114 | 2910 |
| DSGC | 11656479 | 2918 |
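For comparison, the same filter-group-aggregate logic can be sketched in plain Java with parallel streams. This is a minimal sketch under stated assumptions: the class and method names are mine, the sample rows come from the snippet above, and a real run over a large file would replace the in-memory list with something like `Files.lines(path).parallel()`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class OrderAggregator {

    // Filters orders with 3000 <= Amount < 5000 and groups them by client,
    // summing amounts and counting orders, using a parallel stream.
    // Each value is a pair: [0] = total amount, [1] = order count.
    static ConcurrentMap<String, long[]> aggregate(List<String> dataLines) {
        return dataLines.parallelStream()
                .map(line -> line.split(","))
                .filter(f -> {
                    long amount = Long.parseLong(f[3]); // Amount column
                    return amount >= 3000 && amount < 5000;
                })
                .collect(Collectors.groupingByConcurrent(
                        f -> f[1],                      // Client column
                        Collectors.teeing(
                                Collectors.summingLong(f -> Long.parseLong(f[3])),
                                Collectors.counting(),
                                (sum, cnt) -> new long[]{sum, cnt})));
    }

    public static void main(String[] args) {
        // Sample rows from the article (header line excluded).
        List<String> rows = List.of(
                "1,SPLI,219,9173,01/17/2022",
                "2,HU,110,6192,10/01/2020",
                "3,SPL,173,5659,04/23/2020",
                "4,OFS,7,3811,02/05/2023",
                "5,ARO,146,3752,08/27/2021",
                "6,SRR,449,10752,05/27/2022",
                "7,SJCH,326,11719,01/18/2022",
                "8,JDR,3,11828,12/09/2021");
        Map<String, long[]> result = aggregate(rows);
        result.forEach((client, s) ->
                System.out.println(client + " | " + s[0] + " | " + s[1]));
    }
}
```

Even in this simplified form, the plain-Java version needs explicit parsing, filtering, and collector plumbing; the SPL statement below expresses the same computation in one line.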
Write the following SPL statement:
=file("d:/OrdersBig.csv").cursor@mtc(;8).select(Amount>=3000 && Amount<5000).groups(Client;sum(Amount):amt,count(1):cnt)
The cursor() function parses a large file that cannot fit into memory; by default, it computes serially. The @m option enables multithreaded data retrieval, and 8 is the number of parallel threads; the @t option imports the first line as column titles; and the @c option uses the comma as the separator.
Read How to Call a SPL Script in Java to learn how to integrate SPL into a Java application.
Source:https://stackoverflow.com/questions/70586145/how-to-read-a-specific-column-of-a-row-from-a-csv-file-in-java
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/sxd59A8F2W
Youtube 👉 https://www.youtube.com/@esProc_SPL