A Field Guide to Querying and Report Computations with SPL - Basic regular operations
Basic regular operations
These tasks refer to basic operations such as queries, sorting, joins, grouping & aggregation, which can help you understand the basics of SPL. They are designed for beginners to get started quickly with SPL, while users already familiar with SPL may skip this section.
Example 1: List all unique departments
Data source: Employee table
Target: Remove duplicate Dept field values and retain the unique ones only.

SPL code:
A |
|
1 |
$select * from Employee.txt |
2 |
=A1.id(Dept) |
A1: Load data: use simple SQL to query Employee.txt and generate a SPL table sequence.

l Learning point: Table sequence
A table sequence is the fundamental data structure in SPL, similar to an SQL result set, as both are sets of multiple records. Their main difference is that an SQL result set is unordered, with no inherent sequence numbers for the records, whereas a table sequence is inherently ordered, with natural sequence numbers for its records. When no confusion arises, a table sequence is often simply referred to as a “table”. For example, calling A1 “table sequence Employee” or “table Employee” means the same thing.
In the above screenshot, A1’s Index column is the sequence number column, represented by#in the code.A1.(#)is used to retrieve the sequence number column (a set of sequence numbers): [1,2,3,….]
A1.(Name) gets Name column: ["Ashley","Rachel","Emily",…]
Table sequences make it easy to perform computations related to sequence numbers. For simple tasks like retrieving a record by its sequence number, SPL uses the syntax(N), similar to therownum=Nsyntax in SQL. For example, A1(2)is used in SPL to retrieve the second Employee record.

Another example is to get the first N records, which is written as to(N) in SPL, similar to the top()function in SQL. For example, A1.to(3) is used in SPL to get the first 3 records.

A2: Use id() function to perform distinct operation on the Dept field.

l Learning point: distinct operation
SPL providesid()function to remove duplicate field values, similar to theDISTINCTsyntax in SQL. Just as SQL’sGROUP BYcan achieve deduplication, SPL’s grouping & aggregation functiongroups()can also remove duplicates. Therefore, A2 can also be written asA1.groups(Dept).
To perform distinct based on two fields, State and Dept, write it as A1.groups(Dept,State).

Example 2: Add a computed column pointing to Employee records to Orders table
Data source: Orders table, Employee table
Target: Add a new field named newField to Orders table. The field’s values are corresponding records of the Employee table.

SPL code:
A |
|
1 |
$select * from Orders.txt |
2 |
$select * from Employee.txt |
3 |
=A1.derive(A2.select(EId==SellerId):newField) |
A1, A2: Load data.
A3: Add a computed column named newField. Traverse each Orders record to get the Employee record matching SellerId, and make it value of the current computed column. select()function performs a filtering operation; derive() function adds a computed column and returns a new table sequence.

Note that field values of A3’s table sequence can be records (references), which is different from a SQL result set.
l Learning point: Field values of a table sequence are generic
Field values of a table sequence are generic, which means they can be either simple types or references to records or other table sequences. This requires pre-establishing associations between tables, thereby reusing the associations or simplifying complex join computations. In contrast, the field values of an SQL result set can only be simple types and cannot reference records or other result sets.
For example, you can use A3’s association for further computations, like counting orders and summing order amounts of every client in each department:
=A3.groups(newField.Dept, Client; sum(Amount), count(1))
Note that table sequence Orders is referenced through variable name A1 and table sequence Employee is referenced through variable name A2. This differs from SQL result sets.
l Learning point: Table sequences are explicit sets
Table sequences are explicit sets that can be directly referenced by variables, simplifying complex multi-step computations. In contrast, SQL result sets are not explicit sets and cannot be directly referenced by variables – they can only be referenced as temporary tables using theFROMclause.
Example 3: Select certain orders and increase their amounts by 10%
Data source: Orders table
Target: Get order records of March 2022 and increase their amounts by 10%, without affecting other orders.

SPL code:
A |
|
1 |
$select * from Orders.txt |
2 |
=A1.select(string(OrderDate,"yyyy-MM")=="2022-03") |
3 |
=A2.run(Amount=Amount*1.1) |
A1: Load data.

A2: Select order data of the specified month. select() function performs a filtering operation.

l Learning point: filtering operation
Theselectfunction is used for conditional filtering, and its basic functionality covers theSQL WHEREclause. The logical operators include&&(AND),||(OR),!(NOT), etc., and the equality symbol in comparison operators is==.
A3: Modify data in A2 to increase Amount value by 10%. The run() function modifies data and returns the modified data. Below is A3’s result set, the modified A2:

Look at A1 and you will find that only order records of the specified month are modified.

Once A2 is modified, corresponding records in A1 are also changed, indicating a certain relationship between A2 and A1. A1 is the table sequence introduced earlier, and A2 is a reference to records in A1, referred to as a record sequence.
l Learning point: Record sequence
General operations on table sequences, such as sorting (sort), filtering (select), grouping (group), and joining (join), typically do not generate new table sequences. Instead, they derive a subset of references to records of the original table sequence, which is referred to as the record sequence. To simplify the description, the record sequence’s references to records are often simply called records, and the record sequence itself is simply referred to as a subset of the table sequence. In contrast, operations that alter the structure of a table sequence or a record sequence, such as creating a new table sequence (create), appending fields or computed columns (derive), and performing grouped aggregation (groups), generally generate new table sequences.

When modifying a record sequence, what is actually being modified is a subset of the table sequence, such as the records in A1 where OrderID = [10, 11, 12]. Correspondingly, the complement of this subset remains unchanged.
Example 4: Compute the total order amount and order count per department per year
Data source: Orders table, Employee table
Target: Join Orders table and Employee table, group the joining result set by year and department, compute the total order amount and order count in each group, and then sort records by year in ascending order and order amount in descending order. Orders without assigned salespeople (departments) should also be included in the summarization.

SPL code:
A |
B |
|
1 |
$select * from Orders.txt |
$select * from Employee.txt |
2 |
=join@1(A1:ord,SellerId; B1:emp,EId) |
|
3 |
=A2.groups(year(ord.OrderDate):y,emp.Dept;sum(ord.Amount):amt, count(1):cnt) |
|
4 |
=A3.sort(y,amt:-1) |
|
A1: Load data.
A2: Usejoin()function to perform a left join between Orders table and Employee table. The result is a new two-field table sequence, where the field values are references to the order records and employee records. Two of these records are shown below:

l Learning point: Association/Join
The join() function establishes an association between tables, similar to the SQL JOIN clause, which by default performs an inner join. @ is used to specify function options, where the following symbols or numbers indicate different extended functionalities. For example,@1represents a left join.
A3: Group records by year and department, and compute the total order amount and order count in each group.

l Learning point: Grouping & aggregation
SPL offers groups() function to perform grouping & aggregation and returns table sequence as the result. The function is similar to SQL GROUP BY syntax.
A4: Use sort() function to sort records by year in ascending order and by order amount in descending order.

l Learning point: Sorting operation
SPL provides sort()function to perform sorting operation, similar to SQL ORDER BY syntax. It sorts data, by default, in ascending order, and uses “:-1” to sort in descending order. If null values are not considered, sorting by the negative field values is equivalent to sorting in descending order. Therefore, A4 can also be written as=A3.sort(y, -amt).
Example 5: Respectively get records of employees in the specified states, those whose salaries are in the specified range, and those that are the intersection of the two
Data source: Employee table

SPL code:
A |
|
1 |
$select * from Employee.txt |
2 |
=A1.select(State=="New York" || State=="Texas") |
3 |
=A1.select(Salary>=5000 && Salary<=10000) |
4 |
=A2 ^ A3 |
A1: Load data.
A2: Get Employee records where State is New York or Texas. When performing the multi-value matching, you can also use thecontain()function in SPL, written as A1.select(["New York","Texas"].contain(State)).

A3: Get Employee records where the salary is within the range 5000-10000. You can also use SPL between function to perform the range matching, written as A1.select(between(Salary ,5000 :10000)).

A4: Compute intersection of A2 and A3.

l Learning point: set operations
SPL provides set operators, including ^ for intersection, & for union, \ for difference, and | for union all. The corresponding functions are isect(), union(), diff(), and conj(). Since SPL table sequences support explicit sets, there is no need to recompute to obtain those sets computed previously. This is different from SQL.
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/sxd59A8F2W
Youtube 👉 https://www.youtube.com/@esProc_SPL