Distinctive Positional Operations
In computers, sets are generally stored as arrays, and their members naturally have positions. Data tables, which are essentially sets of records, are also often stored as arrays, so their member records also have the concept of position. In practice, many analytical computations are indeed position-dependent. However, SQL treats data tables as unordered sets, thus leading to significant inconvenience.
esProc SPL preserves the ordered nature of arrays and provides a rich set of positional operation methods, enabling such computational requirements to be easily implemented with concise and understandable code.
1. Position index
In an ordered dataset, a record’s position inherently carries business significance. For example, if we want to obtain the data for the 5th, 10th, 15th, 20th, ... trading days of 2025 for a certain stock, a straightforward approach is to filter the record set for 2025 from the ordered stock data by trading day, and then select the 5th, 10th, 15th, 20th, ... records from the filtered set.
For unordered SQL data, window functions must be used to temporarily create a sequence number to accomplish such a task, which is a somewhat “roundabout” approach.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (ORDER BY date) AS rn
FROM
stock
WHERE
YEAR(date) = 2025
)
WHERE
rn % 5 = 0;
SPL supports performing calculations on the position index of records, allowing the problem to be solved easily following the straightforward approach mentioned above:
stock.select(year(date)==2025).select(# % 5==0)
When looping through the dataset, # represents the position index of the current record. The condition for the second select is that the record’s position index # is divisible by 5, thus, the filtered results will naturally be the 5th, 10th, 15th, 20th, … records.
Python originates from conventional programming languages and inherits the concept of array member positions, making the corresponding code relatively concise:
stock_2025 = stock[stock['date'].dt.year == 2025].reset_index(drop=True)
selected_stock = stock_2025[stock_2025.index % 5 == 4]
The index is the default index of DataFrame object, and unlike SPL’s #, it is an integer starting from 0. Additionally, this index is an object and will occupy memory space.
The index is automatically calculated when generating the stock, and must be reset after filtering the stock by year to reindex the new set. Otherwise, stock_2025 index will still retain the position index before filtering by year.
In contrast, SPL’s # is a natural sequence number, requiring no additional memory space and no reset action, making it more concise and convenient.
Python also provides the iloc function, which enables directly retrieving data by position:
stock_2025 = stock[stock['date'].dt.year == 2025].reset_index(drop=True)
selected_stock = stock_2025.iloc[4::5]
iloc[4::5] means selecting a row of data every 5 rows starting from the integer position 4 (i.e., the 5th row).
SPL also provides a similar calculation method:
stock.select(year(date)==2025).step(5,5)
2. Retrieve positions based on conditions
Conditional filtering is a common operation, which means retrieving members that satisfy specific criteria. When a set’s members are ordered, we may also sometimes care about the positions of those members satisfying the criteria.
For example, we want to determine the number of trading days it took for a stock to rise above $100 after its listing. A straightforward approach is to find the position index of the first record exceeding $100 from the stock data ordered by trading days, which is the desired result.
SQL requires creating a sequence number and then calculating the minimum sequence number for records where the closing price exceeds $100; the approach remains ‘roundabout’.
SELECT MIN(rn)
FROM (
SELECT price,
ROW_NUMBER() OVER ( ORDER BY date ) rn
FROM stock)
WHERE price>100
SPL effortlessly accomplishes the task:
stock.pselect(price > 100)
pselect is a positioning function offered by SPL based on native position index, used to retrieve the positions of members that satisfy specified criteria. In this case, it identifies the position index of the first record exceeding $100.
Code written using DataFrame indexing in Python:
stock [stock['price'] > 100].index[0] + 1
It’s equivalent to retrieving the index after standard filtering, which is somewhat cumbersome.
Simply retrieving a position is often not the end goal; we may also perform further calculations based on that position. For example: calculate the increase when the stock price first rises above $100. The natural approach is to find the position of the first record where the closing price exceeds $100, then retrieve the closing prices at this position and the previous one, and finally subtract them.
SQL, however, would require a far more ‘roundabout’ approach:
WITH T AS (
SELECT price,ROW_NUMBER() OVER ( ORDER BY date ) rn,
price - LAG(price) OVER ( ORDER BY date) rising
FROM Stock)
SELECT Rising FROM T WHERE NO = ( SELECT MIN(rn) FROM T WHERE price>100 )
It requires calculating the increase of each day, and the intermediate table generated by the CTE syntax also needs to be traversed twice.
SPL can implement it with a natural approach, and the code is very concise:
p=stock.pselect(price > 4.5)
stock(p).price-stock(p-1).price
SPL offers the positioning calculation function calc, which can further simplify the code:
stock.calc(stock.pselect(price>100),price-price[-1])
The calc function calculates the expression price-price[-1] based on the positions obtained from stock.pselect(price>100). SPL supports cross-row calculation; price[-1] represents the previous row’s closing price.
Python has the concept of position, and similar code can also be written in Python:
first_index = stock[stock['price'] > 100].index[0]
result =stock.loc[first_index,'price']- stock.loc[first_index - 1,'price']
However, Python lacks a function akin to calc, thus loc must be written twice.
3. Retrieve the position of the maximum/minimum value
In an ordered dataset, the positions of the maximum and minimum records also hold significant business significance.
For example, calculate the increase on the day when the stock price first reaches its highest point. A straightforward approach is to identify the position of the first record that reaches the highest point in the ordered stock data. This makes it easy to solve the problem.
Writing in SQL is still quite cumbersome:
WITH T AS (
SELECT
price,
ROW_NUMBER() OVER (ORDER BY date) rn,
price - LAG(price) OVER (ORDER BY date) AS rising
FROM
Stock
),
SELECT rising
FROM T
WHERE price = (SELECT MAX(price) FROM T)
ORDER BY rn
LIMIT 1
It requires many window functions and multiple traversals, and finally, it also needs to sort before retrieving the first record.
SPL offers the positioning functions pmax and pmin to return the positions of maximum and minimum values, enabling exceptionally concise code:
stock.calc(stock.pmax(price ), price - price[-1])
By default, pmax searches from front to back for the position index of the first maximum value.
There may be multiple members with the maximum value. Suppose we want to determine the corresponding price increases for all members holding that maximum value; SPL code would look like this:
stock.calc(stock.pmax@a(price ), price - price[-1])
The @a option can return the positions of all these members.
Python’s idxmax function can also return the position of the first record corresponding to the maximum value:
max_idx = stock['price'].idxmax()
increase = stock.loc[max_idx,'price']- stock.loc[max_idx - 1,'price']
Like the previous code, loc has to be written twice.
To determine all increases corresponding to the maximum value, the Python code becomes rather verbose:
max_price = stock['price'].max()
max_idxs = stock[stock['price'] == max_price].index
increase=max_price - stock.loc[max_idxs-1,'price'].values
Because Python doesn’t offer a function to return the positions of all records holding the maximum value, it must first calculate the maximum value, then filter for the records where that maximum value is found, and finally use index to determine the positions of those records.
In summary: SQL is quite cumbersome when handling position-based calculations on ordered sets. It requires creating temporary sequence numbers, leading to roundabout logic and cumbersome code. Python, with its integer indexing, can also retrieve members by position, offering a significant improvement over SQL. However, its indexing lacks sufficient automation and often requires resetting. Furthermore, Python’s positional operations are not comprehensive, resulting in inconsistent implementation approaches for similar tasks. More complex tasks remain verbose, and understanding them can be challenging. SPL, on the other hand, provides a unique approach to positional operations. With its concise, inherent sequence numbers and a rich set of positioning functions, it can easily accomplish these types of tasks.
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version