A Field Guide to Querying and Report Computations with SPL - 6 Multilevel join operations

 

6 Multilevel join operations

This type of tasks are computations involving multi-table joins, mutual joins between two tables, or self joins that require high flexibility. SQL uses JOIN to implement those associations. But when multilevel relationship is involved, it usually needs the nested structure or recursive structure, where a table associated repeatedly needs an alias. The code, as a result, becomes complicated and error-prone.

Example 1: Find employees based on Pennsylvania and whose department managers come from Texas

Source data: Employee table, Department table (6/Department.txt). There are associations between the two tables.

Picture13png

SPL code:


A

B

1

$select * from Employee.txt

$select * from 6/Department.txt

2

>A1.switch(Dept,B1:Dept)

>B1.switch(Manager,A1:EId)

3

=A1.select(State=="Pennsylvania" && Dept.Manager.State=="Texas")

A1, B1: Load data.

A2:The switch() function replaces Employee table’s Dept field value with the corresponding Department table record.

Picture14png
 Learning point: switch() function
SPL switch()function changes a field value to the reference of record (reference, pointer) in a foreign key table (dictionary table or dimension table) and thus establishes association between the two tables. It bypasses the join clause and the reuse of association and directly uses the more intuitive object method (represented by dot) to access the foreign table field, significantly reducing the complexity of implementing flexible joins.

Picture15png
Parameter foreign: The field to be replaced, which is usually the fact table’s foreign key.
Parameter D: A foreign key table (dictionary table or dimension table). Increase the switch efficiency by setting the primary key for the table.
Parameter key: A field of the foreign key table, usually the primary key, for replacing the foreign field value.

B2: Change Department table’s Manager field value to the reference of corresponding record in Employee table.

Picture16png

A3: Select records of employees whose state is Pennsylvania and whose manager comes from Texas. Dept.Manager.State represents the state on which the current employee’s department manager is based. The syntax displays three levels of association, which begins from Employee table, goes to Department table, and finally returns to starting table.

Example 2: Compute total of each order’s amount and amount of all its child orders

Source data: ParentID field of the orders table (6\OrdersWithParent.txt) points to the parent order of the current one. Accordingly, the current order is called the child order. A tree self-join structure can be created through ParentID and ID.

Picture17png

SPL code:


A

1

$select OrderID,ParentID,Amount from 6\OrdersWithParent.txt

2

=A1.switch(ParentID,A1:OrderID)

3

=A1.new(OrderID,A1.nodes(ParentID,~).sum(Amount)+Amount:Total)

A1: Load data.

Picture18png
A2: Replace the ParentID field value with the table’s ID field to establish a self-join relationship. Below shows the order record where OrderID==9 and orders at its child levels:

Picture19png
A3: =A1.new(OrderID,A1.nodes(ParentID,)…) creates a new two-dimensional table based on A1. The statement first gets all child orders of the current order. ~ represents A1’s current record. The following screenshot shows multilevel child orders of the order record where OrderID==9. A1.nodes(ParentID,) gets all child orders of ~ from A1.

Picture20png
 Learning point: nodes() function
SPL nodes()function gets all child records based on the self-join table sequence the switch() function creates in advance. This avoids implementing a recursive structure manually and significantly reduces code complexity.

Picture21png

Parameter F: Field name at the parent node.

Parameter r: A specified record, which can be computed through the syntax table sequence.select@1 or table sequence (sequence number). To prevent excessive recursion depth, the nodes() function is equipped with a parameter that allows specifying the maximum recursion depth, which is 1000 by default.

In A3, =A1.new(...sum(Amount)+Amount:Total) computes sum of all child orders of the current order and adds amount of the current order.

Extended reading

How to calculate the total under recursive relationships with esProc

The “Female Manager’s Male Subordinates” Problem That Frustrates All BI Tools

From SQL to SPL: Calculate the hierarchy of recursive references