From SQL to SPL: Getting the available ranges from two tables

There are two tables in a certain database. The original inventory table data_add stores multiple batches of inventory for multiple items. Each batch of inventory has a starting number START_NUM and an ending number END_NUM, representing the range of consecutive numbers.

ID

ITEM_ID

START_NUM

END_NUM

1

337

101

400

2

337

500

800

3

337

801

1200

4

337

1500

1600

5

337

15000

16000

6

337

20000

30000

7

444

20

30

The consumption table data_cons stores multiple batches of consumption for multiple items, each batch of consumption also has a continuous number range.

ID

ITEM_ID

START_NUM

END_NUM

1

337

240

300

2

337

301

400

3

337

850

1100

4

337

1500

1510

5

337

15000

16000

Now we need to calculate the current inventory of each item, which is the original inventory minus consumption. Note that each batch is calculated separately, and the two consecutive batches are also calculated separately, such as the second and third records; The continuous numbering range of the original inventory may be consumed into discontinuous multi segment numbering, in which case multiple records need to be generated, with each record corresponding to a continuous numbering segment.

ITEM_ID

START_NUM

END_NUM

337

101

239

337

500

800

337

801

849

337

1101

1200

337

1511

1600

337

20000

30000

444

20

30

SQL:
with merged_cons(item_id, start_num, end_num) AS (
    SELECT * FROM data_cons
    MATCH_RECOGNIZE (
        PARTITION BY item_id
        ORDER BY start_num, end_num
        MEASURES FIRST(start_num) AS start_num, LAST(end_num) AS end_num
        PATTERN( merged* strt )
        DEFINE
            merged AS MAX(end_num) + 1 >= NEXT(start_num)
    )
)
, intersections(id, item_id, start_before, end_before, start_after, end_after) AS (
    SELECT a.id, a.item_id, /*a.start_num AS add_start, a.end_num AS add_end, 
        b.start_num AS cons_start, b.end_num AS cons_end, */
        CASE WHEN a.start_num < b.start_num - 1 THEN a.start_num END AS start_before,
        CASE WHEN a.start_num < b.start_num - 1 THEN b.start_num - 1 END AS end_before,
        CASE WHEN b.end_num + 1 < a.end_num THEN b.end_num + 1 END AS start_after,
        CASE WHEN b.end_num + 1 < a.end_num THEN a.end_num END AS end_after
    FROM data_add a
    JOIN merged_cons b
        ON a.item_id = b.item_id AND LEAST(a.end_num, b.end_num) >= GREATEST(a.start_num, b.start_num)
)
SELECT item_id, start_before as start_num, end_before as end_num
FROM intersections WHERE start_before IS NOT NULL
UNION ALL
SELECT item_id, start_after as start_num, end_after as end_num
FROM intersections WHERE start_after IS NOT NULL
UNION ALL
SELECT item_id, start_num, end_num
FROM data_add d
WHERE NOT EXISTS(SELECT 1 FROM intersections i WHERE i.id = d.id);

SQL requires the use of MATCH_RECOGNIZE statements and multiple subqueries to indirectly implement set calculation, which is complex in code and difficult to understand. SPL can use variables to represent sets and directly perform set related calculations:

https://try.esproc.com/splx?3xM


 A

B

1

$select * from data_add.txt

$select * from data_cons.txt

2

=A1.group(ITEM_ID;~.(to(START_NUM,END_NUM)):a)

=B1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):b)

3

=A2.join(ITEM_ID,B2,b)


4

=A3.derive(a.([~, b].merge@d()).(~.group@i(~!=~[-1]+1)).conj():diff)

5

=A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM)

A1-B1: Load data.

A2: Group the original inventory by item and convert each batch within the group into a continuous sequence small set, without aggregation. B2: Perform similar calculation to the consumption table and merge multiple small sets within the group into one large set. ~ indicates the current group or current member.

A3: Left join.

A4: Add calculated column diff: Calculate the difference set between each small set of the original inventory and the large set of the consumption table, group each difference set by condition, and assign the consecutive sequence to the same group. The function group is used for grouping, and by default is equivalence grouping, @i represents grouping by condition, and ~[-1] represents the previous member. Function merge@d merges ordered sets and calculate the difference set.

A5: Generate a record for each group in the diff field of each A4 record. The function news can expand a record into multiple records. ~.m(-1) represents the last member in the sequence ~.

Question sourcehttps://stackoverflow.com/questions/78070892/getting-the-available-ranges-from-two-tables