From SQL to SPL: Fetch values from previous non-null value rows

The database table organisation_user_link stores the current state of the account, where dossier_created is the account creation time.

id

user_id

organisation_id

status_id

stopped_reason_id

dossier_created

1

3

73

2


2021-10-29 07:50:21

2

9

1199

4

5

2021-05-19 17:44:07

The database table organisation_user_link_status_history stores the change history of account status.

timestamp

user_id

organisation_id

status_id

stopped_reason_id

2024-03-11 12:05:30

3

73

1


2024-03-08 11:15:35

3

73

3


2024-03-05 13:25:40

3

73

4

3

2024-03-13 02:07:10

9

1199

1


2024-03-11 02:07:10

9

1199

2


Now, based on the specified query date, list the account status for each day within the period from today (March 14, 2024) to the query date (March 1, 2024). Requirement: Reasonably fill in the status of blank dates, for example, from today to the latest change date, it should be filled in as today's, and from the latest change date to the next nearest change date, it should be filled in as the latest’s; Supplement the account creation date; Finally, sort by account and date in reverse order.

date

user_id

organisation_id

status_id

stopped_reason_id

dossier_created

2024-03-14

3

73

2


2021-10-29 07:50:21

2024-03-14

9

1199

4

5

2021-05-19 17:44:07

2024-03-13

3

73

2


2021-10-29 07:50:21

2024-03-13

9

1199

1


2021-05-19 17:44:07

2024-03-12

3

73

2


2021-10-29 07:50:21

2024-03-12

9

1199

1


2021-05-19 17:44:07

2024-03-11

3

73

1


2021-10-29 07:50:21

2024-03-11

9

1199

2


2021-05-19 17:44:07

2024-03-10

3

73

1


2021-10-29 07:50:21

2024-03-10

9

1199

2


2021-05-19 17:44:07

2024-03-09

3

73

1


2021-10-29 07:50:21

2024-03-09

9

1199

2


2021-05-19 17:44:07

2024-03-08

3

73

3


2021-10-29 07:50:21

2024-03-08

9

1199

2


2021-05-19 17:44:07

2024-03-07

3

73

3


2021-10-29 07:50:21

2024-03-07

9

1199

2


2021-05-19 17:44:07

2024-03-06

3

73

3


2021-10-29 07:50:21

2024-03-06

9

1199

2


2021-05-19 17:44:07

2024-03-05

3

73

4

3

2021-10-29 07:50:21

2024-03-05

9

1199

2


2021-05-19 17:44:07

2024-03-04

3

73

4

3

2021-10-29 07:50:21

2024-03-04

9

1199

2


2021-05-19 17:44:07

2024-03-03

3

73

4

3

2021-10-29 07:50:21

2024-03-03

9

1199

2


2021-05-19 17:44:07

2024-03-02

3

73

4

3

2021-10-29 07:50:21

2024-03-02

9

1199

2


2021-05-19 17:44:07

2024-03-01

3

73

4

3

2021-10-29 07:50:21

2024-03-01

9

1199

2


2021-05-19 17:44:07

SQL

WITH RECURSIVE dates ( date ) AS (
    SELECT DATE('2024-03-01')
    UNION ALL
    SELECT DATE(date) + INTERVAL 1 DAY
    FROM dates
    WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
    SELECT 
        current_history_data.*
    FROM (
        SELECT
           DATE(timestamp) AS date,
           user_id,
           organisation_id,
           status_id,
           stopped_reason_id,
           dossier_created,
           'history-data' AS src
         FROM (
           SELECT
               oulsh.user_id,
               oulsh.organisation_id,
               oulsh.timestamp,
               oulsh.status_id,
               oulsh.stopped_reason_id,
               oul.dossier_created,
               ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
           FROM organisation_user_link_status_history AS oulsh
           INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
         ) AS numbered_rows
         WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())
        
         UNION ALL
        
         SELECT CURRENT_DATE AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
         FROM organisation_user_link AS oul
    ) AS current_history_data
    ORDER BY DATE DESC
)
SELECT d.date, u.user_id, u.organisation_id,
  (
    SELECT status_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS status_id,
  (
    SELECT stopped_reason_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS stopped_reason_id,
  (
    SELECT dossier_created
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS dossier_created
FROM dates d
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
ORDER BY d.date DESC, u.user_id;

SQL uses recursive subqueries to create date sequences, which have complex structures. It uses multi-layer nested queries and window functions to mark state changes, and then fills in blank date data with join statements, making the code cumbersome. SPL does not aggregate after grouping by accounts, but continues to calculate subsets of the groups. SPL provides functions for generating date sequences and records based on date sequences: https://try.esproc.com/splx?3Li


 A

1

$select date('2024-03-14') as date,user_id,organisation_id,status_id,stopped_reason_id,dossier_created from organisation_user_link.txt

2

$select timestamp as date,user_id,organisation_id,status_id,stopped_reason_id,null as dossier_created from organisation_user_link_status_history.txt order by date desc

3

=(A1|A2).group(user_id)

4

=A3.conj(~.news(periods@x(date,ifn(date[1],date("2024-02-29")),-1);date(~):date,user_id,organisation_id,status_id,stopped_reason_id,A3.dossier_created))

5

=A4.sort(-date,user_id)

A1, A2 Load data. The current status table has added a date field with a value of today's date. The historical status table has added the account creation date with an empty value.

A3 Merge current and historical status, group by account, but do not aggregate.

A4 Process each group of data: Generate a date sequence based on the dates of each record and the next record. If it is the last record, generate a date sequence with the day before the query date; Generate a new batch of records for each date sequence, where the creation date comes from the first record in this group. The function 'periods' generates a date sequence, @x represents not including the posterior endpoint. The function news generates a batch of new records for each record in the current group.

Question sourcehttps://stackoverflow.com/questions/78159534/fetch-values-from-previous-non-null-value-rows