SQLazy: Merging Overlapping Time Intervals

Problem description

A table stores multiple time intervals for each account, and the intervals may overlap.

Source data:

account_id

start_date

end_date

A

2019-06-20

2019-06-29

A

2019-06-25

2019-07-25

A

2019-07-20

2019-08-26

A

2019-12-25

2020-01-25

A

2021-04-27

2021-07-27

A

2021-06-25

2021-07-14

A

2021-07-10

2021-08-14

A

2021-09-10

2021-11-12

B

2019-07-13

2020-07-14

B

2019-06-25

2019-08-26

Expected output (merge overlapping intervals for each account):

account_id

start_date

end_date

A

2019-06-20

2019-08-26

A

2019-12-25

2020-01-25

A

2021-04-27

2021-08-14

A

2021-09-10

2021-11-12

B

2019-06-25

2020-07-14

Take account A as an example:

  • The first three intervals (6/20-6/29, 6/25-7/25, 7/20-8/26) overlap and can be merged into 6/20 - 8/26;

  • The nonoverlapping interval 12/25 - 1/25;

  • The overlapping intervals (4/27-7/27, 6/25-7/14, 7/10-8/14) can be merged into 4/27-8/14;

  • The nonoverlapping interval 9/10-11/12.

Apply the same merging operations to account B: merge any intervals that overlap.

Step-by-step implementation with SQLazy

Key approach: Check whether the current interval’s start date is later than the maximum end date among all previous intervals. If it is, the current interval does not overlap with any of the previous intervals and a new group should be started; otherwise, merge it into the current group.

Name

Anchor

Statement

t1

acc

sort account_id, start_date

t2


compute end_date[:-1] max as prev_max; partition account_id

t3


segment condition start_date > prev_max as gid; partition account_id

t4


summarize start_date min as start_date, end_date max as end_date; group account_id, gid



derive delete gid

[Run this example online]

Step 1: Sort rows by account and the start date

sort account_id, start_date

Sort rows by account_id and start_date in ascending order, ensuring intervals within each account are processed in chronological order.

..

Step 2: Calculate the maximum end date among all rows before the current row

compute end_date[:-1] max as prev_max; partition account_id

Within each account partition, calculate the maximum end_date value among all rows from the first row up to, but excluding, the current row, and record it as prev_max.

..

Step 3: Segment rows based on the condition and assign group numbers

segment condition start_date > prev_max as gid; partition account_id

Check each row in sequence: if start_date > prev_max, the current interval does not overlap with any of the previous intervals and a new group is started (gid+1); otherwise, the row is assigned to the current group.

..

Step 4: Aggregate by account and group number

summarize start_date min as start_date, end_date max as end_date; group account_id, gid

For each group, extract the earliest start date and the latest end date to form a merged interval.

..

Finally, remove the helper column gid.

Compile the steps into SQL

Once the above steps are complete and verified, SQLazy’s compiler can automatically generate the equivalent native SQL (using MySQL as an example):

WITH t2 AS (
    SELECT
        account_id,
        start_date,
        end_date,
        MAX(end_date) OVER (
            PARTITION BY account_id
            ORDER BY
                CASE WHEN account_id IS NULL THEN 1 ELSE 0 END,
                account_id ASC,
                CASE WHEN start_date IS NULL THEN 1 ELSE 0 END,
                start_date ASC
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS prev_max
    FROM
        acc
),
t3 AS (
    SELECT
        account_id,
        start_date,
        end_date,
        prev_max,
        1 + SUM(
            CASE
                WHEN start_date > prev_max THEN 1
                ELSE 0
            END
        ) OVER (
            PARTITION BY account_id
            ORDER BY
                CASE WHEN account_id IS NULL THEN 1 ELSE 0 END,
                account_id ASC,
                CASE WHEN start_date IS NULL THEN 1 ELSE 0 END,
                start_date ASC
        ) AS gid
    FROM
        t2
)
SELECT
    account_id,
    MIN(start_date) AS start_date,
    MAX(end_date) AS end_date
FROM
    t3
GROUP BY
    account_id,
    gid
ORDER BY
    account_id,
    start_date;

You only need to verify the logic of each of the four steps – no need to understand or debug the SQL – and the compiler will generate the production-ready code.

Why SQLazy is more efficient


Conventional SQL (Manual coding)

SQLazy

Steps

Multi-level CTE + window functions

Four steps

Core technique

Need to understand “ROWS BETWEEN UNBOUNDED PRECEDING AND CASE WHEN CUMULATION”

Use [:-1] directly to represent “all preceding rows”, and use “segment” to represent the grouping condition

Readability

Low – nested window functions are difficult to understand

High – each step corresponds to a well-defined business action

Debug approach

Manually break down CTEs and run them repeatedly

Step-by-step execution with visible intermediate results at each step

Cross-database portability

Manually adapt code for different SQL dialects

The compiler automatically generates code in the target SQL dialect


SQLazy lets you describe logic using business language, rather than writing nested queries in SQL syntax. This example of “merge overlapping intervals” is expressed clearly in just 4 steps – sort, calculate the maximum end date among the preceding rows, segment rows by condition, and aggregate. The compiler generates the final SQL for you, and you only need to verify the business meaning of each step.

Try SQLazy online: sqlazy.com (Free to use, signup not required)
SQLazy project repository:
github.com/SPLWare/SQLazy