"esProc SPL is much more convenient than DuckDB in operating multi-layer JSON data, particularly .."

lisongbo RaqForum 82 No.
1 Reply • 31 View • 4 Months ago

SPL Operates Multi-layer JSON Data Much More Conveniently than DuckDB

esProc SPL is much more convenient than DuckDB in operating multi-layer JSON data, particularly when preserving JSON hierarchy and performing complex calculations are required.

DuckDB’s ability to operate JSON is quite good. The read_json_auto() function can directly parse JSON to a table structure, allowing you to operate on multi-layer data directly:

SELECT order_id, order_date, json_extract(customer, '$.name') AS cusName,json_extract(customer, '$.city') AS cusCity FROM read_json_auto('orders.json')

SPL is simpler for such basic operations:

json(file("orders.json").read()).new(order_id, order_date,customer.name:cusname,customer.city:cuscity)

Using the dot (.) to directly access sublevel data is very intuitive.

For slightly complex calculations, such as determining the sales amount for the Electronics category in an order’s data, DuckDB requires expanding order_details, then filtering for category=‘Electronics’, and finally calculating SUM(price*quantity).

SELECT sum(od.quantity*od.price) amount
FROM read_json_auto('orders.json') AS o,
LATERAL UNNEST(o.order_details) AS t(od),
LATERAL UNNEST([od.product]) AS t(p)
WHERE p.category = 'Electronics'

To implement this calculation, SQL needs to associate the sub table with the primary table using an inner join for filtering. This is a bit roundabout, but not too complicated.

SPL, in contrast, can directly treat the sub table as a set for calculations.

json(file("order3.json").read()).conj(order_details).select(product.category=="Electronics").sum(quantity*price)

Just a single statement, without associations, and simpler logic shows that the advantage over DuckDB is clearer.

In more complex scenarios, such as filtering order details for the ‘Electronics’ category and excluding orders with amounts below $200, DuckDB SQL becomes difficult to write. You first need to expand the order_details and aggregate order amounts, then filter for eligible orders based on the aggregation result, and finally resort to nested queries or CTEs in order to preserve the integrity of the data structure. As the SQL becomes lengthy, it becomes less user-friendly for debugging. Using Lambda syntax can be simpler, but it is quite different from traditional SQL form.

SELECT
    o.order_id, 
    LIST_FILTER(o.order_details, x -> x.product.category = 'Electronics') AS order_details
FROM read_json_auto(orders.json') AS o
WHERE 
    ARRAY_LENGTH(LIST_FILTER(o.order_details, x -> x.product.category = 'Electronics')) > 0
    AND SUM(
        LIST_FILTER(o.order_details, x -> x.product.category = 'Electronics') -> 
            (x -> x.price * x.quantity)
    ) > 200;

The SPL code is still natural:

=A2.select(order_details.select@1(product.category=="Electronics") && order_details.sum(price*quantity)>200)

Still, a single statement, simply treating the sub table as a set is enough. No complex subqueries and Lambda syntax, direct referencing, filtering, and aggregation work well regardless of the number of layers. Moreover, SPL preserves the multi-layer structure of JSON without requiring complex SQL like GROUP BY and LATERAL UNNEST.

While DuckDB does operate JSON well, it still requires UNNEST and similar SQL structures, which becomes cumbersome as the data layer increases. In contrast, SPL can directly operate on multi-layer JSON structures, making filtering and aggregation convenient while preserving the original data hierarchy. Clearly, it is better suited for complex JSON computation scenarios.

SPL Official Website 👉 https://www.esproc.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.esproc.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/sxd59A8F2W

Youtube 👉 https://www.youtube.com/@esProc_SPL

Promote