Performance Optimization Exercises Using TPC-H – Q20
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
    s_name,s_address
from
    supplier,nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'bisque%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1995-01-01'
                    and l_shipdate < date '1995-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CHINA'
order by
    s_name;
The main query is simple, but the nested filtering condition is complex.
Ⅱ SPL solution
The query appears complex, but it becomes clear when divided into multiple steps:
1. Filter nation table according to the condition;
2. Filter part table according to the condition;
3. Take result of step 1 as the foreign key table to match and filter supplier table;
4. Take result of steps 2 and 3 as foreign key tables to match and filter partsupp table;
5. Take result of step 4 as the foreign key table and associate it with lineitem table, select PS_AVAILQTY field, and perform grouping and aggregation by L_PARTKEY field and L_SUPPKEY field. As mentioned previously, a subquery containing an association with the primary table according to the equivalence condition can be always rewritten as a statement of grouping operation and association with the primary table. Based on the grouping & aggregation result set, we then select the eligible L_SUPPKEY values.
6. Take result of step 5 as the foreign key table to match and filter supplier table.
The whole process is generating intermediate foreign key tables and use them to match and filter other tables.
| A | |
| 1 | =now() | 
| 2 | 1995-1-1 | 
| 3 | =elapse@y(A2,1) | 
| 4 | >partname="bisque" | 
| 5 | >nationname="CHINA" | 
| 6 | =file("nation.btx").import@b().select@1(N_NAME==nationname).N_NATIONKEY | 
| 7 | =file("part.ctx").open().cursor@m(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY) | 
| 8 | =file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS;S_NATIONKEY==A6).fetch().keys@im(S_SUPPKEY) | 
| 9 | =file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY) | 
| 10 | =file("lineitem.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2 && L_SHIPDATE<A3) | 
| 11 | =A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY) | 
| 12 | =A11.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity) | 
| 13 | =A12.select(PS_AVAILQTY*2>quantity).id(L_SUPPKEY) | 
| 14 | =A8.join@im(S_SUPPKEY,A13:~) | 
| 15 | =A14.new(S_NAME,S_ADDRESS).sort@0(S_NAME) | 
| 16 | =interval@ms(A1,now()) | 
A14 uses join@im, where @i option deletes non-matching records and @m option enables using order-based merge to speed up computation when both S_SUPPKEY and A13 are ordered.
Ⅲ Further optimization
1. Optimization method
In this example, we will use the date-integer conversion optimization method explained in Q1, where linetime table’s L_SHIPDATE has been converted, and dimension table primary key numberization optimization method explained in Q2 – part table’s P_PARTKEY field, supplier table’s S_SUPPKEY field, partsupp table’s PS_PARTKEY field and PS_SUPPKEY field, and lineitem table’s L_PARTKEY field and L_SUPPKEY field have all been converted in the previous examples.
2. Code for data conversion
Copy nation_11.btx, part_19.ctx, supplier_16.ctx, partsupp_16.ctx and lineitem_19.ctx, and rename them nation_20.btx, part_20.ctx, supplier_20.ctx, partsupp_20.ctx and lineitem_20.ctx respectively.
3. Code after data conversion
First, we need to preload the dimension tables. Below is preloading code:
| A | |
| 1 | >env(nation, file("nation_20.btx").import@b()) | 
| 2 | >env(part, file("part_20.ctx").open().import()) | 
| 3 | >env(supplier, file("supplier_20.ctx").open().import()) | 
Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.
Computing code:
| A | |
| 1 | =now() | 
| 2 | 1995-1-1 | 
| 3 | =days@o(elapse@y(A2,1)) | 
| 4 | =days@o(A2) | 
| 5 | >partname="bisque" | 
| 6 | >nationname="CHINA" | 
| 7 | =nation.select@1(N_NAME==nationname).N_NATIONKEY | 
| 8 | =part.@m(pos@h(P_NAME,partname)) | 
| 9 | =supplier.@m(S_NATIONKEY==A7) | 
| 10 | =file("partsupp_20.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A8(PS_PARTKEY) && A9(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY) | 
| 11 | =file("lineitem_20.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A4 && L_SHIPDATE<A3) | 
| 12 | =A11.join@i(L_PARTKEY:L_SUPPKEY,A10:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY) | 
| 13 | =A12.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity) | 
| 14 | =A13.select(PS_AVAILQTY*2>quantity).run(A9(L_SUPPKEY)=null) | 
| 15 | =supplier(A9.pselect@a(~==null)) | 
| 16 | =A15.new(S_NAME,S_ADDRESS).sort@0(S_NAME) | 
| 17 | =interval@ms(A1,now()) | 
A14 first select eligible records and then sets values of A9’s members corresponding to each record’s L_SUPPKEY as null. A15 first gets row numbers of all A9’s members having null values and then selects corresponding records from supplier table according to these row numbers.
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
| A | |
| 1 | =now() | 
| 2 | 1995-1-1 | 
| 3 | =elapse@y(A2,1) | 
| 4 | >partname="bisque" | 
| 5 | >nationname="CHINA" | 
| 6 | =file("nation.btx").import@b().select@1(N_NAME==nationname).N_NATIONKEY | 
| 7 | =file("part.ctx").open().cursor@mv(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY) | 
| 8 | =file("supplier.ctx").open().cursor@mv(S_SUPPKEY,S_NAME,S_ADDRESS;S_NATIONKEY==A6).fetch().keys@im(S_SUPPKEY) | 
| 9 | =file("partsupp.ctx").open().cursor@mv(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY) | 
| 10 | =file("lineitem.ctx").open().cursor@mv(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2 && L_SHIPDATE<A3) | 
| 11 | =A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY) | 
| 12 | =A11.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity) | 
| 13 | =A12.select@mv(PS_AVAILQTY*2>quantity).id(L_SUPPKEY) | 
| 14 | =A8.join@im(S_SUPPKEY,A13:~) | 
| 15 | =A14.new@m(S_NAME,S_ADDRESS).sort@o(S_NAME) | 
| 16 | =interval@ms(A1,now()) | 
2. Optimized data
First, we need to preload the dimension tables. Below is preloading code:
| A | |
| 1 | >env(nation, file("nation_20.btx").import@bv()) | 
| 2 | >env(part, file("part_20.ctx").open().import@v()) | 
| 3 | >env(supplier, file("supplier_20.ctx").open().import@v()) | 
Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.
Computing code:
| A | |
| 1 | =now() | 
| 2 | 1995-1-1 | 
| 3 | =days@o(elapse@y(A2,1)) | 
| 4 | =days@o(A2) | 
| 5 | >partname="bisque" | 
| 6 | >nationname="CHINA" | 
| 7 | =nation.select@1(N_NAME==nationname).N_NATIONKEY | 
| 8 | =part.(pos@h(P_NAME,partname)) | 
| 9 | =supplier.(S_NATIONKEY==A7) | 
| 10 | =file("partsupp_20.ctx").open().cursor@mv(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A8(PS_PARTKEY) && A9(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY) | 
| 11 | =file("lineitem_20.ctx").open().cursor@mv(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A4 && L_SHIPDATE<A3) | 
| 12 | =A11.join@i(L_PARTKEY:L_SUPPKEY,A10:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY) | 
| 13 | =A12.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity) | 
| 14 | =A13.select@mv(PS_AVAILQTY*2>quantity).run@m(A9(L_SUPPKEY)=null) | 
| 15 | =supplier(A9.pselect@a(~==null)) | 
| 16 | =A15.new(S_NAME,S_ADDRESS).sort@0(S_NAME) | 
| 17 | =interval@ms(A1,now()) | 
Ⅴ Test result
Unit: Second
| Regular | Column-wise | |
| Before optimization | 10.6 | 5.9 | 
| After optimization | 8.8 | 3.9 | 
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/sxd59A8F2W
Youtube 👉 https://www.youtube.com/@esProc_SPL
 
            
        