"Ⅰ SQL code and analysis Below is the SQL query statement: select s_name,s_address from supplier, .."

Jerry RaqForum 14 No.
481 View • 3 Years ago

Performance Optimization Exercises Using TPC-H – Q20

Ⅰ SQL code and analysis

Below is the SQL query statement:

select
    s_name,s_address
from
    supplier,nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'bisque%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1995-01-01'
                    and l_shipdate < date '1995-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CHINA'
order by
    s_name;

The main query is simple, but the nested filtering condition is complex.

Ⅱ SPL solution

The query appears complex, but it becomes clear when divided into multiple steps:

1. Filter nation table according to the condition;

2. Filter part table according to the condition;

3. Take result of step 1 as the foreign key table to match and filter supplier table;

4. Take result of steps 2 and 3 as foreign key tables to match and filter partsupp table;

5. Take result of step 4 as the foreign key table and associate it with lineitem table, select PS_AVAILQTY field, and perform grouping and aggregation by L_PARTKEY field and L_SUPPKEY field. As mentioned previously, a subquery containing an association with the primary table according to the equivalence condition can be always rewritten as a statement of grouping operation and association with the primary table. Based on the grouping & aggregation result set, we then select the eligible L_SUPPKEY values.

6. Take result of step 5 as the foreign key table to match and filter supplier table.

The whole process is generating intermediate foreign key tables and use them to match and filter other tables.

	A
1	=now()
2	1995-1-1
3	=elapse@y(A2,1)
4	>partname="bisque"
5	>nationname="CHINA"
6	=file("nation.btx").import@b().select@1(N_NAME==nationname).N_NATIONKEY
7	=file("part.ctx").open().cursor@m(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY)
8	=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS;S_NATIONKEY==A6).fetch().keys@im(S_SUPPKEY)
9	=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY)
10	=file("lineitem.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2 && L_SHIPDATE<A3)
11	=A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY)
12	=A11.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity)
13	=A12.select(PS_AVAILQTY*2>quantity).id(L_SUPPKEY)
14	=A8.join@im(S_SUPPKEY,A13:~)
15	=A14.new(S_NAME,S_ADDRESS).sort@0(S_NAME)
16	=interval@ms(A1,now())

A14 uses join@im, where @i option deletes non-matching records and @m option enables using order-based merge to speed up computation when both S_SUPPKEY and A13 are ordered.

Ⅲ Further optimization

1. Optimization method

In this example, we will use the date-integer conversion optimization method explained in Q1, where linetime table’s L_SHIPDATE has been converted, and dimension table primary key numberization optimization method explained in Q2 – part table’s P_PARTKEY field, supplier table’s S_SUPPKEY field, partsupp table’s PS_PARTKEY field and PS_SUPPKEY field, and lineitem table’s L_PARTKEY field and L_SUPPKEY field have all been converted in the previous examples.

2. Code for data conversion

Copy nation_11.btx, part_19.ctx, supplier_16.ctx, partsupp_16.ctx and lineitem_19.ctx, and rename them nation_20.btx, part_20.ctx, supplier_20.ctx, partsupp_20.ctx and lineitem_20.ctx respectively.

3. Code after data conversion

First, we need to preload the dimension tables. Below is preloading code:

	A
1	>env(nation, file("nation_20.btx").import@b())
2	>env(part, file("part_20.ctx").open().import())
3	>env(supplier, file("supplier_20.ctx").open().import())

Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.

Computing code:

	A
1	=now()
2	1995-1-1
3	=days@o(elapse@y(A2,1))
4	=days@o(A2)
5	>partname="bisque"
6	>nationname="CHINA"
7	=nation.select@1(N_NAME==nationname).N_NATIONKEY
8	=part.@m(pos@h(P_NAME,partname))
9	=supplier.@m(S_NATIONKEY==A7)
10	=file("partsupp_20.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A8(PS_PARTKEY) && A9(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY)
11	=file("lineitem_20.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A4 && L_SHIPDATE<A3)
12	=A11.join@i(L_PARTKEY:L_SUPPKEY,A10:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY)
13	=A12.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity)
14	=A13.select(PS_AVAILQTY*2>quantity).run(A9(L_SUPPKEY)=null)
15	=supplier(A9.pselect@a(~==null))
16	=A15.new(S_NAME,S_ADDRESS).sort@0(S_NAME)
17	=interval@ms(A1,now())

A14 first select eligible records and then sets values of A9’s members corresponding to each record’s L_SUPPKEY as null. A15 first gets row numbers of all A9’s members having null values and then selects corresponding records from supplier table according to these row numbers.

Ⅳ Using enterprise edition’s column-wise computation

1. Original data

	A
1	=now()
2	1995-1-1
3	=elapse@y(A2,1)
4	>partname="bisque"
5	>nationname="CHINA"
6	=file("nation.btx").import@b().select@1(N_NAME==nationname).N_NATIONKEY
7	=file("part.ctx").open().cursor@mv(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY)
8	=file("supplier.ctx").open().cursor@mv(S_SUPPKEY,S_NAME,S_ADDRESS;S_NATIONKEY==A6).fetch().keys@im(S_SUPPKEY)
9	=file("partsupp.ctx").open().cursor@mv(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY)
10	=file("lineitem.ctx").open().cursor@mv(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2 && L_SHIPDATE<A3)
11	=A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY)
12	=A11.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity)
13	=A12.select@mv(PS_AVAILQTY*2>quantity).id(L_SUPPKEY)
14	=A8.join@im(S_SUPPKEY,A13:~)
15	=A14.new@m(S_NAME,S_ADDRESS).sort@o(S_NAME)
16	=interval@ms(A1,now())

2. Optimized data

First, we need to preload the dimension tables. Below is preloading code:

	A
1	>env(nation, file("nation_20.btx").import@bv())
2	>env(part, file("part_20.ctx").open().import@v())
3	>env(supplier, file("supplier_20.ctx").open().import@v())

Before performing the query, we need to first run the preloading code to load the small dimension tables into memory.

Computing code:

	A
1	=now()
2	1995-1-1
3	=days@o(elapse@y(A2,1))
4	=days@o(A2)
5	>partname="bisque"
6	>nationname="CHINA"
7	=nation.select@1(N_NAME==nationname).N_NATIONKEY
8	=part.(pos@h(P_NAME,partname))
9	=supplier.(S_NATIONKEY==A7)
10	=file("partsupp_20.ctx").open().cursor@mv(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A8(PS_PARTKEY) && A9(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY)
11	=file("lineitem_20.ctx").open().cursor@mv(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A4 && L_SHIPDATE<A3)
12	=A11.join@i(L_PARTKEY:L_SUPPKEY,A10:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY)
13	=A12.groups@u(L_PARTKEY,L_SUPPKEY;PS_AVAILQTY,sum(L_QUANTITY):quantity)
14	=A13.select@mv(PS_AVAILQTY*2>quantity).run@m(A9(L_SUPPKEY)=null)
15	=supplier(A9.pselect@a(~==null))
16	=A15.new(S_NAME,S_ADDRESS).sort@0(S_NAME)
17	=interval@ms(A1,now())

Ⅴ Test result

Unit: Second

	Regular	Column-wise
Before optimization	10.6	5.9
After optimization	8.8	3.9

SPL Official Website 👉 https://www.esproc.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.esproc.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/sxd59A8F2W

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

Jerry • 481 View • 3 Years ago