SPL computing performance test series: funnel statistics
I. Test task
Funnel analysis is a common statistical requirement in e-commerce business. When users shop with smart devices, the system will establish a connection, forming a session. Each session contains many action events, such as visiting a website, browsing a product page, placing an order. The action events of each user occur in a certain order; the later the event occurs, the fewer the number of users involved in the event, just like a funnel. Funnel conversion analysis first requires counting the distinct number of users for each step’s action event. Based on this, further calculations, such as conversion rates, are performed.
Here below are the simplified data structures of an e-commerce system:
sessions table
| Column name | Data type | Notes | 
|---|---|---|
| id | long | session number, primary key | 
| userid | long | User number | 
| devicetype | int | Device type | 
events table:
| Column name | Data type | Notes | 
|---|---|---|
| id | long | event number, primary key | 
| sessionid | long | session number | 
| userid | long | User number | 
| eventtime | datetime | Event time (accurate to milliseconds) | 
| eventtype | int | Event type | 
| eventmessage | string | Event message (200 characters) | 
eventtype table:
| Column name | Data type | Notes | 
|---|---|---|
| id | int | Sequence number, primary key | 
| name | string | Name | 
Devicetype table
| Column name | Data type | Notes | 
|---|---|---|
| id | int | Sequence number, primary key | 
| name | string | Name | 
Description of inter-table relationship:
The sessions table (s for short) and the events table (e for short) are in one-to-many relationship, associated through the userid and id of s and the userid and sessionid of e.
Each userid in s corresponds to unique id, and each session corresponds to unique event id.
The e and the eventtype table (et for short) are in many-to-one relationship, associated through the eventtype of e and the id of et.
The s and the devicetype table (dt for short) are in many-to-one relationship, associated through the devicetype of s and the id of dt.
The input parameter of the test is a specified date like 2022-07-21. This test is to calculate the three-step and seven-step funnel conversion rates for events happened within 30 days prior to the specified date.
Let’s take a three-step funnel analysis as an example, and the given time period is from 2022-06-21 00:00 to 2022-07-21 00:00, calculate:
the number of distinct users who performed the visit-type event, represented by step1_count;
the number of distinct users who performed the visit-type and view-type events in turn in a day, represented by step2_count;
the number of distinct users who performed the visit-type, view-type and confirm-type events in turn in a day, represented by step3_count;
Then, based on the calculation results, group and aggregate the numbers of distinct users by the devicetype of sessions table to which these events correspond, and finally divide step3_count by step1_count to calculate the conversion rate of three-step funnel.
The calculation result of three-step funnel is roughly as follows:

The computing process of seven-step funnel is similar to that of three-step funnel, except that there are seven types of events, namely: visit, view, detail, login, cart, confirm, and pay. The calculation result of seven-step funnel is roughly as follows:

This test is also categorized into two types (A and B) based on whether to include a string in the filter condition. Based on type A test, each step of type B test adds a filter condition that contains the substring. For example, in the first step of the funnel calculation, one more condition needs to be added to events: eventmessage contains the substring “check_point”. In the filter conditions of each step of funnel calculation, the substrings of the eventmessage field are different.
II. Technologies to be compared
This test used SPL Enterprise Edition (May 2023 version) and, in the same environment, used StarRocks and Oracle to calculate funnel conversion rates, comparing calculation performance.
1. Starrocks, claimed to be the fastest SQL-based database. Starrocks 3.0 and 2.5.3 are to be tested.
2. Oracle, a widely used database, often serves as a benchmark in database performance test. Oracle 19c is to be tested.
Since Oracle is not a professional OLAP database, its performance metrics are used as a reference point.
III. Test environment
One physical server with the following configuration:
2 x Intel3014 CPUs, main frequency 1.7G, 12 cores in total
64G memory
SSD (Solid State Drive)
To test the computing ability of these products on external storage as well as their sensitivity to memory, we use virtual machines to limit memory capacity, and design several test environments with different memory capacities, each with an 8-core CPU.
VM1: 32GB RAM; VM2: 16GB RAM; VM3: 8GB RAM; VM4: 6GB RAM; VM5: 4GB RAM
For StarRocks, at least two nodes, BE and FE, need to be installed. The BE that undertakes computing task is installed on the VM, and the FE that undertakes management task is installed on the physical machine to avoid affecting test result.
For SPL and Oracle, they only need to be installed on VM.
We tested SPL and StarRocks on all virtual machines capable of producing results. However, Oracle was tested only on VM1, as its results are primarily for reference.
IV. Test data
The date range for all data is from 2022-06-28 to 2022-07-28.
sessions: 3.5 million rows
events: 280 million rows
eventtype: 678 rows
devicetype: 8 rows
Click here to download the test data.
After downloading, unzip it to a directory, configure this directory as the main directory in esProc, and execute dataInit.splx to generate text files.
The generated test data are stored in four tab-separated text files (without field name):
sessions.txt: the data are sorted by userid and id (in ascending order, the same below).
events.txt: the data are sorted by userid, sessionid, eventtime and id. For the filter conditions on substrings (such as “check_point”) in type B test, the judgment results for events table are all “true”.
eventtype.txt: the data are sorted by id.
devicetype.txt: the data are sorted by id.
The order of the fields is the same as that of the fields in the above table structure.
For example, the contents of devicetype.txt are:
1 phone
2 other
3 unkown
4 laptop
5 pad
6 PC
7 TV
8 spider
V. Test SQL
We tested Oracle on VM1 with type A three-step and seven-step funnel calculations.
We tested StarRocks on all VMs capable of producing results with three-stpe and seven-step funnel calculations of both type A and B tests.
During the test, we found that an error is reported when StarRocks calculates step_count for two or more steps:

Therefore, during testing, only one step’s step_count could be calculated at a time.
For example, if any two or three out of step1_count, step2_count, and step3_count is calculated in the 3-step funnel, this error will definitely occur. So only one of the three can be calculated during the test.
This error occurs in both the latest version 3.0 and the stable version 2.5.3.
SQL statement for Oracle (type A test, three-step funnel):
WITH e1 AS (
  SELECT
    userid,
    eventtime AS step1_time,
    MIN(sessionid) AS sessionid,
    1 AS step1
  FROM events e1
  JOIN eventtype ON eventtype.id = e1.eventtype
  WHERE eventtime>= to_date('2022-07-21','yyyy-mm-dd')-30
    AND eventtime< to_date('2022-07-21','yyyy-mm-dd') 
    AND (eventtype.name = 'visit')
  GROUP BY userid,eventtime
), e2 AS (
  SELECT
    e2.userid,
    MIN(e2.sessionid) AS sessionid,
    1 AS step2,
    MIN(eventtime) AS step2_time,
    MIN(e1.step1_time) AS step1_time
  FROM events e2
  JOIN e1 ON e1.sessionid = e2.sessionid  AND eventtime > step1_time
  JOIN eventtype ON eventtype.id = e2.eventtype
  WHERE eventtime < step1_time +1
    AND (eventtype.name = 'view')
  GROUP BY  e2.userid
), e3 AS (
  SELECT
    e3.userid,
    MIN(e3.sessionid) AS sessionid,
    1 AS step3,
    MIN(eventtime) AS step3_time,
    MIN(e2.step1_time) AS step1_time
  FROM events e3
  JOIN e2 ON e2.sessionid = e3.sessionid  AND eventtime > step2_time
  JOIN eventtype ON eventtype.id = e3.eventtype
  WHERE eventtime < step1_time+1
    AND (eventtype.name = 'detail')
  GROUP BY  e3.userid
)
SELECT
    dt.name AS devicetype,
  COUNT(DISTINCT CASE WHEN funnel_conversions.step1 IS NOT NULL THEN funnel_conversions.step1_userid  ELSE NULL END) AS step1_count,
  COUNT(DISTINCT CASE WHEN funnel_conversions.step2 IS NOT NULL THEN funnel_conversions.step2_userid  ELSE NULL END) AS step2_count,
  COUNT(DISTINCT CASE WHEN funnel_conversions.step3 IS NOT NULL THEN funnel_conversions.step3_userid  ELSE NULL END) AS step3_count 
  COUNT(DISTINCT CASE WHEN funnel_conversions.step3 IS NOT NULL THEN funnel_conversions.step3_userid  ELSE NULL END) 
    / COUNT(DISTINCT CASE WHEN funnel_conversions.step1 IS NOT NULL THEN funnel_conversions.step1_userid  ELSE NULL END) AS step3_rate
FROM (
  SELECT
    e1.step1_time AS step1_time,
    e1.userid AS userid,
    e1.userid AS step1_userid,
    e2.userid AS step2_userid,  
    e3.userid AS step3_userid,
    e1.sessionid AS step1_sessionid,
    step1, step2, step3
  FROM e1
  LEFT JOIN e2 ON e1.userid=e2.userid
  LEFT JOIN e3 ON e2.userid=e3.userid
  ) funnel_conversions
LEFT JOIN sessions s ON funnel_conversions.step1_sessionid = s.id 
left join devicetype dt on s.devicetype=dt.id 
GROUP BY dt.name
Where:
1. Subquery e1: find out the visit-type events occurred from to_date(‘2022-07-21’,‘yyyy-mm-dd’) to to_date(‘2022-07-21’,‘yyyy-mm-dd’)-30; group the results by userid and eventtime; take the min sessionid in each group to serve as e1’s sessionid, and take eventtime as e1’s step1_time. The constant 1 is the step1 field of the result set e2.
2. Subquery e2: inner join of events table and e1. The conditions to join include that the sessionid of e1 is equal to the sessionid of events table, and the eventtime is greater than e1.step1_time; filter the join results. The conditions to filter include that the eventtime is less than e1.step1_time + 1 day, and the eventtype is view; group by the userid of events table, and take the min sessionid, min eventtime, and min step1_time, to serve as sessionid, step2_time, and step1_time of the result set e2 respectively; the constant 1 is the step2 field of the result set e2.
3. Subquery e3 is similar to e2, except for the eventtype.
4. Subquery funnel_conversions: left join of e1 and e2, e3. The condition to join is that the userids are the same. In the join result, there are userids of e1 to e3, and there are step1 to step3.
5. The outermost layer query: join on the id of sessions table and the sessionid of funnel_conversions, and join on the id of devicetype table and the devicetype of sessions table; group by the name of devicetype table; calculate the number of distinct users of each device type and each step; divide the result of the third step by the result of the first step to get the overall conversion rate.
SQL statement for StarRocks (type B test, seven-step funnel):
WITH e1 AS (
  SELECT
    userid,
    eventtime AS step1_time,
    MIN(sessionid) AS sessionid,
    1 AS step1
  FROM events e1
  JOIN eventtype ON eventtype.id = e1.eventtype
  WHERE  eventtime>= DATE_ADD(str_to_date('2022-07-21','%Y-%m-%d') ,INTERVAL -30 day) AND eventtime< str_to_date('2022-07-21','%Y-%m-%d') 
    AND (eventtype.name = 'visit')
	and eventmessage like '%check%'
  GROUP BY userid,eventtime
), e2 AS (
  SELECT
    e2.userid,
    MIN(e2.sessionid) AS sessionid,
    1 AS step2,
    MIN(eventtime) AS step2_time,
    MIN(e1.step1_time) AS step1_time
  FROM events e2
  JOIN e1 ON e1.sessionid = e2.sessionid  AND eventtime > step1_time
  JOIN eventtype ON eventtype.id = e2.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'view')
	and eventmessage like '%point%'
  GROUP BY  e2.userid
), e3 AS (
  SELECT
    e3.userid,
    MIN(e3.sessionid) AS sessionid,
    1 AS step3,
    MIN(eventtime) AS step3_time,
    MIN(e2.step1_time) AS step1_time
  FROM events e3
  JOIN e2 ON e2.sessionid = e3.sessionid  AND eventtime > step2_time
  JOIN eventtype ON eventtype.id = e3.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'detail')
	and eventmessage like '%check_point%'
  GROUP BY  e3.userid
), e4 AS (
  SELECT
    e4.userid,
    MIN(e4.sessionid) AS sessionid,
    1 AS step4,
    MIN(eventtime) AS step4_time,
    MIN(e3.step1_time) AS step1_time
  FROM events e4
  JOIN e3 ON e3.sessionid = e4.sessionid  AND eventtime > step3_time
  JOIN eventtype ON eventtype.id = e4.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'login')
	and eventmessage like '%check_p%'
  GROUP BY  e4.userid
), e5 AS (
  SELECT
    e5.userid,
    MIN(e5.sessionid) AS sessionid,
    1 AS step5,
    MIN(eventtime) AS step5_time,
    MIN(e4.step1_time) AS step1_time
  FROM events e5
  JOIN e4 ON e4.sessionid = e5.sessionid  AND eventtime > step4_time
  JOIN eventtype ON eventtype.id = e5.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'cart')
	and eventmessage like '%k_point%'
  GROUP BY  e5.userid
), e6 AS (
  SELECT
    e6.userid,
    MIN(e6.sessionid) AS sessionid,
    1 AS step6,
    MIN(eventtime) AS step6_time,
    MIN(e5.step1_time) AS step1_time
  FROM events e6
  JOIN e5 ON e5.sessionid = e6.sessionid  AND eventtime > step5_time
  JOIN eventtype ON eventtype.id = e6.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'confirm')
	and eventmessage like '%ck_point%'
  GROUP BY  e6.userid
), e7 AS (
  SELECT
    e7.userid,
    MIN(e7.sessionid) AS sessionid,
    1 AS step7,
    MIN(eventtime) AS step7_time,
    MIN(e6.step1_time) AS step1_time
  FROM events e7
  JOIN e6 ON e6.sessionid = e7.sessionid  AND eventtime > step6_time
  JOIN eventtype ON eventtype.id = e7.eventtype
  WHERE eventtime < DATE_ADD(step1_time ,INTERVAL +1 day)
    AND (eventtype.name = 'pay')
	and eventmessage like '%check_poi%'
  GROUP BY  e7.userid
)
SELECT
    dt.name AS devicetype,
  COUNT(DISTINCT funnel_conversions.step7_userid) AS step7_count
FROM (
  SELECT
    e1.step1_time AS step1_time,
    e1.userid AS userid,
    e1.userid AS step1_userid,
    e2.userid AS step2_userid,  
    e3.userid AS step3_userid,
    e4.userid AS step4_userid,
    e5.userid AS step5_userid,
    e6.userid AS step6_userid,
    e7.userid AS step7_userid,
    e1.sessionid AS step1_sessionid,
    step1,
    step2,
    step3,
	step4,
	step5,
	step6,
	step7
  FROM e1
  LEFT JOIN e2 ON e1.userid=e2.userid
  LEFT JOIN e3 ON e2.userid=e3.userid
  LEFT JOIN e4 ON e3.userid=e4.userid
  LEFT JOIN e5 ON e4.userid=e5.userid
  LEFT JOIN e6 ON e5.userid=e6.userid
  LEFT JOIN e7 ON e6.userid=e7.userid
  ) funnel_conversions
LEFT JOIN sessions s ON funnel_conversions.step1_sessionid = s.id 
left join devicetype dt on s.devicetype=dt.id 
GROUP BY dt.name
The SQL code for the funnel calculations in StarRocks is roughly the same as that of Oracle. However, due to the previously described error in StarRocks, only one step is calculated at the outermost level of the query: COUNT(DISTINCT funnel_conversions.step7_userid) AS step7_count.
When conducting type A tests, the only change needed is to remove the eventmessage filter from each step.
VI. Test SPL
SPL code:
1. Data conversion code
Save the devicetype and eventtype tables as bin files. The SPL code is quite simple:
| A | |
|---|---|
| 1 | =file(“devicetype.txt”).import().new(_1:id,_2:name) | 
| 2 | =file(“devicetype.btx”).export@b(A1) | 
| 3 | =file(“eventtype.txt”).import().new(_1:id,_2:name) | 
| 4 | =file(“eventtype.btx”).export@b(A3) | 
For sessions table, save the data as columnar composite table and segment by the first field.
SPL code:
| A | |
|---|---|
| 1 | =file(“sessions.txt”).cursor().new(_1:id,_2:userid,_3:sessiontime,_4:devicetype) | 
| 2 | =file(“sessions.ctx”).create@p(#userid,#id,sessiontime,devicetype) | 
| 3 | =A4.append@i(A3) | 
For events table, convert the eventtime to long, save the data as columnar composite table and segment by the first field.
SPL code:
| A | |
|---|---|
| 1 | =file(“events.txt”).cursor().new(_1:id,_2:userid,_3:sessionid,_4:eventtype,_5:eventtime) | 
| 2 | =file(“eventtype.btx”).import@b().keys@i(id) | 
| 3 | =A1.new(userid,sessionid,long(eventtime):eventtime,id,eventtype) | 
| 4 | =file(“events.ctx”).create@p(#userid,#sessionid,#eventtime,#id,eventtype) | 
| 5 | =A4.append@i(A3) | 
2. Funnel calculation code
Let’s take seven-step funnel calculation of type B test as an example, SPL code:
| A | B | |
|---|---|---|
| 1 | [“visit”,“view”,“detail”,“login”,“cart”,“confirm”,“pay”] | |
| 2 | [“check”,“point”,“check_point”,“check_p”,“k_point”,“ck_point”,“check_poi”] | |
| 3 | =file(“eventtype.btx”).import@b() | =file(“devicetype.btx”).import@b() | 
| 4 | =A3.(A1.pos(name)) | =A1.len() | 
| 5 | =long(elapse(arg_date,-30)) | =long(arg_date) | 
| 6 | =long(arg_date+1) | |
| 7 | =file(“events.ctx”).open() | |
| 8 | =A7.cursor@mv(userid,sessionid,eventtime,eventtype,eventmessage;eventtime>=A5 && eventtime<A6 && pos(eventmessage,A2(eventtype)),eventtype:A4:#) | |
| 9 | =file(“sessions.ctx”).open().cursor@v(userid,id,devicetype;;A8) | |
| 10 | =A8.joinx@m(userid:sessionid,A9:userid:id,devicetype) | |
| 11 | =A10.group(userid) | |
| 12 | =A11.new(~.align@a(B4,eventtype):e,e(1).select(eventtime<B5).group@u1(eventtime):e1,e(2).group@o(sessionid):e2) | |
| 13 | =A12.derive(join@m(e1:sub_e1,sessionid;e2:sub_e2,sessionid).derive@o(sub_e2.select(eventtime>sub_e1.eventtime && eventtime<sub_e1.eventtime+86400000).min(eventtime):sub_e2_min_time,sub_e1.sessionid:sessionid).select(sub_e2_min_time):e2_1 ) | |
| 14 | =A13.new(e1.id(devicetype):e1_id_devicetypeno,ifn(e2_1.min(sub_e1.eventtime),1):e1_min_time,e2_1.min(sub_e2_min_time):e2_min_time,${to(3,B4).(“e(”/~/“).select(sessionid==e”/(~-1)/“_1.min(sessionid) && eventtime>e”/(~-1)/“_min_time && eventtime<e1_min_time+86400000):e”/~/“_1,e”/~/“_1.min(eventtime):e”/~/“_min_time”).concat@c()}) | |
| 15 | =A14.news(e1_id_devicetypeno;~:devicetype ,${B4.(“e”/~/“_min_time”).concat@c()}) | |
| 16 | =A15.groups(devicetype;${B4.(“count(e”/~/“_min_time):STEP”/~/“_COUNT”).concat@c()}) | |
| 17 | =A16.new(B3(devicetype).name:DEVICETYPE,${B4.(“STEP”/~/“_COUNT”).concat@c()},STEP${B4}_COUNT/STEP1_COUNT:STEP${B4}_RATE) | |
The given parameter arg_date is a date, such as 2022-07-21.
A1 is the event type eventtype, and A2 is the string used to find the eventmessage for each corresponding type. For readability, seven events are written here. However, in practice, any two or more events can be used, passed dynamically as parameters. The code below can compute funnel conversion rates for any two or more steps without modification.
A3, B3: Load the eventtype table and the devicetype table.
A4: In the eventtype table, sequentially set the event types used in calculations to integers 1, 2, 3, …, and set all other types to null.
A5, B5, A6: Calculate the date 30 days before and 1 day after the parameter arg_date, and convert the three dates to long.
A7: Open the composite table object events.
A8: From the events table, filter for the required event types that satisfy the eventmessage condition, and retrieve one more day of data, and establish multiple columnar cursors.
A9: Create a sessions cursor, segmented by A8.
A10-A11: Merge the event and session columnar cursors in order, and group by userid.
A12: For each userid group, align and group the eventtype by the order of 1, 2, 3, … ,7, making them correspond to every event types respectively.
A13: Merge e1 and e2 in order by sessionid, calculate the min eventtime that satisfies condition from the merged result, and filter for records where the minimum value is not null.
A14: Perform distinct calculation on devicetype in e1, and calculate the min eventtime and the min sessionid from the merged result of e1 and e2. Starting from e3, filter for the records that meet sessionid and time conditions, i.e., the events in e3 to e7 that meet conditions. Macros are used here; the actual executed statement is:
=A13.new(e1.id(devicetype):e1_id_devicetypeno,ifn(e2_1.min(sub_e1.eventtime),1):e1_min_time,e2_1.min(sub_e2_min_time):e2_min_time, e(3).select(sessionid==e2_1.min(sessionid) && eventtime>e2_min_time && … && eventtime<e1_min_time+86400000):e7_1,e7_1.min(eventtime):e7_min_time)
A15: Calculate the distinct devicetype based on the cursor in A14; calculate the min time in e2, e3…e7, and merge into the original cursor. The actual executed statement is:
=A14.news(e1_id_devicetypeno;~:devicetype,e1_min_time,e2_min_time,e3_min_time,e4_min_time,e5_min_time,e6_min_time,e7_min_time)
A16: Group and aggregate the small result sets in A15; the grouping field is devicetype, calculate the count value of each step in each group. The actual executed statement is:
=A15.groups(devicetype;count(e1_min_time):STEP1_COUNT,count(e2_min_time):STEP2_COUNT,…,count(e7_min_time):STEP7_COUNT
A17: Convert the devicetype in the table sequence in A16 from sequence number to name. The actual executed statement is:
=A16.new(B3(devicetype).name:DEVICETYPE,STEP1_COUNT,STEP2_COUNT,STEP3_COUNT,STEP4_COUNT,STEP5_COUNT,STEP6_COUNT,STEP7_COUNT,STEP7_COUNT/STEP1_COUNT:STEP7_RATE)
When conducting type A test, we only need to remove the filter condition of eventmessage in A8.
VII. Test results
1. Type A test (without eventmessage condition):
| 3-step funnel | VM1:32G | VM2:16G | VM3:8G | VM4:6G | VM5:4G | 
|---|---|---|---|---|---|
| SPL | 27 | 26 | 28 | 26 | 28 | 
| StarRocks | 53 | 51 | 62 | 64 | Out-of-memory (OOM) | 
| Oracle | 418 | - | - | - | - | 
| 7-step funnel | VM1:32G | VM2:16G | VM3:8G | VM4:6G | VM5:4G | 
|---|---|---|---|---|---|
| SPL | 30 | 32 | 28 | 33 | 34 | 
| StarRocks | 122 | 114 | 152 | 155 | OOM | 
| Oracle | 1827 | - | - | - | - | 
2. Type B test (with eventmessage condition):
| 3-step funnel | VM1:32G | VM2:16G | VM3:8G | VM4:6G | 
|---|---|---|---|---|
| SPL | 35 | 39 | 39 | 52 | 
| StarRocks | 210 | 210 | 192 | OOM | 
| 7-step funnel | VM1:32G | VM2:16G | VM3:8G | VM4:6G | 
|---|---|---|---|---|
| SPL | 46 | 61 | 52 | 92 | 
| StarRocks | 510 | 520 | OOM | OOM | 
3. Notes
i). The values in the tables are the calculation time, measured in seconds.
ii). Due to long computing time, Oracle is tested only on VM1 in type A test.
iii). Since an error will occur to StarRocks when calculating two or more steps, only one step can be calculated.
VIII. Comments on test results
1. For funnel calculation, SPL performs the best, StarRocks is worse, and Oracle is the worst. SPL is 2-5 times faster than StarRocks and 15-60 times faster than Oracle in computing speed.
2. In type A test, when the number of funnel calculation steps increases from 3 to 7, the computing speed of StarRocks reduces by nearly 2 times, while that of SPL changes little. The reason is that between SQL steps, the JOIN calculation needs to be performed on sessionid or userid, and the more steps, the greater number of JOIN calculations, resulting in a corresponding decrease in performance. In contrast, SPL combines the same userid and sessionid together for calculation, avoiding repeated JOIN operations.
3. When other test conditions are the same, the computing speed of type B test is generally slower than that of type A test. In comparison, however, SPL changes less than StarRocks in computing speed, for the reason that although StarRocks’ vector-based calculation can make use of CPU characteristics to speed up, this kind of calculation works only for simple data types like int/long and doesn’t work for complex data like string, and will result in a significant performance decrease when involving string computation.
4. StarRocks limits the upper limit of memory (80% capacity of physical memory). Once the memory capacity of VM decreases, OOM will occur, resulting in a failure to obtain result. Because the JOIN operation of SQL is executed in the memory, it will consume more memory space. Especially for calculation involving string in type B test, the memory will bear a greater load, and as a result, OOM is more likely to occur. By contrast, SPL avoids repeated JOIN, making it possible to obtain calculation result with less memory space.
5. Conclusions:
SPL can utilize ordered storage, making it an ideal tool to perform user behavior analysis calculations such as funnel statistics: the total data amount is large, yet the data amount of each userid is not. Storing the data in order by userid and sessionid can not only achieve extreme performance, but also make code more concise.
Funnel statistics is a quite complicated calculation. Although this calculation can be written in SQL, repeated join has to be employed. If this calculation is executed in the old database Oracle, it can get correct results, but the performance is poor. As a database with a short history, when StarRocks faces with such complex SQL code, errors would be reported directly. When only one calculation step is involved, StarRocks runs normally, but its performance is still far behind SPL; when multiple steps are involved, its performance will be poorer.
SPL Official Website 👉 https://www.esproc.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.esproc.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/sxd59A8F2W
Youtube 👉 https://www.youtube.com/@esProc_SPL
 
            
         
Chinese version