SPL Lightweight Multisource Mixed Computation Practices #1: Running SQL in RDBs

 

RDB is SPL’s native data source. SPL interacts with the database through JDBC to dynamically generate or concatenate a SQL statement or pass parameters to the statement. It can deal with all scenarios that currently need to use a high-level language, such as Java, together with SQL.

Import data from MySQL.

Set up database connection

..

Connection string: jdbc:mysql://127.0.0.1:3306/bytedba?useSSL=false&useCursorFetch=true

Computing use case

Computing goal

Find the total amount under each type of order status in the year 2024.

Set up script parameters

Set up a year parameter for the script to use in data filtering.

..

Write the script


A

1

=connect("dba")

2

=A1.query("select order_status,sum(total_amount) tamt,count(1) cnt from orders where year(order_date)=? group by order_status",d_year)

3

>A1.close()

A1: Connect to the database.

A2: Execute the SQL query, where d_year is the script parameter.

A3: Close database connection.

Save the preceding script as 1.1DB_SQL.splx and put it in the search path (Specified by Search path).

..

Execute the script

The SPL script can be executed (edit & debug/desktop analytics) within IDE, or be integrated into a Java application and invoked.

IDE

To execute the SPL script within IDE, just press Ctrl+F9, or click the Execute button on the toolbar.

..

..

Call script from Java

Find jars related to esProc JDBC in [esProc installation directory]\esProc\lib. They are esproc-bin-xxxx.jar and icu4j_60.3.jar.

Deploy the two jars in the classpath of Java development environment, and, if needed, put the database driver package in the Java application. Then find esProc configuration file raqsoftConfig.xml in [esProc installation directory]\esProc\config and also deploy it in the classpath of Java development environment.

Call script from Java:

public class callSplx {
    public static void main(String[] args) {
        String driver = "com.esproc.jdbc.InternalDriver";
        String url = "jdbc:esproc:local://";
        try {
            Class.forName(driver);
            Connection conn = DriverManager.getConnection(url);
            PreparedStatement st =conn.prepareCall("call 1.1DB_SQL(?)");
            st.setObject(1, 2024);
            st.execute();
            ResultSet rs = st.getResultSet();
            System.out.println("order_status\ttamt\t\tcnt");
            System.out.println("----------------------------------------------");
            while(rs.next()) {
                String order_status = rs.getString("order_status");
                String tamt = rs.getString("tamt");
                String cnt = rs.getString("cnt");
                System.out.printf("%-15s%-15s%-15s%n",order_status,tamt,cnt);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

In the above code, call 1.1DB_SQL(?) calls the previously saved script 1.1DB_SQL.splx.

..

Mixed use of SQL and SPL

Some complex computations are difficult to write in SQL. In those cases, SPL syntax becomes useful for further manipulating the data. Take the preceding computing task as an example, let’s perform data grouping & aggregation in SPL after retrieval.


A

1

=connect("dba")

2

=A1.query@x("select order_status,total_amount from orders where year(order_date)=?",d_year)

3

=A2.groups(order_status;sum(total_amount):tamt,count(1):cnt)

A2: Execute SQL statement to get detail data meeting the criterion. @x option enables to close the database connection when the query finishes execution, and the use of db.close() to explicitly close the connection becomes unnecessary.

A3: Perform grouping & aggregation in SPL.

A2’s SQL query returns filtered data. In practical applications, we should perform data transfers as fewer times as possible so that the I/O bottleneck can be avoided. Because of this, when using SQL to achieve a data-intensive computing goal, it would be best to first execute the basic operations (such as filtering and grouping) to reduce the total data volume before performing data querying.

After you master the basic use of SPL and how to use SPL to query data from RDBs, it is easy to access data from the other sources.