SPL Lightweight Multisource Mixed Computation Practice #7: SQL migration


An application may need to run on different databases. Databases share similar SQL syntax but differ in their implementations, so SQL statements have to be modified for each database, and usually by hand. This involves a huge amount of work and is error-prone.

A completely automated conversion is nearly impossible; after all, different databases may offer different functionalities.

But a closer look shows that most obstacles come from differences in SQL function syntax.

Date- and string-related functions, in particular, have no industry standard, and each database product implements them differently. To convert the string "2020-02-05" to a date, for example, each database uses its own syntax.

ORACLE

select TO_DATE('2020-02-05', 'YYYY-MM-DD') from USER

SQL Server

select CONVERT(date, '2020-02-05', 23) from USER

MySQL

select STR_TO_DATE('2020-02-05','%Y-%m-%d') from USER

To switch the application between different databases, you need to modify the SQL statements.

SPL solution

To address this issue, SPL offers SQL translation functionality, which converts a standard SQL statement to the dialect of a target database, enabling seamless code migration when switching between databases.


In the sql.sqltranslate(dbtype) function, sql is the SQL statement to be translated and dbtype is the target database type. Only functions already defined in SPL’s simple SQL will be translated; undefined ones are left unchanged. The list of defined functions and database types can be found in the documentation of the sqltranslate() function.

Using sqltranslate() in the IDE

Let’s first try SQL translation in the IDE by translating SELECT EID, NAME, BIRTHDAY, ADDDAYS(BIRTHDAY,10) DAY10 FROM EMP to the SQL dialects of different databases.


The ADDDAYS() function is translated into each database’s dialect, allowing the SQL to migrate between databases.

Look at more examples.

Add 10 months to a given date

SELECT EID, NAME, BIRTHDAY, ADDMONTHS(BIRTHDAY,10) DAY10 FROM EMP

Translate the query to SQL dialects of different databases using sqltranslate() function:

ORACLE:

SELECT EID, NAME, BIRTHDAY, BIRTHDAY+NUMTOYMINTERVAL(10,'MONTH') DAY10 FROM EMP

SQLSVR:

SELECT EID, NAME, BIRTHDAY, DATEADD(MM,10,BIRTHDAY) DAY10 FROM EMP

DB2:

SELECT EID, NAME, BIRTHDAY, BIRTHDAY+10 MONTHS DAY10 FROM EMP

MYSQL:

SELECT EID, NAME, BIRTHDAY, BIRTHDAY+INTERVAL 10 MONTH DAY10 FROM EMP

POSTGRES:

SELECT EID, NAME, BIRTHDAY, BIRTHDAY+interval '10 months' DAY10 FROM EMP

TERADATA:

SELECT EID, NAME, BIRTHDAY, ADD_MONTHS(BIRTHDAY, 10) DAY10 FROM EMP

The implementations of the ADDMONTHS() function vary greatly across databases. SQL Server has the DATEADD function, both MySQL and PostgreSQL add an interval directly, and Oracle combines a built-in function with direct addition.

Find the quarter a given date belongs to

SELECT EID,AREA,QUARTER(ORDERDATE) QUA, AMOUNT FROM ORDERS

is converted to:

ORACLE:

SELECT EID,AREA,FLOOR((EXTRACT(MONTH FROM ORDERDATE)+2)/3) QUA, AMOUNT FROM ORDERS

SQLSVR:

SELECT EID,AREA,DATEPART(QQ,ORDERDATE) QUA, AMOUNT FROM ORDERS

POSTGRES:

SELECT EID,AREA,EXTRACT(QUARTER FROM ORDERDATE) QUA, AMOUNT FROM ORDERS

TERADATA:

SELECT EID,AREA,TD_QUARTER_OF_YEAR(ORDERDATE) QUA, AMOUNT FROM ORDERS

Though every database implements a quarter function of some kind, the function names and parameter definitions differ widely.
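The arithmetic in the Oracle translation above is worth a sanity check. Here is a small standalone Java sketch (our own illustration, not tied to any SPL API) confirming that FLOOR((month+2)/3) maps months 1 through 12 to quarters 1 through 4, which is what the quarter functions of the other databases return:

```java
public class QuarterCheck {
    // Mirrors Oracle's FLOOR((EXTRACT(MONTH FROM ...)+2)/3);
    // integer division in Java already floors for positive operands.
    static int quarter(int month) {
        return (month + 2) / 3;
    }

    public static void main(String[] args) {
        for (int m = 1; m <= 12; m++) {
            System.out.println("month " + m + " -> quarter " + quarter(m));
        }
    }
}
```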

Type conversion

SELECT EID, NAME, DATETOCHAR(BIRTHDAY) FROM EMP

is converted to:

ORACLE:

SELECT EID, NAME, TO_CHAR(BIRTHDAY,'YYYY-MM-DD HH:MI:SS') FROM EMP

SQLSVR:

SELECT EID, NAME, CONVERT(CHAR,BIRTHDAY,120) FROM EMP

DB2:

SELECT EID, NAME, TO_CHAR(BIRTHDAY,'YYYY-MM-DD HH:MI:SS') FROM EMP

MYSQL:

SELECT EID, NAME, DATE_FORMAT(BIRTHDAY, '%Y-%m-%d %H:%i:%S') FROM EMP

POSTGRES:

SELECT EID, NAME, TO_CHAR(BIRTHDAY,'YYYY-MM-DD HH:MI:SS') FROM EMP

TERADATA:

SELECT EID, NAME, TO_CHAR(BIRTHDAY,'YYYY-MM-DD HH:MI:SS') FROM EMP

Different databases have different names and formats for the type conversion function.

But all differences can be smoothed out through SPL’s sqltranslate() function.

Function definition and extension

The database types and function definitions SPL supports are located in dictionary file /com/scudata/dm/sql/function.xml in release package esproc-bin.jar.

<?xml version="1.0" encoding="utf-8"?>
<STANDARD>
  <FUNCTIONS type="FixParam">
    <FUNCTION name="ADDDAYS" paramcount="2" value="">
      <INFO dbtype="ORACLE" value="?1+NUMTODSINTERVAL(?2,'DAY')"/>
      <INFO dbtype="SQLSVR" value="DATEADD(DD,?2,?1)"/>
      <INFO dbtype="DB2" value="?1+?2 DAYS"/>
      <INFO dbtype="MYSQL" value="?1+INTERVAL ?2 DAY"/>
      <INFO dbtype="HSQL" value="DATEADD('dd', ?2, ?1)"/>
      <INFO dbtype="TERADATA" value="?1+CAST(?2 AS INTERVAL DAY)"/>
      <INFO dbtype="POSTGRES" value="?1+interval '?2 days'"/>
      <INFO dbtype="ESPROC" value="elapse(?1,?2)"/>
    </FUNCTION>
  </FUNCTIONS>
</STANDARD>

The FUNCTIONS node represents a function group; its type attribute gives the group type, and FixParam means the group’s functions take a fixed number of parameters.

The FUNCTION node represents a simple SQL function: name is the function name, paramcount is the number of parameters, and value specifies the default translation of the function (an empty string means no translation is needed).

The INFO node specifies a database: dbtype is the database name (an empty string represents SPL simple SQL), and value is the function’s counterpart in that database’s dialect. In an INFO node’s value item, ? or ?1 stands for the function’s first parameter value, ?2 for the second, and so on; when value is an empty string, the value of the parent FUNCTION node is used.

If a FUNCTION node has no INFO node defined for the specified database, the function will not be translated when sqltranslate() runs.
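To make the ?N substitution concrete, here is a minimal Java sketch of the idea (our own illustration of the mechanism, not SPL’s actual implementation), using the SQLSVR template for ADDDAYS from the sample above:

```java
public class PlaceholderDemo {
    // Naive illustration of ?N substitution: replace higher-numbered
    // placeholders first so that "?1" does not clobber "?12".
    static String expand(String template, String... params) {
        String result = template;
        for (int i = params.length; i >= 1; i--) {
            result = result.replace("?" + i, params[i - 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // SQLSVR template for ADDDAYS from function.xml
        String out = expand("DATEADD(DD,?2,?1)", "BIRTHDAY", "10");
        System.out.println(out); // DATEADD(DD,10,BIRTHDAY)
    }
}
```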

function.xml defines many functions, but not all. In real-world business you may need functions that are not covered, and you can add them to the collection yourself.

For example, to add a function for finding the number of days between two given dates, add a FUNCTION node that names the function DATEDIFF, then configure its counterparts for each database in INFO nodes.

<FUNCTION name="DATEDIFF" paramcount="2" value="">
    <INFO dbtype="ORACLE" value="?1-?2"/>
    <INFO dbtype="SQLSVR" value="DATEDIFF(day,?1,?2)"/>
    <INFO dbtype="MYSQL" value="DATEDIFF(?1,?2)"/>
    <INFO dbtype="POSTGRES" value="?1-?2"/>
    <INFO dbtype="ESPROC" value="interval(?2,?1)"/>
</FUNCTION>

Similarly, to support other databases, just add INFO nodes with the corresponding information. For example, to add SQLite support for the date-difference function:

<FUNCTION name="DATEDIFF" paramcount="2" value="">
     <INFO dbtype="ORACLE" value="?1-?2"/>
     <INFO dbtype="SQLSVR" value="DATEDIFF(day,?1,?2)"/>
     <INFO dbtype="MYSQL" value="DATEDIFF(?1,?2)"/>
     <INFO dbtype="POSTGRES" value="?1-?2"/>
     <INFO dbtype="ESPROC" value="interval(?2,?1)"/>
     <INFO dbtype="SQLite" value="JULIANDAY(?1) - JULIANDAY(?2)"/>
</FUNCTION>

When there are a dynamic number of parameters

In the preceding examples, each function takes a fixed number of parameters. But for certain functions and operations, such as string concatenation and CASE WHEN, the number of parameters cannot be predefined. There are other such cases too, such as getting the first non-null value among multiple parameters.

SPL also offers a solution to the dynamic-parameter issue: set the type attribute of the FUNCTIONS node to AnyParam, which represents any number of parameters.

<FUNCTIONS type="AnyParam">
    <FUNCTION name="coalesce">
        <INFO dbtype="ESPROC" script='"ifn(" + ?.concat@c() +")"'/>
    </FUNCTION>
    <FUNCTION name="concat">
        <INFO dbtype="ESPROC" script='"concat("+ ?.concat@c() +")"'/>
        <INFO dbtype="ORACLE" script='?.concat("||")'/>
        <INFO dbtype="DB2" script='?.concat("||")'/>
        <INFO dbtype="SQLSVR" script='?.concat("+")'/>
        <INFO dbtype="POSTGRES" script='?.concat("||")'/>
        <INFO dbtype="TERADATA" script='?.concat("||")'/>
        <INFO dbtype="HSQL" script='?.concat("||")'/>
    </FUNCTION>
</FUNCTIONS>

You just need to set the database type and the corresponding translation script (written in SPL syntax) under the FUNCTION node. For example, to configure Oracle’s translation of the string function CONCAT, configure the following INFO entry under <FUNCTION name="concat">:

<INFO dbtype="ORACLE" script='"("+ ?.concat("||") +")" '/>

Later, translating the SQL statement SELECT EID,NAME,BIRTHDAY,concat(EID,"_",NAME) FROM EMP to the Oracle dialect produces:

SELECT EID,NAME,BIRTHDAY,(EID || "_" || NAME) FROM EMP

And translating it for the ESPROC database gives:

SELECT EID,NAME,BIRTHDAY,concat(EID,"_",NAME) FROM EMP
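The translation script above simply joins the parameters with || and wraps them in parentheses. The following Java sketch (an illustration only, not SPL’s script interpreter) reproduces the Oracle output for this example:

```java
import java.util.StringJoiner;

public class ConcatDemo {
    // Mimics the script '"("+ ?.concat("||") +")"': join the parameters
    // with || inside parentheses (spaces around || are cosmetic).
    static String oracleConcat(String... params) {
        StringJoiner joiner = new StringJoiner(" || ", "(", ")");
        for (String p : params) {
            joiner.add(p);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(oracleConcat("EID", "\"_\"", "NAME"));
        // (EID || "_" || NAME)
    }
}
```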


So far we have explained how to use the sqltranslate() function, how to configure it, and how to add support for new functions and databases, including cases where the number of function parameters is dynamic.

Application

Perform SQL translation only

The simplest way to use SPL’s SQL translation functionality in an application is to translate a SQL statement to the target database’s dialect before executing it.

The SPL API for translating SQL statements is com.scudata.dm.sql.SQLUtil.translate, which can be directly used to achieve SQL translation.

String sql = "select name, birthday, adddays(birthday,10) day10 from emp";
sql = com.scudata.dm.sql.SQLUtil.translate(sql, "MYSQL");

One point to make clear: this API is not the officially recommended way to perform the translation; SPL JDBC is. But writing several lines of code to connect through JDBC just for a string conversion is a hassle, so we call the API directly.

In addition, we want SQL migration to be as transparent as possible: apart from maintaining the configuration file, there should be no need to rewrite or recompile code after the initial change. This is why the database type is set in a configuration file.

For example, we create a database type configuration file dbconfig.properties to set the database type as, say, MYSQL.

Below is the content of dbconfig.properties:

database.type=MYSQL
database.name=MYDATASOURCE

Then encapsulate a translation method and call SPL API to perform the SQL translation.

// Requires: java.io.InputStream, java.util.Properties, com.scudata.dm.sql.SQLUtil
public static String translateSQL(String sql) {
    String dbType = null;
    try (InputStream input = SQLTranslator.class.getClassLoader().getResourceAsStream("dbconfig.properties")) {
        Properties prop = new Properties();
        if (input == null) {
            System.out.println("Sorry, unable to find dbconfig.properties");
            return null;
        }
        prop.load(input);
        dbType = prop.getProperty("database.type");
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return SQLUtil.translate(sql, dbType);
}

In the main program, call the translation method on the SQL statement before executing it. The rest of the code, including parameter setting, SQL execution, and result retrieval, stays the same; in fact, only one statement (sql = translateSQL(sql)) is added.

public static void main(String[] args) {
  ……

  String sql = "SELECT name, birthday, adddays(birthday,10) day10 "
	+ " FROM emp where dept=? and salary>?";
    sql = translateSQL(sql);
    
    pstmt.setString(1, "Sales");   
    pstmt.setDouble(2, 50000); 

    ……
}

More transparent translation and SQL execution

The preceding method requires an extra translation call. If SQL statements are executed in many places, the original program needs many changes; and relying on an interface that is not officially recommended may cause compatibility problems later.

To solve the problems, we adopt a more transparent method – perform both SQL translation and SQL execution to get the result set in SPL.

SPL offers standard JDBC support. Unlike a typical database JDBC driver, SPL allows passing parameters through the URL: jobVars specifies task variables (valid only within the current thread) that the script can read. This way, data source information can be passed to the script on each execution. For example:

url="jdbc:esproc:local://?jobVars=dbType:MYSQL,dbName:mysqlds"

This setting passes the database type MYSQL and the data source name mysqlds to the script.
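Conceptually, the jobVars fragment is just a comma-separated list of name:value pairs. The Java sketch below (our own illustration of the idea, not SPL’s internal code) shows how such a fragment maps to task variables:

```java
import java.util.HashMap;
import java.util.Map;

public class JobVarsDemo {
    // Split "dbType:MYSQL,dbName:mysqlds" into a name -> value map.
    static Map<String, String> parseJobVars(String fragment) {
        Map<String, String> vars = new HashMap<>();
        for (String pair : fragment.split(",")) {
            String[] kv = pair.split(":", 2);
            vars.put(kv[0].trim(), kv[1].trim());
        }
        return vars;
    }

    public static void main(String[] args) {
        Map<String, String> vars = parseJobVars("dbType:MYSQL,dbName:mysqlds");
        System.out.println(vars.get("dbType")); // MYSQL
        System.out.println(vars.get("dbName")); // mysqlds
    }
}
```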

The main program only needs to modify the original database connection code to connect to SPL JDBC and provide the corresponding parameters.

public static void main(String[] args) {
    String driver = "com.esproc.jdbc.InternalDriver";
    String url="jdbc:esproc:local://?jobVars=dbType:MySQL,dbName:mysqlds";

    try {
        System.out.println(url);
        Class.forName(driver);
        Connection conn = DriverManager.getConnection(url);
        String sql = "SELECT orderid, employeeid, adddays(orderdate,10) day10,amount "
                + "FROM orders WHERE employeeid > ? AND amount > ?";
        PreparedStatement st = conn.prepareStatement(sql);
        st.setObject(1,"506");
        st.setObject(2,9900);
        ResultSet rs = st.executeQuery();
        while (rs.next()) {
            String employeeid = rs.getString("employeeid");
            System.out.print(employeeid+",");
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

In the preceding code, the data source information is put in the URL, but there is no translation step anywhere. So how does the SQL statement get translated? There seems to be a bit of magic in it.

The key lies in SPL’s JDBC gateway. An SPL script is preconfigured, and every SQL statement executed through JDBC is handed to this script. That means both SQL translation and execution are performed inside the SPL script.

To use the JDBC gateway, configure an SPL script at the JDBC node in raqsoftConfig.xml. For example, to configure gateway.splx:

<JDBC>
    <load>Runtime,Server</load>
    <gateway>gateway.splx</gateway>
</JDBC>

The gateway script takes two parameters: sql receives the SQL statement to be translated, and args receives the query parameters passed in from JDBC.


Remember to check “The last parameter is a dynamic parameter” option to make sure the multiple parameters used in the SQL statement can be received.

Here is the SPL script:


     A                                     B
1    =sql=trim(sql).sqltranslate(dbType)
2    =argsN=args.len()                     =("sql"|argsN.("args("/~/")")).concat@c()
3    =connect(dbName)
4    if pos@hc(sql,"select")               return A3.query@x(${B2})
5    else                                  =A3.execute(${B2})
6                                          >A3.close()

A1 reads the task variable dbType (passed in through the URL) and translates the SQL. B2 concatenates the parameter references into a string based on the number of parameters. For example, with two parameters B2 produces:

sql,args(1),args(2)
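B2’s concatenation is easy to mimic outside SPL. The Java sketch below (an illustration of the string B2 produces, not SPL code) builds the same comma-separated macro text for any number of parameters:

```java
public class MacroArgsDemo {
    // Mimics B2: "sql" followed by args(1)..args(n), comma-separated.
    static String buildMacro(int argCount) {
        StringBuilder sb = new StringBuilder("sql");
        for (int i = 1; i <= argCount; i++) {
            sb.append(",args(").append(i).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildMacro(2)); // sql,args(1),args(2)
    }
}
```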

A3 connects to the data source (whose name is also passed as a task variable). The data source itself is configured in raqsoftConfig.xml by adding a DB node with the connection information; configure one DB node per data source if multiple are needed.

<DB name="MYDATASOURCE">
    <property name="url" value="jdbc:mysql://127.0.0.1:3306/mydb?useCursorFetch=true"/>
    <property name="driver" value="com.mysql.jdbc.Driver"/>
    <property name="type" value="10"/>
    <property name="user" value="root"/>
    <property name="password" value="root"/>
    <property name="batchSize" value="0"/>
    <property name="autoConnect" value="false"/>
    <property name="useSchema" value="false"/>
    <property name="addTilde" value="false"/>
    <property name="caseSentence" value="false"/>
</DB>

A4 checks whether the translated SQL is a SELECT statement. Since DQL and DML statements are executed differently and return different types of results, they need separate handling so that any SQL statement can be translated and executed.

If it is a SELECT statement, B4 uses the query() function to run the query and fetch the result; the @x option closes the database connection when the query finishes. An SPL macro is also used here; with two parameters, the macro expands to:

A3.query@x(sql,args(1),args(2))

If it is not a SELECT statement, B5 uses the execute() function to run it.

The script is simple. And there is no need to restart the application when the script is modified, because SPL is interpreted and supports hot swapping.

DML statements, such as update, can also be executed using this gateway script.

public static void main(String[] args) {
    String driver = "com.esproc.jdbc.InternalDriver";
    Map<String, String> info = getDbInfo();   // reads dbType/dbName from configuration
    String url = "jdbc:esproc:local://?jobVars=dbType:" + info.get("dbType") + ",dbName:" + info.get("dbName");
    try {
        Class.forName(driver);
        Connection conn = DriverManager.getConnection(url);
        String sql = "update orders set customername = ? where orderid = ? ";
        PreparedStatement st = conn.prepareStatement(sql);
        st.setObject(1,"PTCAG001");
        st.setObject(2,"1");
        st.executeUpdate();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Let’s execute the update statement in the application and look at what happens. We find that it is translated to the counterpart of the target database and that data is successfully updated. This means all SQL statements can be seamlessly migrated.

This method can also be applied to situations involving multiple databases.

That concludes the SPL practices for multisource mixed computation. We have not enumerated every possible scenario, but once you understand how SPL works and how to connect to different data sources, you can handle multisource computations just as you do single-source ones.