lisongbo RaqForum 82 No.
1 Reply • 13 View • 2 Months ago

This Is Likely the Computing Technology that Supports the Most Data Sources

Today, the data sources of enterprises have evolved from "a few tables" to a diverse range such as databases, files, APIs, streaming data, object storage and NoSQL. The ability to handle “multi-source computation” has become one of the critical criteria for data processing technologies.

When it comes to multi-source computation, the ‘logical data warehouse’ is probably the most mainstream approach. And it sounds very appealing: no need to synchronize data in advance, no need to struggle with traditional ETL, and the ability to perform cross-database queries using SQL.

However, reality falls short of the ideal. These logical data warehouses, despite claiming universal connectivity, end up supporting only a handful of mainstream relational databases and a few file formats, and they falter as soon as a data source is slightly less common. Want to tackle cross-source computation? Expect hours of configuration and inevitable failures, and you will still end up resorting to Python.

esProc: A technology that truly gets ‘multi-data source support’ right

Compared to the ‘heavyweight’ complex modeling approach of logical data warehouses, esProc adopts a ‘lightweight’ approach: no modeling, and it doesn’t hide data sources. It explicitly states: “I can connect to any data source, directly read the raw data, and then use a unified scripting language, SPL, for cross-source computation.”

It sounds simple, but the effect is powerful.

The extensive data source support isn’t just a boast

esProc has extensive connectivity and natively supports nearly all mainstream data sources. By type:

· Relational databases: Oracle, MySQL, SQL Server, PostgreSQL, SQLite, DB2… Basically, any database that supports JDBC can be used directly.

· Non-relational databases: MongoDB, Redis, etc., enabling data manipulation even without relational models.

· Various file formats: CSV, Excel, JSON, XML, Parquet, ORC – capable of reading data regardless of whether the format is regular or not.

· Message queues and streaming data: Direct integration with Kafka, handling streaming data with ease.

· Big data platforms and object storage: Hive, HDFS, S3, etc.

· HTTP / Web APIs: retrieves data directly from APIs and parses JSON/XML responses.

· Other less common data sources: ready-to-use connectors are available for Elasticsearch, InfluxDB, etc.

Compared to other technologies that claim multi-source support, esProc not only has a longer list of supported sources, but is also easier to use.

It connects, it computes across sources, and the syntax is clean

Many technologies offer connectivity but lack true computation capabilities. For example, in some logical data warehouse systems, even after a connector is set up, cross-database associations require writing massive SQL subqueries, and even then they may not work. Some embedded databases, such as DuckDB, can query CSV and Parquet files, but connecting them to other databases is a challenge, and they lack support for widely used enterprise databases like Oracle and SQL Server. As for Python, although it has a vast number of libraries, each has its own interface and must be learned individually, so a unified interface is out of reach.

esProc, in contrast, is much simpler:

For example, to perform a cross-database association between MySQL and Oracle:

	A
1	=oracle.query("select EId,Name from employees")
2	=mysql.query("select SellerId, sum(Amount) subtotal from Orders group by SellerId")
3	=join(A1:E,EId; A2:O,SellerId)

Or, to perform cross-source queries between MongoDB and MySQL – even with more complex structures – esProc handles it easily:

	A
1	=connect("mysql")
2	=A1.query@x("SELECT o.order_id, o.user_id, o.order_date, oi.product_id, oi.quantity, oi.price FROM orders o JOIN order_items oi ON o.order_id = oi.order_id WHERE o.order_date >= CURDATE()- INTERVAL 1 MONTH")
3	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
4	=mongo_shell@d(A3, "{'find':'products', 'filter': { 'category': {'$in': ['Tablets', 'Wearables', 'Audio'] } }}")
5	=A2.join@i(product_id,A4:product_id,name,brand,category,attributes)
6	=A5.groups(category;sum(price*quantity):amount)

Using the native syntax of each data source not only simplifies operations but also fully preserves their inherent advantages.

For users accustomed to SQL, esProc thoughtfully offers SQL support. SQL can be used directly for simple tasks, while complex tasks can be handled by switching back to SPL. This blended approach offers greater flexibility.

Extending new data sources: easy to do yourself

What truly sets esProc apart is its remarkably lightweight approach to extending data sources.

To add a new data source to a logical data warehouse, a dedicated connector must be developed. This is not as simple as writing a configuration file: it requires a deep understanding of the target data source’s access mechanisms, query language, and data format, and the connector must then be embedded in the logical data warehouse framework to handle connection, parsing, and transformation. The resulting complexity is extremely high. Consequently, it’s almost impossible for users to extend the system themselves, and they must rely on vendor support.

esProc, in contrast, combines each source’s native interface with a light layer of encapsulation:

· For relational databases, esProc directly uses JDBC, which is ready to use.

· For non-relational data sources like MongoDB and Kafka, esProc provides only a light encapsulation based on official drivers.

Moreover, esProc offers an extension interface mechanism that allows you to create connector-level functionality without requiring expertise in the underlying architecture, as long as you understand how to read and format data from the target source. This means that even though esProc already supports many data sources, you can easily add support for even the most unusual ones yourself.
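To give a feel for how little “reading and formatting data” can involve, here is a minimal Java sketch. It is not esProc’s actual extension interface: the `RowSource` interface, the `PipeDelimitedSource` class, and the pipe-delimited format are all hypothetical, chosen only to show a custom source reduced to “turn the raw payload into rows of named fields.”

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CustomSourceSketch {

    // Hypothetical contract, NOT esProc's real extension API:
    // a source only has to return its data as rows of named fields.
    public interface RowSource {
        List<Map<String, Object>> read();
    }

    // Hypothetical source: pipe-delimited text whose first line holds the field names.
    public static class PipeDelimitedSource implements RowSource {
        private final String payload;

        public PipeDelimitedSource(String payload) {
            this.payload = payload;
        }

        @Override
        public List<Map<String, Object>> read() {
            String[] lines = payload.trim().split("\\R");
            String[] names = lines[0].split("\\|");
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 1; i < lines.length; i++) {
                if (lines[i].trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = lines[i].split("\\|", -1);
                Map<String, Object> row = new LinkedHashMap<>();
                for (int c = 0; c < names.length; c++) {
                    // Missing trailing cells become null rather than failing.
                    row.put(names[c].trim(), c < cells.length ? cells[c].trim() : null);
                }
                rows.add(row);
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        RowSource source = new PipeDelimitedSource("id|name\n1|Ann\n2|Bob");
        for (Map<String, Object> row : source.read()) {
            System.out.println(row);
        }
    }
}
```

Everything specific to the format lives in `read()`; anything downstream only ever sees rows, which is the kind of separation the paragraph above describes.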

Not pursuing “transparency,” but higher flexibility

Logical data warehouses prioritize “transparency”: you write a single SQL query, and the underlying system automatically calls data from multiple sources. However, this “transparency” comes at a significant cost. The connector must be extraordinarily robust, and the system often falters once the data structure is complex or irregular.

esProc emphasizes an “explicit” approach: you connect directly to each data source, retrieve the data yourself, and then process it uniformly using SPL. Although less “elegant,” this approach is far more flexible, making it particularly suited for handling unstructured, semi-structured, and even dynamically structured data, and it easily handles complex logic as well.

Although transparency has its advantages, it restricts extensibility. SPL, in contrast, prioritizes both flexibility and extensibility.
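The “explicit” pattern can be pictured in plain Java: once each source has returned its rows, the cross-source step is ordinary computation over in-memory data. The sketch below is only an illustration of that idea, not how esProc implements joins. It reuses the field names from the MySQL/Oracle example earlier (`EId`, `Name`, `SellerId`, `subtotal`), while the class name, the sample values, and the assumption that the left key is unique are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrossSourceJoinSketch {

    // Inner equi-join of two row lists, each row a field-name -> value map.
    // Builds a hash index on the left key (assumed unique), then probes it
    // with every right row; unmatched rows on either side are dropped.
    public static List<Map<String, Object>> join(
            List<Map<String, Object>> left, String leftKey,
            List<Map<String, Object>> right, String rightKey) {
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> row : left) {
            index.put(row.get(leftKey), row);
        }
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> row : right) {
            Map<String, Object> match = index.get(row.get(rightKey));
            if (match != null) {
                Map<String, Object> out = new LinkedHashMap<>(match);
                out.putAll(row); // merge the fields of both sides into one row
                joined.add(out);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for rows already fetched from two different databases.
        List<Map<String, Object>> employees = List.of(
                Map.of("EId", 1, "Name", "Ann"),
                Map.of("EId", 2, "Name", "Bob"));
        List<Map<String, Object>> totals = List.of(
                Map.of("SellerId", 2, "subtotal", 150.0),
                Map.of("SellerId", 3, "subtotal", 40.0));
        for (Map<String, Object> row : join(employees, "EId", totals, "SellerId")) {
            System.out.println(row.get("Name") + " " + row.get("subtotal"));
        }
    }
}
```

Nothing in `join` knows or cares where either list came from, which is the point of retrieving the data explicitly first: the source-specific work ends at the fetch, and the computation is uniform from there on.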

Seamless integration with mainstream application systems

Some calculation engines, despite being easy to use, are difficult to embed. For example, no matter how many Python libraries exist, they’re always “outsiders” in Java-dominated enterprise systems, which makes them awkward to call.

esProc, in contrast, is developed purely in Java and offers flexible deployment: it can be embedded as an in-memory calculation engine or run independently as a microservice, which makes it very friendly to enterprise systems.

“Many” isn’t just about greater variety, but also about lightweight extensibility and a satisfying user experience

Ultimately, supporting many data sources is not simply about “quantity.” The key is:

· esProc natively supports a wide variety of data sources.

· esProc offers powerful extensibility, with light encapsulation and a low barrier to entry.

· esProc offers unified syntax, clear logic, and supports working with SQL.

· esProc handles data of varied structure, providing both flexibility and efficiency.

· esProc integrates smoothly with mainstream application systems.

These factors, taken together, support the claim:

esProc is likely the computing technology that currently supports the most data sources.

esProc not only supports many data sources but also provides high-quality support, quick extensibility, and a user-friendly experience.

SPL Official Website 👉 https://www.esproc.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.esproc.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/sxd59A8F2W

YouTube 👉 https://www.youtube.com/@esProc_SPL
