Data Oriented Computing: Data vs Models vs Codes

Jingdong Sun
Jun 8, 2023

Traditionally, IT addresses business cases through software engineering and database queries. Over the past decades, object-oriented languages such as C++ and Java, along with the Structured Query Language (SQL), have gained popularity. IT teams use these languages to design objects, define their properties, functions, and methods, and manage object relationships to address business cases effectively.

Recently, two changes have come to the IT world:

  1. As data has become central to almost every aspect of business, many different kinds of data stores, schemas, and queries have emerged to meet different requirements for extracting insights from data effectively. As a result, data-oriented computing has gained significant attention and become a buzzword.
  2. AI/ML technologies have advanced rapidly, and their use and discussion have become prevalent across many domains. The public trial of ChatGPT further accelerated this trend. Today, ML seems to have become an integral part of almost every IT solution, and people strive to incorporate ML models into their applications. It feels as if no application is complete without them.

I have posted several blogs about the design and architecture of applications and solutions that seamlessly integrate data-oriented and ML technologies:

  1. Data Oriented Computing — Architecture Patterns
  2. Case Study: Hybrid Architecture with Data Fabric and Data Mesh
  3. Architecture Considerations When Creating Software Solutions

However, whether it is always better to use ML models instead of traditional code logic is an important and thought-provoking question that deserves more discussion.

What ChatGPT Says

Well, just to feel cool, let us ask ChatGPT.

If you ask ChatGPT for the business scenarios that ML models are best suited to resolve, it can list some for you:

It can also list some scenarios better suited to software engineering:

Reviewing the lists above, they do not compare apples to apples, nor do they give any details. Some special cases from the first list (better with ML models) may fit a software engineering approach well, for example an application that streamlines a simple supply chain with a limited number of steps. Some items in the second list (better with software engineering) are not business scenarios at all, but reasons for using software engineering, for example "debugging and maintenance", "cost effectiveness", and "limited hardware resources".

To discuss this topic properly, let us first look at some facts about data queries, ML models, and software engineering.

Some Facts about Data Queries, ML Models, and Software Engineering

About business and use cases to resolve

Data queries, by themselves, often have limitations. A query language such as SQL is designed to work on a specific kind of data store or database, not across different kinds of data sources. So data queries typically handle business cases that only need to work on a single type of database or a single data file (such as an Excel table). In such scenarios, where the data requirements are relatively straightforward and confined to a single source, data queries can extract and manipulate the required information effectively.

Recently, several new data storage technologies have emerged to meet business data analytics needs:

  1. Data lakehouse, which combines the advantages of data lakes and data warehouses. Its queries are more powerful and can help extend the range of business cases addressed.
  2. Knowledge graph and its query language. A knowledge graph closely mirrors real-world data relationships, so its queries can generate more useful results for end users.
  3. Vector database, which is tightly integrated with generative ML models. Here are some good introductions from business and technology points of view.
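To make the vector database item more concrete, here is a minimal sketch of its core operation: ranking stored items by cosine similarity to a query vector. The three-dimensional embeddings and the item names are made up for illustration; real embeddings come from an ML model, and real vector databases use approximate indexes rather than the brute-force scan shown here.

```python
import math

# Hypothetical embeddings; real ones are produced by an ML model
# and usually have hundreds of dimensions.
DOCUMENTS = {
    "refund policy": [0.9, 0.1, 0.0],
    "room booking": [0.1, 0.8, 0.3],
    "pool hours": [0.0, 0.3, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, k=2):
    """Brute-force nearest-neighbour search over all stored vectors."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in DOCUMENTS.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

print(top_k([0.2, 0.9, 0.2]))
```

Unlike an SQL `WHERE` clause, this kind of query returns the *closest* matches rather than exact ones, which is why it pairs naturally with ML-generated embeddings.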

Software engineering is powerful. With various programming languages and tools, we can perform almost any data operation across different data sources, including transforming data into the formats and structures needed for analysis and generating visual graphs for end users. However, software engineering has its limitations, because it relies on logic explicitly written in programming languages.
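As a small illustration of what a single query language cannot do on its own, the sketch below joins a relational table with a CSV export in application code. The table, the CSV content, and the helper name `totals_by_customer` are all hypothetical; an in-memory SQLite database and a `StringIO` stand in for real stores.

```python
import csv
import io
import sqlite3

# Source 1: a relational database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Source 2: a CSV export, e.g. from a spreadsheet (hypothetical content).
orders_csv = io.StringIO("customer_id,amount\n1,30.5\n2,12.0\n1,9.5\n")

def totals_by_customer(conn, csv_file):
    """Combine rows from two different kinds of data sources in application code."""
    names = dict(conn.execute("SELECT id, name FROM customers"))  # id -> name
    totals = {}
    for row in csv.DictReader(csv_file):
        name = names[int(row["customer_id"])]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals

result = totals_by_customer(db, orders_csv)
print(result)
```

The join logic lives in code rather than in either data store, which is exactly the flexibility (and the maintenance cost) that software engineering brings.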

If there are unlimited (or a huge number of) data variations, it becomes impossible to handle them all through explicit code logic alone. In this case, ML models can come to the rescue.

About business result reliability:

Data queries and software engineering are based on logic and can have clear, predictable results.

Machine learning is rooted in statistics and mathematical optimization rather than explicitly programmed logic. While statistical learning theory has progressed significantly, covering concepts such as the bias-variance tradeoff, overfitting, generalization, and model selection, ML model accuracy and reliability are still a general concern, making responsible, transparent, and explainable AI/ML a must before going to market.
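The contrast between explicit logic and statistical learning can be sketched in a few lines. In this illustration, a hypothetical shipping-fee rule always returns the same answer, while a simple least-squares line fitted to noisy historical fees (standing in, very loosely, for an ML model) can only approximate it. The rule, the noise level, and the data are assumptions made up for the example.

```python
import random

def shipping_fee(weight_kg):
    """Explicit business rule (hypothetical): same input, same output, every time."""
    return 5.0 + 2.0 * weight_kg

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x, learned from examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b

# Hypothetical training data: historical fees recorded with random noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [shipping_fee(x) + rng.gauss(0, 1.0) for x in xs]

a, b = fit_line(xs, ys)
print(shipping_fee(4.0))  # exactly 13.0
print(a + b * 4.0)        # close to 13.0, but only an estimate
```

The learned model gets close, but its answer depends on the data it saw, which is why testing, monitoring, and explainability matter so much more for ML than for explicit rules.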

I recently read a blog post whose author's words illustrate what I mean by the difference between data querying and machine learning:

Cited from https://medium.com/@dallemang/ais-woolf-at-the-door-llms-and-knowledge-graphs-eecd6289380f

About the solution life cycle and cost:

Generally, data queries offer a quick way to generate business results, for example running a SQL database query or creating macros for Excel tables to generate business reports.

Software engineering applications need to go through a release cycle, which takes longer. Even with the adoption of agile, CI/CD, and DevOps automation in cloud-native development, a typical software release cycle can take anywhere from hours to days or even weeks.

ML models rely heavily on high-quality training data and must go through the MLOps cycle, which involves data engineering, model training and evaluation before deployment, and continuous monitoring to mitigate issues like data shift or bias. This cycle often takes a long time. When people talk about machine learning, they generally think of the "add more data, increase model size, and train for months" approach. While continuous learning can help ML models adapt to the production environment, given ML's reliability and explainability concerns, I do not think this technology will mature quickly.
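The "continuous monitoring to mitigate issues like data shift" step can be illustrated with one common drift metric, the Population Stability Index (PSI), which compares how a feature is distributed in training data versus production data. This is a minimal sketch: the feature values, the bin edges, and the alert threshold are all hypothetical, and production MLOps platforms offer many other drift tests.

```python
import math

def bin_shares(values, edges):
    """Fraction of values falling in each bin [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index of `actual` (production) vs `expected` (training)."""
    score = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # guard against log(0) for empty bins
        score += (a - e) * math.log(a / e)
    return score

# Hypothetical feature values (e.g. customer age) seen during training ...
EDGES = [float("-inf"), 20, 40, 60, float("inf")]
training = [10] * 25 + [30] * 25 + [50] * 25 + [70] * 25
# ... and later in production, where older customers now dominate.
production = [10] * 10 + [30] * 20 + [50] * 30 + [70] * 40

DRIFT_THRESHOLD = 0.1  # hypothetical alert level; teams tune this
score = psi(training, production, EDGES)
print(f"PSI = {score:.3f}, drift alert: {score > DRIFT_THRESHOLD}")
```

A drift alert like this does not fix the model; it only triggers the slower part of the cycle (new data, retraining, evaluation), which is part of why ML solutions take longer to maintain.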

Some rules to consider

Based on the facts above, here are some rules of thumb to help us:

  1. If a simple business case can be addressed effectively with data queries, use data queries for the solution.
  2. If accuracy and reliability are not major concerns, and/or your scenario involves significant data variations, ML models can be a viable option.
  3. Otherwise, use software engineering.
  4. For complex business solutions, a common approach is to use software code to integrate ML models, data queries, and business logic into the overall business flow. This integration allows for seamless coordination and interaction between these components, enabling the solution to leverage the strengths of each approach.
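The rules above can be encoded as a small decision function. This is only a sketch: the `BusinessCase` fields are a hypothetical, simplified profile of a business case, and checking rule 4 first is a design choice of this example, since a complex flow usually combines the other approaches anyway.

```python
from dataclasses import dataclass

@dataclass
class BusinessCase:
    """A hypothetical, simplified profile of a business case."""
    single_simple_source: bool   # answerable by querying one data store?
    reliability_critical: bool   # are accuracy and reliability major concerns?
    large_data_variation: bool   # many, or unbounded, input variations?
    complex_flow: bool = False   # multi-step flow mixing several components?

def suggest_approach(case: BusinessCase) -> str:
    """Apply the rules of thumb; rule 4 is checked first in this sketch."""
    if case.complex_flow:
        return "software integrating data queries, ML models, and business logic"  # rule 4
    if case.single_simple_source:
        return "data queries"  # rule 1
    if not case.reliability_critical or case.large_data_variation:
        return "ML models"  # rule 2
    return "software engineering"  # rule 3

# The cases discussed in this post, profiled by hand (the profiles are assumptions).
school_report = BusinessCase(True, True, False)
virtual_layer = BusinessCase(False, True, False)
voice_agent = BusinessCase(False, True, True, complex_flow=True)

for case in (school_report, virtual_layer, voice_agent):
    print(suggest_approach(case))
```

Real decisions involve cost, team skills, and timelines too, so treat the function as a checklist rather than an oracle.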

Interested in trying some cases?

Let us evaluate some real business scenarios using the rules of thumb above:

Case 1:

A school maintains a relational database for storing student information. There is a requirement to generate a comprehensive school report for each student, including their overall GPA, the details of their semester classes and corresponding grades.

For this case, SQL queries will be quick and good enough.
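Here is a minimal sketch of what such queries might look like, run against an in-memory SQLite database. The schema, the sample rows, and computing GPA as a credit-weighted average of grade points are assumptions for illustration; a real school may define GPA differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        semester TEXT, class_name TEXT, credits INTEGER, grade_points REAL
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO enrollments VALUES
        (1, '2023S', 'Algebra', 4, 4.0),
        (1, '2023S', 'History', 3, 3.0),
        (2, '2023S', 'Algebra', 4, 3.0),
        (2, '2023S', 'Biology', 2, 2.0);
    """
)

# Overall GPA per student, weighted by class credits (assumed definition).
GPA_SQL = """
    SELECT s.name, ROUND(SUM(e.credits * e.grade_points) / SUM(e.credits), 2) AS gpa
    FROM students s JOIN enrollments e ON e.student_id = s.id
    GROUP BY s.id ORDER BY s.name
"""

# Class and grade details for one student in one semester.
DETAIL_SQL = """
    SELECT class_name, credits, grade_points FROM enrollments
    WHERE student_id = ? AND semester = ? ORDER BY class_name
"""

gpas = conn.execute(GPA_SQL).fetchall()
details = conn.execute(DETAIL_SQL, (1, "2023S")).fetchall()
print(gpas)
print(details)
```

Two declarative queries cover the whole report, with no release cycle and no model training, which is why rule 1 applies here.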

Case 2:

A company has multiple legacy data stores and intends to develop a virtualized layer on top of them. This virtualized layer will provide a unified view of the data from these different stores, enabling efficient and effective data analysis.

For this case, a software solution is best.
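A virtualized layer is typically built from adapters that expose each store through one common interface. The sketch below assumes two hypothetical legacy stores, a relational table and a CSV export, and a read-only `VirtualLayer` that queries both; all class names and data are made up for illustration, and a real layer would add schema mapping, caching, and security.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol

Record = Dict[str, object]

class DataSource(Protocol):
    """Common interface every adapter implements."""
    def records(self) -> Iterator[Record]: ...

class SqliteSource:
    """Adapter over a relational table (hypothetical legacy store #1)."""
    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn, self.table = conn, table  # table name comes from trusted config

    def records(self) -> Iterator[Record]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [d[0] for d in cur.description]
        for row in cur:
            yield dict(zip(cols, row))

class CsvSource:
    """Adapter over CSV text (hypothetical legacy store #2)."""
    def __init__(self, text: str):
        self.text = text

    def records(self) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(self.text))

class VirtualLayer:
    """Unified, read-only view across heterogeneous sources."""
    def __init__(self, sources: Dict[str, DataSource]):
        self.sources = sources

    def query(self, predicate) -> Iterator[Record]:
        for name, source in self.sources.items():
            for rec in source.records():
                if predicate(rec):
                    yield {"source": name, **rec}  # tag each record with its origin

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [("A1", "Desk"), ("B2", "Lamp")])
layer = VirtualLayer({
    "erp": SqliteSource(conn, "products"),
    "archive": CsvSource("sku,name\nA1,Desk (archived)\nC3,Chair\n"),
})
results = list(layer.query(lambda r: r["sku"] == "A1"))
print(results)
```

Adding a new legacy store then means writing one more adapter, without changing the analysis code that consumes the unified view.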

In most business cases, we need to consider using data queries, software engineering, and ML models together, as in the one below:

Case 3:

Based on market needs, an IT team plans to create an automated voice agent solution to help customers book hotel rooms.

The solution architecture could look like the following:

Reviewing the flow above, it appears that for simple scenarios involving a limited set of text messages, Model 2 could be replaced with software engineering logic. If so, fuzzy search processing that maps the end user's text to specific database objects or queries becomes essential.
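For the simple-scenario path, a non-ML fuzzy match can be sketched with Python's standard `difflib` module, which scores character-level similarity. The room types and the `match_room_type` helper are hypothetical; this technique tolerates typos but not rephrasing, which is where the ML approach discussed next becomes attractive.

```python
import difflib

# Hypothetical room types stored in the hotel database.
ROOM_TYPES = ["single room", "double room", "twin room", "family suite"]

def match_room_type(user_text, cutoff=0.6):
    """Map free text to the closest known room type, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        user_text.lower().strip(), ROOM_TYPES, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

print(match_room_type("Dubble room"))
print(match_room_type("penthouse"))
```

The matched value can then be used as a parameter in an ordinary database query; when nothing clears the cutoff, the agent can ask the user to rephrase.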

The use of ML models for fuzzy search has been a subject of discussion, as referenced here and here. In my opinion, for business cases with a wide range of data inputs and variations, using ML models for fuzzy search would likely yield better results. ML models can handle complex patterns and variations in textual data, enabling more accurate and efficient fuzzy search.

Discussions and Conclusions

Beyond what I have discussed so far, it is commonly acknowledged that generative models like ChatGPT can generate data queries or code. This raises the question: will ML models ultimately replace data queries and software code entirely?

I think the answer is no. While ML models can help developers accelerate the development phase by generating data queries or software code, they do not eliminate the need for data queries, software code, or ML models in the actual solution. The discussion of these components remains crucial, as the final solution often involves integrating them seamlessly to achieve the business outcome.

I am not against considering AI/ML models first for business solutions, but I want to point out that ML models are not a magic wand. Although machine learning provides powerful capabilities for handling complex patterns and data-driven decision-making, it is not a one-size-fits-all solution. In some cases, traditional data queries or software engineering may be better suited to meet business requirements. It is essential to carefully assess the needs of each situation and choose the most appropriate approach accordingly.

In all cases, we need to be smart and innovative when designing a business solution. This means considering all angles and seamlessly integrating multiple technologies. By leveraging the strengths of various technologies and carefully evaluating their suitability for the specific problem at hand, businesses can maximize their potential for success.
