Data Oriented Computing — It is time for a new language

Jingdong Sun
Apr 12, 2023

In recent years, there has been significant progress in computer science and technology, particularly in data storage, data organization and governance, data analytics, artificial intelligence, and machine learning.

However, after half a century of progress, object-oriented programming languages still dominate the field of computer programming.

With data-oriented computing in mind, I think it is time for a new language that can optimize data states, flows, and operations: a data-oriented language.

Data Are Different From Objects

When implementing a software application or service in an object-oriented programming language, we generally create multiple object classes, as taught in Computer Science 201 in college, and use getter, setter, and other operation methods to fulfill our business logic and use-case requirements.

Although data can be treated as objects, as today's languages and tools do, data has many unique features that cannot easily be implemented or supported with an object-oriented approach:

  1. Data directly connects business and technology. It needs not only to be visualized with meaningful business insights, but also to go through transformation, processing, and abstraction before it can yield those insights.
  2. Data operations should focus not only on better tools, clouds, and infrastructure, but also on better processes.
  3. AI and ML technologies are for data, not for objects.
  4. Data relationships are critical for a business to deliver the right data to the right user at the right time.
  5. Metadata (data about data) management.
  6. Data transformation and lineage.
  7. Data quality.
  8. Data integrity.

Given these data features, and with ML models becoming increasingly prevalent in data operations, a data-oriented programming language would better serve the market by making business use cases more efficient to support and by improving the end-user and developer experience.

A Data Oriented Language

Let us assume we create a data-oriented language called DOLe, short for Data Oriented Language.

As a high-level programming language, DOLe would support basic types and operations, I/O, and control structures like any other programming language. Beyond those, DOLe shall focus on supporting:

  1. A Data type and its extensions
  2. The data features listed above
  3. Embedded ML models for data operations
  4. Data relations

More About Language DOLe

In language DOLe, the following image shows more details of a data unit type and its relations to peers:

This “Data” type is DOLe’s base data unit type. It can map to a node in the data structures we learned about in college, a tuple in a relational database, a time-series data point, or any other unit concept in today’s data structures and technologies.

Extensions of this base type shall also be supported.
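As a rough illustration only, the base unit type and one extension could be modeled as a Python class hierarchy. DOLe has no implementation yet, so Python is just a stand-in here, and the names `Data`, `RelationalTuple`, and `from_row` are hypothetical. The sketch shows a relational row being wrapped as a data unit that is still recognized as the base type.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Data:
    """Hypothetical stand-in for DOLe's base data unit type."""
    content: Any                                            # a node value, a row, a blob, ...
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class RelationalTuple(Data):
    """Hypothetical extension: a data unit that wraps one relational row."""
    table: str = ""

    @classmethod
    def from_row(cls, table: str, columns: list[str], row: tuple) -> "RelationalTuple":
        # Pair column names with values so the unit is self-describing, and keep
        # the column list as schema metadata.
        return cls(content=dict(zip(columns, row)), metadata={"schema": list(columns)}, table=table)


order = RelationalTuple.from_row("orders", ["id", "amount"], (42, 19.99))
print(order.content)            # {'id': 42, 'amount': 19.99}
print(isinstance(order, Data))  # True
```

A tree node or a time-series point would get its own subclass in the same way, so code written against `Data` keeps working for every extension.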

The base data unit type can be a structure with the following parts:

  1. Data content: a blob that can be structured or unstructured, in any format such as text, image, audio, or video. The “data format/schema” will be stored as metadata to help data operations.
  2. Versioning info: maintains the history of data updates and transformations. Data lineage can be extracted from the version history. Data integrity shall be enforced across all data versions.
  3. Metadata: all metadata related to this unit of data. Metadata is version-specific, so different versions of the data may have different metadata. Examples include the data format/schema and governance policies.
  4. Relations: external relations/links between this data unit and other data units. This field maintains multiple relations to multiple external data units, which significantly benefits operations on a group of data, that is, a dataset.
  5. Operations: data operations are backed by ML models and statistics libraries. With language DOLe, end users (software engineers or data scientists) will not need to explicitly train and run an ML model, or call a specific library function, when performing data operations such as abstraction, transformation, extraction, or prediction. ML models and libraries will be embedded natively in language DOLe, so end users can focus on the data operations themselves. A data operation can generate a new version of this data, or new data related to this one.
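To make the five parts above more concrete, here is a minimal Python sketch of a versioned data unit. Everything in it is an assumption for illustration: the names `DataUnit`, `Version`, `commit`, `transform`, `lineage`, `verify_integrity`, and `relate` are hypothetical, a SHA-256 checksum is one possible way to enforce integrity across versions, and a plain Python function stands in for the ML-backed operations the article envisions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Version:
    """One immutable entry in a data unit's history."""
    number: int
    content: bytes                 # the data content (a blob)
    metadata: dict[str, Any]       # version-specific metadata, e.g. format
    operation: str                 # what produced this version, e.g. "ingest"
    checksum: str                  # SHA-256 of content, recorded at commit time


@dataclass
class DataUnit:
    """Hypothetical sketch of a versioned DOLe data unit."""
    versions: list[Version] = field(default_factory=list)
    relations: dict[str, list["DataUnit"]] = field(default_factory=dict)

    def commit(self, content: bytes, operation: str, **metadata: Any) -> Version:
        """Append a new version instead of overwriting the old content."""
        version = Version(
            number=len(self.versions) + 1,
            content=content,
            metadata=dict(metadata),
            operation=operation,
            checksum=hashlib.sha256(content).hexdigest(),
        )
        self.versions.append(version)
        return version

    def transform(self, fn: Callable[[bytes], bytes], operation: str) -> Version:
        """Apply an operation to the latest content, producing a new version.

        A plain function stands in for an ML-backed operation here.
        """
        latest = self.versions[-1]
        return self.commit(fn(latest.content), operation, **latest.metadata)

    def lineage(self) -> list[str]:
        """Read lineage straight off the version history."""
        return [f"v{v.number}:{v.operation}" for v in self.versions]

    def verify_integrity(self) -> bool:
        """Check that every stored version still matches its recorded checksum."""
        return all(
            hashlib.sha256(v.content).hexdigest() == v.checksum for v in self.versions
        )

    def relate(self, kind: str, other: "DataUnit") -> None:
        """Record a named link to another data unit."""
        self.relations.setdefault(kind, []).append(other)


raw = DataUnit()
raw.commit(b"  Hello World  ", "ingest", format="text/plain")
raw.transform(lambda b: b.strip().lower(), "normalize")

summary = DataUnit()
summary.commit(b"greeting", "summarize", format="text/plain")
raw.relate("derived", summary)  # new data related to this one

print(raw.lineage())             # ['v1:ingest', 'v2:normalize']
print(raw.versions[-1].content)  # b'hello world'
print(raw.verify_integrity())    # True
```

The design choice worth noting is that operations never mutate a version in place; they append a new one. That is what makes lineage a simple read of the history rather than something tracked separately.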

For a set of data (a dataset), DOLe will support operations such as:

Data quality: the quality of a single data unit cannot be judged on its own. It must be assessed in a certain context and environment, or evaluated against a set of data (for example, through anomaly detection). So data quality will be an operation on a set of data.
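The point that quality only makes sense relative to a set of data can be shown with a small anomaly check. This sketch swaps in a plain statistical technique, the modified z-score based on the median absolute deviation (MAD), in place of the ML models the article describes; the function name `flag_anomalies` and the 3.5 threshold are illustrative choices, not part of DOLe.

```python
import statistics


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indexes whose modified z-score, computed against the whole set, exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: this simple rule cannot score points,
        # so a fuller implementation would need a fallback measure here.
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - median) / mad > threshold]


readings = [10.1, 9.8, 10.0, 10.2, 9.9, 55.0]
print(flag_anomalies(readings))  # [5]
```

Note that 55.0 is flagged only because of its neighbors: in a set such as `[52.0, 55.0, 58.0, 54.0, 56.0]` the same value is unremarkable and nothing is flagged.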

Business visualization: a single unit of data generally cannot give meaningful business insight, but a set of data can. For a given business case, some visualization approaches give better insights than others. This operation will provide smart visualization for the dataset based on the running environment and the business case.
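One way to picture “smart visualization based on the running environment and the business case” is a chooser that takes both as inputs. The sketch below uses hand-written rules as a stand-in for the learned selection the article envisions; the function `suggest_chart`, the goal names, and the chart names are all hypothetical.

```python
def suggest_chart(goal: str, x_kind: str, y_kind: str, environment: str = "desktop") -> str:
    """Pick a chart type from hand-written rules (a stand-in for learned selection)."""
    # The business case (goal) plus the kinds of data involved decide the chart.
    if goal == "trend" and x_kind == "temporal" and y_kind == "numeric":
        chart = "line"
    elif goal == "comparison" and x_kind == "categorical" and y_kind == "numeric":
        chart = "bar"
    elif goal == "relationship" and x_kind == "numeric" and y_kind == "numeric":
        chart = "scatter"
    elif goal == "distribution" and x_kind == "numeric":
        chart = "histogram"
    else:
        chart = "table"  # no rule matched: fall back to showing the raw rows
    # The running environment can change the rendering: on narrow mobile
    # screens, swap in more compact variants.
    if environment == "mobile":
        chart = {"bar": "horizontal bar", "line": "sparkline"}.get(chart, chart)
    return chart


print(suggest_chart("trend", "temporal", "numeric"))                        # line
print(suggest_chart("comparison", "categorical", "numeric", "mobile"))      # horizontal bar
```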

Metadata management: manages the metadata related to this dataset. The metadata of a dataset and the metadata of each data unit in it can be linked to benefit data and dataset maintenance and operations, for example data governance.
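Linking dataset-level and unit-level metadata can be sketched with Python's `collections.ChainMap`, which looks a key up in the unit's own metadata first and falls back to the dataset's. The metadata keys (`owner`, `retention_days`, `pii`), the unit IDs, and the helper names are hypothetical, chosen only to show a governance rule running over linked metadata.

```python
from collections import ChainMap
from typing import Any

# Hypothetical dataset-level metadata, shared by every unit in the dataset.
dataset_meta: dict[str, Any] = {"owner": "sales-analytics", "retention_days": 365, "pii": False}

# Hypothetical per-unit metadata; a unit stores only what differs from the dataset.
unit_meta: dict[str, dict[str, Any]] = {
    "order-001": {"format": "json"},
    "customer-007": {"format": "json", "pii": True},  # unit-level override
}


def effective_metadata(unit_id: str) -> ChainMap:
    """Look up the unit's own metadata first, then fall back to the dataset's."""
    return ChainMap(unit_meta[unit_id], dataset_meta)


def units_requiring_masking() -> list[str]:
    """A governance rule evaluated over the linked metadata of every unit."""
    return [uid for uid in unit_meta if effective_metadata(uid)["pii"]]


print(effective_metadata("order-001")["retention_days"])  # 365
print(units_requiring_masking())                          # ['customer-007']

# Because the views are linked rather than copied, a dataset-level policy
# change is seen by every unit immediately.
dataset_meta["retention_days"] = 730
print(effective_metadata("customer-007")["retention_days"])  # 730
```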

All these dataset operations are also backed by ML models and statistics libraries, so end users can focus on operations.

Summary

As data becomes central to everything and the connection between everything, it is time to consider developing a new language that can easily manage data features, optimize and simplify data handling, and streamline a range of data and dataset operations.

This blog post only opens the discussion, proposing a data-oriented language, DOLe, along with its essential features.

All comments, proposals, and suggestions are welcome. If anyone is interested in joint work to turn this language into a real open community project, please contact me.

https://github.com/jindong-ibm/DOLe

This is the last article in a five-part series on data-oriented computing. Stay tuned for future articles on advancements in data-oriented computing and ML technologies. The full series:

  1. Data Oriented Computing: Scope and Mindset
  2. Data Oriented Computing: Operations
  3. Data Oriented Computing: Historical data vs Real-time data
  4. Data Oriented Computing: Architecture
  5. Data Oriented Computing: Languages
