Data Oriented Computing — Scope and Mindset

Jingdong Sun
9 min read · Feb 11, 2022


Going into 2022, we’ve seen many discussions, blogs, and papers on data, with topics ranging from data storage to analytics, data fabric to data mesh, data transformation (ETL/ELT/reverse ETL) to data lineage, static data catalogs to dynamic, active metadata, and company strategies, operations, and management.

Even with this breadth of information available, teams working on data architecture design to analyze data in order to generate business value for customer solutions will generally encounter more questions than answers. From my experience working on the Data Architecture Board of a domain specific open group, current technologies still lack the maturity needed to handle existing domain specific data and meet domain market requirements.

Whether or not we like to admit it, data is becoming central to everything. With machine learning tools and fast-advancing technologies, the IT world needs a revolution and a new generation of computing, and the key to unlocking it is data oriented computing.

But what is data oriented computing? Hasn’t computer science always been working on data since the beginning of computing history?

Yes, this is correct, as we see in Wikipedia’s definition of a computer:

a computer is a machine that manipulates data according to a set of instructions called a computer program

However, the focus of data oriented computing is on “oriented”. This necessitates a shift in mindset, design, and operations, parallel to the emphasis on “oriented” in “Object oriented programming”, or on “driven” in “Event driven architecture”. Data oriented computing is not simply about “manipulating” data; it is about data centric/oriented/focused thinking, design, architecture, implementation, operation, and management.

This revolution in orientation needs to happen across all areas of computing. I’ll be detailing these changes across five installments, covering the following topics:

  1. Scope and Mindset — paying attention to the scope that data affect and adjusting our mindset accordingly
  2. Operations — moving from Site Reliability Engineering (SRE) to Data Reliability Engineering (DRE)
  3. Historical data vs real-time data — discussing the position of static data and real-time data in the data oriented world
  4. Architecture — discussing architecture pattern changes
  5. Data oriented languages — describing features of a new, data-oriented language model

This article will focus on scope and mindset, with the remaining four topics to be covered in upcoming pieces.

To set the scene for scope and mindset, let us start with three recent business cases:

Three Business Cases

Case 1: Facebook (now Meta)

Facebook has encountered rough waters in the last several years, from controversy surrounding the 2016 and 2020 U.S. elections (even after Facebook claimed that it had been proactive ahead of the 2020 election), to the recent Facebook whistleblower stories.

What led to these controversial incidents, and what can we do to avoid similar instances in the future?

Let’s look at Facebook whistleblower Frances Haugen’s story:

Facebook says the work of Civic Integrity was distributed to other units. Haugen told us the root of Facebook’s problem is in a change that it made in 2018 to its algorithms — the programming that decides what you see on your Facebook news feed.

… …

Frances Haugen: And one of the consequences of how Facebook is picking out that content today is it is — optimizing for content that gets engagement, or reaction. But its own research is showing that content that is hateful, that is divisive, that is polarizing, it’s easier to inspire people to anger than it is to other emotions.

(Cited from https://www.cbsnews.com/news/facebook-whistleblower-frances-haugen-misinformation-public-60-minutes-2021-10-03/)

The documents paint a picture of a company that is often aware of the harms to which it contributes — but is either unwilling or unable to act against them.

(Cited from https://time.com/6121931/frances-haugen-facebook-whistleblower-profile/)

As we can see, Facebook/Meta used algorithms and machine learning models to “smartly” promote news feed content to its users. This allowed Facebook to increase engagement, ultimately leading to greater usage and profitability. Unfortunately, these actions ended up generating negative repercussions for the company and for society.

Question 1: Even if Facebook is willing to fix this issue, can it be fully fixed?

Case 2: Jon the Robot

Artificial Intelligence Can Now Craft Original Jokes — And That’s No Laughing Matter (from Time Magazine, Jan 17th / Jan 24th 2022)

Jon the Robot, built by Naomi Fitter, an assistant professor at Oregon State University, is powered by artificial intelligence to learn and tell jokes.

Can Jon the Robot tell jokes? Yes, it can learn jokes and share them with audiences. However, can it tell jokes like Jerry Seinfeld or Joan Rivers, adapting its repertoire based on different audiences, environments, cultures, or live reactions? Not successfully, it seems.

Humans have vast mental libraries of cultural references and linguistic nuances to draw upon when hearing or telling a joke. AI has access only to the information that humans choose to give it, which means that if we want an AI to make us laugh, we have to be clear about the kind of human we want to teach it.

(Cited from Time Magazine, Jan 17th / Jan 24th 2022 issue)

Question 2: Assuming Jon the Robot is equipped with the most advanced deep learning technology, can it be a successful comedian?

Case 3: IBM Watson Health

In recent business news: IBM Watson Health was sold for parts, as Francisco Partners bought some of Watson’s data and analytics products.

At a glance, this may seem to be positive news. However, with full context, it no longer seems to be a success story. Back in 2011, when Watson beat former champions Ken Jennings and Brad Rutter on Jeopardy!, it became a household sensation around the world practically overnight. Watson Health quickly rose to center stage, with the aim of transforming the healthcare world, including personalizing patient care and treatment and catalyzing drug development and clinical trials. However, after 10 years and billions invested in research and development, Watson Health’s chapter was closed with this sale.

Lizzie O’Leary recently discussed this turning point, stating:

That underscores the central theme of this story: When you try to combine the bravado of the tech culture and the notion that you can achieve these huge audacious goals in a domain where you’re dealing with people’s lives and health and the most sacrosanct aspects of their existence and their bodies, you need to have evidence to back up that you can do what you say you can do.

(https://slate.com/technology/2022/01/ibm-watson-health-failure-artificial-intelligence.html)

Question 3: Could any company successfully achieve the goals that Watson Health set out to do?

What do these three cases have in common?

In each of these three stories, companies had investment, data, ML models, software engineers working day and night, and data scientists crunching through lines of data. However, at the end of the day, none of these three cases were able to succeed in what they aimed to accomplish.

What does it take to succeed?

For my questions:

  1. Facebook/Meta: Even if Facebook is willing to fix this issue, can it be fully fixed?
  2. Jon the Robot: Assuming Jon the Robot is equipped with the most advanced deep learning technology, can it be a successful comedian?
  3. IBM Watson Health: Could any company successfully achieve the goals that Watson Health set out to do?

My answer to all these questions: no and yes.

If we continue using data and technology as we are now, then the answer is NO.

How can we turn this into YES? This brings us to today’s topic: Data Oriented Computing — Scope and Mindset.

Scope and Mindset

In order to accomplish the desired goals of each of the three business cases, we need to change the scope and mindset of the way we approach computing.

The reason for this change is simple: as we see across the three stories, we are primarily dealing with data, not technology. So, our scope and mindset need to change accordingly from technology centric to data centric. This does not mean that technology is no longer important — it means that data is at least equally, if not more, important.

We see the importance of data in a lesson learned from the IBM Watson Health case: if the data scope is wrong, the project will not succeed, no matter how advanced the technology is.

If you think about it, knowing what we know now or what we’ve learned through this, the notion that you’re going to take an artificial intelligence tool, expose it to data on patients who were cared for on the upper east side of Manhattan, and then use that information and the insights derived from it to treat patients in China, is ridiculous. You need to have representative data. The data from New York is just not going to generalize to different kinds of patients all the way across the world.
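One way to make "representative data" concrete is to compare the mix of a population attribute in the training data against the mix in the population where the model will be used. The sketch below is only an illustration under stated assumptions: the function names, the age bands, and the 0.25 threshold are all hypothetical, and the values are made up rather than taken from any real patient data.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the sample as a dict of fractions."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(train_values, target_values):
    """Distance between two categorical mixes: 0 means identical, 1 means disjoint."""
    p = category_shares(train_values)
    q = category_shares(target_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical, made-up age bands for illustration only (not real patient data).
train_age_bands = ["60+", "60+", "60+", "40-59"]
target_age_bands = ["18-39", "18-39", "40-59", "60+"]

distance = total_variation_distance(train_age_bands, target_age_bands)

# Hypothetical threshold: flag the dataset for review by domain experts
# when the two populations differ by more than a quarter of their mass.
needs_review = distance > 0.25
```

A check like this only covers attributes someone thought to record, which is the article's point: deciding which attributes matter takes domain knowledge, not just tooling.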

Scope

When data becomes a center stage player, the project scope extends beyond computer science and data science. As we realize the integral role that data plays, we must incorporate new areas into our scope, including economics, psychology, sociology, anthropology, genetics, culture, finance, marketing, medicine, health care, life science, political science, natural sciences, and more, since data both touches and affects these areas.

For example, the engineers developing Facebook’s news feed should have an understanding of human behavior, cultural nuances, and ethics. In order to design an algorithm that is beneficial to users and society, Facebook/Meta must understand how its users interact with the platform and how its content spreads, beyond just the surface-level goal of increasing engagement. In order to do it right (or close to right), social media applications must have a holistic understanding of the scope that their data affect and bring these areas into their algorithm design.

Similarly, for the Watson Health case, in order to do it right (or close to right), engineers and data scientists should have knowledge of anthropology, genetics, medicine, health care, life science, and more.

With a move to data oriented computing, scope extends significantly to encompass much more than the baseline computer and data science.

Mindset

Mindset: the established set of attitudes held by someone (Google’s English dictionary).

Mindset is also important in the IT world. From agile software development processes to recent discussions of data mesh, mindset plays an essential role in ensuring the success of a business solution.

Most discussions and blogs on data centric mindsets or data driven mindsets focus on how humans or companies should make strategic decisions based on data analysis and interpretation.

What I want to emphasize is that, as data becomes central to business worldwide, the three cases show that when data analysis and interpretation fail to generate reliable results, the resulting decisions are unreliable too. Thus, mindset also needs to be adjusted in the following three areas:

  1. Mindset of data scope — As data come from and affect many areas, we can no longer rely on a mindset of purely software engineering and data science technologies to resolve data problems. This will not provide the data needed to generate reliable results. When developing algorithms, ML models, software solutions, applications, and functional implementations, we need to broaden our scope to take all affected areas into consideration.
  2. Mindset of data limitation — We also need to understand what machines and data can do, and what they cannot do. Personally, I believe machines can gain knowledge from data (machine learning), and do it in a way that is much faster and more comprehensive compared to humans. However, machines can never gain wisdom from data as humans can.
  3. Mindset of data focus — Mindset also needs to change for business solutions and applications: from a focus on application implementation, feature/function support, service APIs, and function calls to a data oriented focus on data flow, data reliability, data quality, and data security.
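As a small illustration of the third point, a data oriented solution might treat data quality as something it measures explicitly, rather than assuming that a successful function call implies good data. The sketch below is a minimal example under assumptions: `QualityReport`, `check_records`, the field names, and the valid ranges are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    total: int
    missing_required: int
    out_of_range: int

    @property
    def is_reliable(self):
        # Hypothetical rule for this sketch: tolerate no bad records at all.
        return self.missing_required == 0 and self.out_of_range == 0


def check_records(records, required_fields, valid_ranges):
    """Count records with missing required fields or out-of-range values."""
    missing = 0
    out_of_range = 0
    for record in records:
        # A record with any missing required field is counted once as missing.
        if any(record.get(field) is None for field in required_fields):
            missing += 1
            continue
        # Otherwise, count it once if any ranged field falls outside its bounds.
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                out_of_range += 1
                break
    return QualityReport(len(records), missing, out_of_range)


# Made-up records for illustration only.
records = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 212},
]
report = check_records(records, ["user_id", "age"], {"age": (0, 120)})
```

Here the report counts one record with a missing field and one with an implausible age, so `report.is_reliable` is false even though every call in the pipeline "succeeded".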

I recognize that readers may not agree with my perspectives, particularly on the mindset of data limitation. I welcome feedback and debate on this topic, whether through a comment on this article or a message on LinkedIn.

Conclusions

In the computer science world, we too often focus on technology and miss the critical facts behind the scenes. As data increasingly becomes the center and driver of the world, we cannot rely on our old habits anymore. The data we see now transcends computer science and begins to encompass other disciplines such as behavior, economics, and medicine.

It is time for computing technology to embrace a new revolution. We must move from object oriented, event oriented, or even system/infrastructure/cloud oriented computing to data oriented computing.

In order to do this, we will need to extend our scope to include any area that data touches or impacts, and adjust our mindset to focus on data, to understand the scope that data affects and is affected by, and to recognize the limitations of our data.

This is the first of five articles focused on data oriented computing. Stay tuned for the following installments on the revolution of data oriented computing technology:

  1. Data Oriented Computing: Scope and Mindset
  2. Data Oriented Computing: Operations
  3. Data Oriented Computing: Historical data vs Real-time data
  4. Data Oriented Computing: Architecture
  5. Data Oriented Computing: Languages
