Need: In data science, the science part keeps getting better thanks to developments on the algorithmic front: a steady stream of advanced and ensemble algorithms (model-centric) with which people build models. Although these models work well, they still suffer from problems of interpretability and generalization.

Examining a sample of recent publications revealed that 99% of the papers were model-centric with only 1% being data-centric. — Andrew Ng

Approach:

AI System = Model/Algorithm + Data

From this we can see that an AI problem is a problem of both data and algorithm: the system learns from a representation of the data and then optimizes (representation -> learning -> optimization). Many algorithms (model-centric) essentially work on the optimization step irrespective of data quality, and the solutions they produce can fail on interpretability and sometimes on generalization.

Now let’s turn the approach around: keep the algorithm simple and fixed, and make changes to the data, based on the problem, to improve the model’s accuracy (data-centric). This way, the model’s accuracy is not as sensitive to changes in the data, and the model has clear explainability.

Below are the fundamental differences between the model-centric and data-centric approaches.

What is a Model-Centric Approach?

Changes are made to the model itself to improve performance. The focus is on finding the most suitable configuration by improving the model architecture and the training process: tuning hyperparameters and model weights, compression, optimization, and so on.
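To make this concrete, here is a minimal Python sketch of a model-centric loop, assuming a tiny k-nearest-neighbour classifier written from scratch. The helper names `knn_predict`, `accuracy` and `tune_k` and the toy data are hypothetical, not from any library. The data stays fixed; only the hyperparameter `k` is searched on a validation split.

```python
from collections import Counter


def sq_dist(a, b):
    # Squared Euclidean distance between two feature tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, x, k):
    # train is a list of (features, label) pairs; majority vote of the k nearest.
    neighbours = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def accuracy(train, data, k):
    # Fraction of (features, label) pairs in data that the model gets right.
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def tune_k(train, valid, candidates):
    # Model-centric step: the dataset is held fixed and only k changes.
    scores = {k: accuracy(train, valid, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


train = [((0, 0), "a"), ((1, 0), "a"), ((10, 10), "b"), ((11, 10), "b")]
valid = [((0, 1), "a"), ((10, 11), "b")]
print(tune_k(train, valid, [1, 3]))  # (1, {1: 1.0, 3: 1.0})
```

Whichever `k` wins, the training data itself is never inspected or corrected; that is the gap the data-centric approach targets.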

What is a Data-Centric Approach?

Instead of focusing on the model itself, improvements are made to the dataset systematically to increase accuracy and other target metrics. This philosophical approach focuses on factors that affect label accuracy, precision, and quality in the dataset.
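A data-centric counterpart can be sketched in Python under similar assumptions: the model is a fixed 1-nearest-neighbour classifier that is never tuned, and the only change is to the training labels. The cleaning rule used here (relabel a point when a strict majority of its k nearest other points disagree with it) is one simple, illustrative way to catch label noise, not a method prescribed by this article; the function names and toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    # Squared Euclidean distance between two feature tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train, x):
    # Fixed model: 1-nearest-neighbour, never tuned.
    return min(train, key=lambda item: sq_dist(item[0], x))[1]


def accuracy(train, data):
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)


def relabel_suspicious(train, k=3):
    # Data-centric step: compare each label with its k nearest *other* points.
    cleaned = []
    for i, (x, y) in enumerate(train):
        others = [item for j, item in enumerate(train) if j != i]
        neighbours = sorted(others, key=lambda item: sq_dist(item[0], x))[:k]
        majority, count = Counter(lbl for _, lbl in neighbours).most_common(1)[0]
        # Only overwrite the label when a strict majority of neighbours agree.
        cleaned.append((x, majority if count > k // 2 else y))
    return cleaned


# Toy data: the point (1, 1) sits in cluster "a" but carries a noisy "b" label.
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((1, 1), "b"),
         ((10, 10), "b"), ((11, 10), "b"), ((10, 11), "b")]
holdout = [((2, 1), "a"), ((11, 11), "b")]

print(accuracy(train, holdout))                      # 0.5
print(accuracy(relabel_suspicious(train), holdout))  # 1.0
```

The model code is identical in both runs; the accuracy gain comes entirely from correcting one label, which is the data-centric idea in miniature.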

With Data-Centric Artificial Intelligence (DCAI), we can make our AI systems more efficient and sustainable. In fact, as noted earlier, higher-quality data practices can provide the generalization that a model-centric approach could not.

 

The Challenges of Data-Centric Work

The key challenge here is to democratize data engineering, increasing reusability while accelerating the creation of sustainable and consistent datasets. The key elements to pay attention to are:

  • Volume of the data
  • Consistency of the data
  • Quality of the data

Across all of these, domain knowledge (or data literacy) plays a vital role in understanding the problem and the relationships among the variables.
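Volume, consistency and quality can all be checked before any model is trained. Below is a minimal Python sketch of such an audit, assuming records are plain dictionaries with a `label` field. The function name `audit_dataset`, the field names, and the specific checks (row and class counts for volume, identical inputs with different labels for consistency, missing values for quality) are illustrative assumptions rather than a standard API.

```python
from collections import Counter, defaultdict


def audit_dataset(rows, label_key="label"):
    """Summarise volume, consistency and quality of a list of dict records.

    Illustrative only: the checks below are assumptions, not a standard.
    """
    # Volume: how many records, and how they are spread across classes.
    volume = len(rows)
    class_counts = Counter(row.get(label_key) for row in rows)

    # Consistency: identical inputs should not carry different labels.
    labels_by_input = defaultdict(set)
    for row in rows:
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        labels_by_input[features].add(row.get(label_key))
    conflicts = [dict(f) for f, labels in labels_by_input.items() if len(labels) > 1]

    # Quality: records with any missing (None or empty-string) value.
    incomplete = sum(
        1 for row in rows if any(v is None or v == "" for v in row.values())
    )

    return {
        "volume": volume,
        "class_counts": dict(class_counts),
        "conflicting_inputs": conflicts,
        "incomplete_rows": incomplete,
    }


rows = [
    {"age": 34, "income": 52000, "label": "approve"},
    {"age": 34, "income": 52000, "label": "reject"},
    {"age": 51, "income": None, "label": "approve"},
    {"age": 29, "income": 31000, "label": "reject"},
]
report = audit_dataset(rows)
print(report["conflicting_inputs"])  # [{'age': 34, 'income': 52000}]
print(report["incomplete_rows"])     # 1
```

Which conflicts are genuine errors, and which missing values matter, is exactly where domain knowledge comes in; the audit only surfaces candidates for review.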


Venugopal Manneni


I hold a doctorate in statistics from Osmania University and have been working in analytics and research for the last 15 years. My expertise is in architecting solutions for data-driven problems using statistical methods, machine learning and deep learning algorithms for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.

