Data Integration

UChicago_DSC0282
(The University of Chicago - Alvin Wei-Cheng Wong)


- Overview

Data integration is the process of combining data from multiple sources into a single, unified view. The data being consolidated can be structured or unstructured, and it can arrive in batches or as continuous streams. Data integration is a common prerequisite for machine learning (ML) and artificial intelligence (AI), because models depend on data that is consistent, complete, and drawn from every relevant source. Integrating that data well helps organizations get the full value out of their data assets.
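To make "a single, unified view" concrete, here is a minimal sketch using only the Python standard library. It assumes two hypothetical sources: a structured CSV export from a CRM system and semi-structured JSON-lines events from a web-analytics log. The names `CRM_CSV`, `WEB_EVENTS_JSONL`, `load_crm`, `load_events`, and `unified_view` are illustrative, not part of any real product.

```python
import csv
import io
import json

# Hypothetical source 1: structured customer records exported from a CRM as CSV.
CRM_CSV = """customer_id,name,email
1,Ada Lovelace,ada@example.com
2,Alan Turing,alan@example.com
"""

# Hypothetical source 2: semi-structured JSON-lines events from a web-analytics log.
WEB_EVENTS_JSONL = """{"customer_id": 1, "page": "/pricing"}
{"customer_id": 2, "page": "/docs"}
{"customer_id": 1, "page": "/signup"}
"""


def load_crm(text):
    """Parse the CSV source into a dict keyed by integer customer_id."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["customer_id"]): row for row in reader}


def load_events(text):
    """Parse the JSON-lines source into a list of event dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unified_view(crm, events):
    """Join both sources on customer_id into one record per customer."""
    view = {}
    for cid, row in crm.items():
        view[cid] = {"name": row["name"], "email": row["email"], "pages": []}
    for event in events:
        # Events for customers missing from the CRM are skipped here;
        # a real pipeline might route them to a review queue instead.
        if event["customer_id"] in view:
            view[event["customer_id"]]["pages"].append(event["page"])
    return view


view = unified_view(load_crm(CRM_CSV), load_events(WEB_EVENTS_JSONL))
print(view[1])
# {'name': 'Ada Lovelace', 'email': 'ada@example.com', 'pages': ['/pricing', '/signup']}
```

The shared key (`customer_id`) is what makes the join possible; in practice, finding or constructing such a key across sources is often the hardest part of integration.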

AI systems can combine ML models with stream-processing technologies so that organizations gain near-real-time insights and make data-driven decisions. For example, ML models can analyze large volumes of data and provide users with personalized recommendations, predictions, and insights.
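A rough sketch of the streaming idea, assuming a hypothetical click stream of `(user_id, category)` events. A per-user frequency count stands in for a real ML model here, which is a deliberate simplification; the point is that the model's state is updated as each event arrives, so a fresh recommendation is available immediately rather than after a nightly batch job. `StreamingRecommender` is an illustrative name, not a library class.

```python
from collections import Counter, defaultdict

# Hypothetical click stream: (user_id, product_category) events arriving one at a time.
events = [
    ("u1", "books"), ("u2", "music"), ("u1", "books"),
    ("u1", "games"), ("u2", "music"), ("u2", "books"),
]


class StreamingRecommender:
    """Toy stand-in for an ML model: recommends each user's most frequent category.

    State is updated incrementally per event, so a recommendation can be
    served right after any event instead of waiting for a batch retrain.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, user_id, category):
        """Fold one event into the per-user category counts."""
        self.counts[user_id][category] += 1

    def recommend(self, user_id):
        """Return the user's most frequent category, or None with no history."""
        history = self.counts.get(user_id)
        if not history:
            return None  # cold start: nothing observed for this user yet
        return history.most_common(1)[0][0]


model = StreamingRecommender()
for user_id, category in events:
    model.update(user_id, category)
    print(user_id, "->", model.recommend(user_id))
```

A production system would typically read events from a stream-processing platform and use a trained model, but the update-then-serve loop has the same shape.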

Data integration can involve:

  • Cleaning and transforming data
  • Resolving inconsistencies or conflicts that may exist between the different sources
  • Data warehousing
  • ETL (extract, transform, load) processes
  • Data federation
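Several of the activities listed above (cleaning and transforming, resolving conflicts between sources, and an ETL load into a warehouse-like store) can be sketched together in a few lines. This is a minimal illustration under stated assumptions: the `billing` and `support` records are hypothetical, "the most recently updated non-null value wins" is just one common conflict-resolution rule chosen here, and an in-memory SQLite database stands in for a data warehouse. Data federation, which queries sources in place instead of copying them, is not shown.

```python
import sqlite3
from datetime import date

# Extract: hypothetical records from two systems describing the same customers.
billing = [
    {"id": "001", "email": " ADA@Example.com ", "updated": "2024-03-01"},
    {"id": "002", "email": "alan@example.com", "updated": "2024-01-15"},
]
support = [
    {"id": "1", "email": "ada.lovelace@example.com", "updated": "2024-05-20"},
    {"id": "2", "email": None, "updated": "2024-02-01"},
]


def transform(record):
    """Clean one record: normalize the key, trim/lower-case email, parse the date."""
    email = record["email"].strip().lower() if record["email"] else None
    return {
        "id": int(record["id"]),  # "001" and "1" become the same key
        "email": email,
        "updated": date.fromisoformat(record["updated"]),
    }


def resolve(records):
    """Resolve conflicts: per id, keep the most recently updated non-null email."""
    best = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        if rec["email"] is not None:
            best[rec["id"]] = rec  # newer non-null records overwrite older ones
        else:
            best.setdefault(rec["id"], rec)  # null only if nothing better exists
    return best


# Transform + resolve, then Load into the stand-in warehouse table.
cleaned = [transform(r) for r in billing + support]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(r["id"], r["email"], r["updated"].isoformat()) for r in resolve(cleaned).values()],
)
conn.commit()
print(conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall())
# [(1, 'ada.lovelace@example.com'), (2, 'alan@example.com')]
```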

The three integration stages for ML are data acquisition, data understanding, and company acceptance. An ML model is only as good as the data used to train it, a principle often summarized as “garbage in, garbage out”: flawed input data produces flawed results.
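A small illustration of “garbage in, garbage out”, assuming hypothetical sensor training rows that contain a `9999.0` error code and a missing label. The assumed valid range and the naive class-mean "model" are illustrative choices, not a recommended method; the sketch simply shows how one bad value can distort what a model learns and how a validation step before training catches it.

```python
from statistics import mean

# Hypothetical training rows: (temperature_celsius, label). 9999.0 is assumed to be
# a sensor error code and None a missing label; both are "garbage" for training.
rows = [
    (21.5, "normal"), (22.0, "normal"), (9999.0, "normal"),
    (35.1, "alert"), (34.8, None), (36.0, "alert"),
]

VALID_RANGE = (-50.0, 60.0)  # assumed plausible range for this sensor


def validate(rows, valid_range=VALID_RANGE):
    """Split rows into clean ones and rejected ones, with a reason for each rejection."""
    low, high = valid_range
    clean, rejected = [], []
    for temp, label in rows:
        if label is None:
            rejected.append(((temp, label), "missing label"))
        elif not low <= temp <= high:
            rejected.append(((temp, label), "out of range"))
        else:
            clean.append((temp, label))
    return clean, rejected


clean, rejected = validate(rows)

# A naive "model": the mean temperature of the 'normal' class.
raw_normal = mean(t for t, lbl in rows if lbl == "normal")
clean_normal = mean(t for t, lbl in clean if lbl == "normal")
print(f"mean 'normal' temperature, raw:   {raw_normal:.2f}")    # 3347.50, skewed
print(f"mean 'normal' temperature, clean: {clean_normal:.2f}")  # 21.75
for row, reason in rejected:
    print("rejected", row, "->", reason)
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it easier to trace data-quality problems back to the source system.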

Common uses of data integration include AI and ML, data lake development, cloud migration and database replication, the Internet of Things (IoT), and real-time intelligence.

[More to come ...]