
Data Warehouses

MIT
(Photo courtesy of MIT)

- Overview

Big data refers to datasets that are too large and complex for traditional databases to handle. These datasets are characterized by high volume, velocity, and variety, and their growth is driven by sources like AI and IoT.

A data warehouse is a system for collecting and managing this data, whereas big data is the datasets themselves. Big data is therefore not a replacement for data warehouses but a new category of data that data warehouses are designed to handle.

Data mining is the process of finding patterns within big data. It is distinct from data warehousing, which is about storing and organizing that data.

 

1. Key Characteristics of Big Data: 

  • Volume: Refers to the immense size of the data, measured in terabytes, petabytes, or even larger scales.
  • Velocity: The speed at which new data is generated, collected, and processed, often requiring real-time analysis.
  • Variety: The diverse forms of data, including structured (like databases), semi-structured (like XML), and unstructured (like text, audio, images).
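
The three forms of variety listed above can be illustrated with a minimal Python sketch. The sample strings, field names, and sensor IDs are hypothetical and exist only for illustration; real big data sources would be far larger and would usually arrive through dedicated ingestion tools.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table export (hypothetical sample).
structured_src = "sensor_id,temperature_c\ns1,21.5\ns2,19.0\n"
rows = list(csv.DictReader(io.StringIO(structured_src)))

# Semi-structured: self-describing tags; fields may vary from record to record.
semi_src = "<readings><reading sensor='s1' temp='21.5'/><reading sensor='s3'/></readings>"
readings = [r.attrib for r in ET.fromstring(semi_src)]

# Unstructured: free text with no schema; here we only count words.
unstructured_src = "Sensor s1 looked fine today, but s2 seemed cold."
word_count = len(unstructured_src.split())

print(rows[0]["temperature_c"])   # every structured row has the same columns
print(readings[1].get("temp"))    # the second XML record has no temp attribute
print(word_count)
```

The structured rows can be queried by column name right away, the XML records need a check for missing fields, and the free text needs further processing before it yields any fields at all.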

 

2. Big Data vs. Data Warehouse: 

  • Big Data: Is the actual data itself - the large, complex datasets that come from sources like sensors, social media, and IoT devices.
  • Data Warehouse: Is a system or repository designed to store and manage these large, multifaceted datasets for analysis and reporting.


3. Relationship Between Big Data and Data Warehousing: 
  • A data warehouse is a crucial tool for managing big data; it collects and stores the varied data from different sources.
  • Big data does not replace data warehousing, but rather expands its capabilities and requirements. The rise of big data necessitated the development of more flexible, scalable storage and processing solutions beyond traditional relational databases.
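
As a rough sketch of the "collect from different sources, store in one place" role described above, the following Python example loads two hypothetical feeds into a single table and runs one reporting query. It uses an in-memory SQLite database purely as a stand-in: SQLite is not a data warehouse, and the table name `events`, the function `load_warehouse`, and the sample records are all assumptions made for illustration.

```python
import json
import sqlite3

# Hypothetical source data: a sensor feed (tuples) and a social media feed (JSON lines).
sensor_feed = [("s1", "2024-01-01", 21.5), ("s2", "2024-01-01", 19.0)]
social_feed = [
    '{"user": "ana", "day": "2024-01-01", "likes": 12}',
    '{"user": "bo", "day": "2024-01-02", "likes": 7}',
]

def load_warehouse(conn, sensor_rows, social_lines):
    """Copy both sources into one shared table with a common shape."""
    conn.execute(
        "CREATE TABLE events (source TEXT, entity TEXT, day TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO events VALUES ('sensor', ?, ?, ?)", sensor_rows
    )
    parsed = (json.loads(line) for line in social_lines)
    conn.executemany(
        "INSERT INTO events VALUES ('social', ?, ?, ?)",
        ((p["user"], p["day"], p["likes"]) for p in parsed),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_warehouse(conn, sensor_feed, social_feed)

# A reporting query that spans both sources.
summary = conn.execute(
    "SELECT source, COUNT(*), AVG(value) FROM events GROUP BY source ORDER BY source"
).fetchall()
print(summary)
```

Real warehouses add much more (scalable storage, schema management, scheduled loads), but the shape is the same: heterogeneous inputs are mapped to a common structure so that one query can report across them.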


4. Data Mining in the Context of Big Data: 

  • Data mining is the process of analyzing big data to discover patterns, correlations, and insights that might be hidden within the massive datasets.
  • It is an essential component of a big data strategy, but it is distinct from the data warehouse, which serves as the collection and storage mechanism for the data that will be mined. 
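
To make "discovering patterns" concrete, here is a minimal sketch of one simple mining technique: counting which items frequently occur together. The transactions, the function name `frequent_pairs`, and the `min_support` threshold are hypothetical; the data stands in for rows that would already have been read out of a storage system, and production data mining would rely on more scalable algorithms and tools.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, e.g. rows already pulled from a stored table.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=2)
print(patterns)
```

The storage layer only has to supply the transactions; the pattern (here, that bread and milk tend to be bought together) comes from the separate analysis step, which mirrors the distinction drawn above.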

 

 

[More to come ...]



   

 