
Big Data Science and Analytics

(San Francisco, California, U.S.A. - Jeff M. Wang)


New Data Economy: Turning Big Data into Smart Data



- From Big Data To Knowledge


Big data refers to extremely large datasets that are difficult to analyze with traditional tools. It is often boiled down to a few varieties of data generated by machines, people, and organizations. Big data arises when the needs for data collection, processing, management, use, and analysis exceed the capacity and capability of available methods and software systems. These constraints are often characterized by volume, variety, velocity, veracity, and so on. Big data can enable solutions to challenging problems in health, security, government, and more, and usher in a new era of analytics and decision-making.

Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors, and mobile devices transmit it. Big data can be structured, semi-structured, or unstructured. IDC estimates that 90 percent of big data is unstructured. Many of the tools designed to analyze big data can handle unstructured data, which usually refers to information that does not reside in a traditional row-column database. It is the opposite of structured data, which is stored in named fields in a database.
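The structured/unstructured distinction above can be made concrete with a small sketch: the same facts stored as a row of named fields versus buried in free text that a parser must recover. The note format and regex here are illustrative assumptions, not a real system.

```python
import re

# Structured: fields already live in named columns (row-column model).
structured_row = {"user_id": 42, "city": "San Francisco"}

# Unstructured: the same facts buried in free text; a parser must
# recover them before any row-column tool can use them.
note = "user 42 checked in from San Francisco"
match = re.search(r"user (\d+) checked in from (.+)", note)
extracted = {"user_id": int(match.group(1)), "city": match.group(2)}
# extracted now matches structured_row
```

The extra parsing step is precisely the work that tools for unstructured big data must perform at scale.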

Big data is arriving from multiple sources at high velocity and in great volume and variety. To extract meaningful value from it, you need sufficient processing power, analytics capabilities, and skills. In most business use cases, no single source of data is useful on its own; real value often comes from combining multiple streams of big data and analyzing them together to generate new insights. Organizations that can quickly extract insight from their data and act on it gain an advantage.
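A minimal sketch of that point: joining two streams on a shared key can yield an insight neither source carries alone. The device readings, locations, and threshold below are hypothetical.

```python
# Stream 1: sensor readings (meaningless alone: which room is "d1"?).
readings = [{"device": "d1", "temp": 80}, {"device": "d2", "temp": 71}]

# Stream 2: device metadata (meaningless alone: no measurements).
locations = {"d1": "server room", "d2": "lobby"}

# Combined, they answer a question neither could: which rooms run hot?
hot_rooms = [locations[r["device"]] for r in readings if r["temp"] > 75]
# hot_rooms -> ['server room']
```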

Analyzing large data sets, so-called big data, will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.


The Big Data Life Cycle


(The Golden Gate Bridge, San Francisco, California - Jeff M. Wang)

Big data must pass through a series of steps before it generates value: data access, storage, cleaning, and analysis. One approach is to run each stage as a separate layer, use the tools that best fit the problem at hand, and scale analytical solutions to the size of the data.

The big data life cycle consists of four stages, namely: Data Acquisition, Data Awareness, Data Processing and Analytics, and Data Governance.
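The four stages can be sketched as separate layers chained into one pipeline, in keeping with the layered approach described above. The stage functions here are hypothetical placeholders, each reduced to a one-line stand-in for the real work.

```python
# A minimal sketch of the four-stage life cycle, one function per layer.
def acquire(source):
    return [r.strip() for r in source if r.strip()]   # gather and clean

def add_awareness(records):
    return [{"text": r} for r in records]             # attach context

def analyze(records):
    return {"count": len(records)}                    # derive insight

def govern(result):
    result["retention_days"] = 30                     # apply a policy
    return result

report = govern(analyze(add_awareness(acquire([" a ", "", "b"]))))
# report -> {'count': 2, 'retention_days': 30}
```

Because each layer only depends on the previous layer's output, any one of them can be swapped for a tool that scales to the data at hand.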


- Data Acquisition

Data acquisition is the process of gathering, filtering, and cleaning data before it is put in a data warehouse or other storage solution. The acquisition of big data is most commonly governed by four of the Vs: volume, velocity, variety, and value. Most acquisition scenarios assume high-volume, high-velocity, high-variety but low-value data, making it important to have adaptable, time-efficient gathering, filtering, and cleaning algorithms that ensure only the high-value fragments of the data are actually processed by the data-warehouse analysis.
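The gather-filter-clean step can be sketched as a generator that discards low-value records before they ever reach storage. The record shape and the `is_valuable`/`clean` helpers are assumptions for illustration.

```python
def acquire(records, is_valuable, clean):
    """Gather, filter, and clean records before they reach storage."""
    for record in records:
        if is_valuable(record):      # keep only high-value fragments
            yield clean(record)      # normalize before warehousing

# Hypothetical sensor stream: one reading is unusable (missing value).
raw = [{"temp": 21.4}, {"temp": None}, {"temp": 98.0}]
kept = list(acquire(raw,
                    is_valuable=lambda r: r["temp"] is not None,
                    clean=lambda r: {"temp": round(r["temp"])}))
# kept -> [{'temp': 21}, {'temp': 98}]
```

Filtering lazily, as a generator, matters at high velocity: records are dropped as they stream past rather than being buffered first.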


- Data Awareness

Data awareness is the task of creating a schema of relationships within a set of data, allowing different users of the data to determine a fluid yet valid context and utilise it for their desired tasks. It is a relatively new field; most current work focuses on semantic structures that allow data to gain context in an interoperable format, in contrast to the current practice of giving data context through unique, model-specific constructs (such as XML Schemas).
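One way to picture the semantic approach: map model-specific field names onto a shared vocabulary, so any consumer that knows the vocabulary can interpret the record. The vocabulary and field names below are hypothetical (the `schema:` terms merely gesture at shared-vocabulary efforts such as schema.org).

```python
# Hypothetical mapping from model-specific field names to shared terms.
VOCABULARY = {
    "fname": "schema:givenName",
    "dob": "schema:birthDate",
}

def add_context(record, vocabulary):
    # Rename each field to its shared term where one is known,
    # leaving unknown fields untouched.
    return {vocabulary.get(field, field): value
            for field, value in record.items()}

raw = {"fname": "Ada", "dob": "1815-12-10"}
annotated = add_context(raw, VOCABULARY)
# annotated -> {'schema:givenName': 'Ada', 'schema:birthDate': '1815-12-10'}
```

The point of the interoperable format is that a second system, with its own internal model, can read `annotated` without knowing what `fname` meant in the first system.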


- Data Processing and Analytics

Data processing largely has three primary goals: (a) determine whether the collected data is internally consistent; (b) make the data meaningful to other systems or users, using metaphors or analogies they can understand; and (c), what many consider most important, provide predictions about future events and behaviours based on past data and trends. Because this is a vast field with rapidly changing technologies, this section concentrates on the most commonly used technologies in data analytics. Effective data analytics requires four primary conditions to be met: fast data loading, fast query processing, efficient utilisation of storage, and adaptivity to dynamic workload patterns. The analytical model most commonly associated with these criteria, and with big data in general, is MapReduce, detailed below.


- Data Governance

Data governance is the act of managing raw big data, as well as the processed information that arises from it, in order to meet legal, regulatory, and business-imposed requirements. While there is no standardized format for data governance, there have been increasing calls within various sectors (especially healthcare) to create one, to ensure reliable, secure, and consistent big data utilisation across the board.


[More to come ...]



