
Graph Computing for Big Data



- Overview

Graph computing is a technology that studies, describes, analyzes, and computes graphs in the human world. Graphs are mathematical structures that model relationships and processes in physical, biological, social, and information systems. 

Graph analytics is a data analysis method that helps businesses understand the relationships between linked entities in a graph or network. 

A graph database is a platform for creating and manipulating data that has an associative and contextual nature. It uses a graph structure with nodes, edges, and properties to represent and store data.

Graph databases store nodes and relationships instead of tables or documents. Data is stored without restricting it to a pre-defined model, allowing a flexible way of thinking about and using it.

Graph databases have several advantages for big data analytics, including: 

  • Efficient relationship management: Graph databases can efficiently manage complex relationships.
  • Faster query performance: Graph databases can answer relationship-heavy queries faster than traditional databases, which must reconstruct relationships through joins.
  • Schema flexibility: New node and edge attributes can be added at any time, unlike traditional databases with a fixed, pre-defined schema.
  • Batch processing: Distributed graph-processing frameworks enable more complex analysis over larger portions of a graph.
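The node/edge/property model described above can be sketched in a few lines of plain Python. All names and data below are made-up illustrative examples, not a real database API:

```python
# A minimal property-graph sketch: nodes and edges carry arbitrary
# key-value properties, and relationships are first-class records
# rather than the result of a table join.

nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob":   {"label": "Person", "age": 29},
    "p1":    {"label": "Product", "name": "laptop"},
}

# Each edge is (source, relationship-type, target, properties).
edges = [
    ("alice", "KNOWS",  "bob", {"since": 2019}),
    ("alice", "BOUGHT", "p1",  {"year": 2023}),
    ("bob",   "BOUGHT", "p1",  {"year": 2024}),
]

def neighbors(node, rel_type):
    """Follow edges of one relationship type out of a node."""
    return [dst for src, rel, dst, _ in edges
            if src == node and rel == rel_type]

# "Who does Alice know?" is a single edge traversal, not a join.
print(neighbors("alice", "KNOWS"))   # ['bob']
```

Because new keys can be added to any node's or edge's property dictionary, this toy model also illustrates the schema flexibility mentioned above.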


Graph Data Science is an analytics and machine learning (ML) solution that analyzes data relationships to improve predictions and discover insights. 

Gartner projects that graph technologies will be used in 80% of data and analytics innovations by 2025.

- Graph Databases

Graph databases are ideal for capturing and navigating complex data and relationships. They are used for social networks, recommendation engines, fraud detection, inventory management and many other modern systems.

Graph computing is a technology that studies, describes, analyzes, and computes graphs. A graph is a non-linear data structure made up of vertices (also known as nodes) and edges; an edge connects any two vertices in the graph.

A graph database is "a database that uses a graph architecture for semantic querying, representing and storing data through nodes, edges, and attributes." Each graph database contains a number of objects. These objects are called vertices, and the relationship between these vertices is represented by an edge connecting two vertices. 

If our data model contains highly hierarchical many-to-many relationships with multiple roots, a varying or uneven number of levels, or cyclic relationships, we can say that our data model is a graph model.
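The cyclic-relationship case can be tested mechanically: if a depth-first search finds a back edge, the model is not a tree. The part/assembly data below is hypothetical:

```python
# Detecting a cyclic relationship in a directed "contains/references"
# model. A back edge found during DFS proves the data is graph-shaped.

model = {
    "assembly": ["frame", "motor"],
    "frame":    ["bolt"],
    "motor":    ["bolt", "assembly"],   # cycle: motor -> assembly -> motor
    "bolt":     [],
}

def has_cycle(graph):
    """DFS with three-color marking: GRAY means 'on the current path'."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph.get(v, []):
            if color[w] == GRAY:                    # back edge found
                return True
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)

print(has_cycle(model))  # True
```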

Some advantages of graph databases include: 

  • Efficient relationship management: Graph databases can efficiently manage complex relationships.
  • Flexibility in how data is stored: Graph databases have a high level of flexibility in how data is stored.
  • Easy to scale: It's easy to add more data from additional sources as needed.

Some use cases for graph databases include: fraud detection, personalization, customer 360, knowledge graphs, and network management.


- Graph Analytics

Graph analytics is an emerging form of data analysis that helps businesses understand complex relationships between linked entity data in a graph. Graph analytics for big data is an alternative to the traditional data warehouse model. It is a framework for absorbing both structured and unstructured data from various sources to enable analysts to probe the data in an undirected manner. 

Graph analytics is becoming more popular because it can help organizations uncover insights that are impossible to discover using traditional techniques. 

Graph analytics can help organizations: 

  • Model complex systems: Graph analytics can model complex systems and relationships.
  • Make quicker decisions: Graph analytics can allow for quicker decision-making, including automated decisions. 
  • Use graph models: Organizations can leverage graph models to gain insights that can be used in marketing or, for example, in analyzing social networks.
  • Uncover insights: Graph analytics techniques can uncover insights that would be impossible to discover using traditional techniques.
  • Identify important relationships: Graphs can help identify important relationships in large data sets with complex relationships between elements.


- Graph Computing and Graph Analytics

Graph computing is a technology that studies, analyzes, and computes graphs in the human world. Graph analytics is a way for organizations to quickly uncover insights from relationships between entities. Graph analytics has been useful for detecting financial crimes such as money laundering. 

Graph databases store data as nodes and relationships instead of tables or documents. The data is stored without a pre-defined model, which allows for a flexible way of thinking about and using it. 

The two popular models of graph databases are property graphs and RDF graphs. Property graphs focus on analytics and querying, while RDF graphs emphasize data integration. 
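The difference between the two models can be seen by encoding the same fact in each; the names and prefixes below are purely illustrative:

```python
# Property graph: the edge itself can carry properties directly.
property_graph_edge = {
    "from": "alice", "type": "WORKS_AT", "to": "acme",
    "properties": {"since": 2020},
}

# RDF: everything is a (subject, predicate, object) triple. Attaching
# a property to the relationship itself takes extra triples (here via
# a simple reified statement), which is why RDF shines at data
# integration and exchange rather than compact analytic queries.
rdf_triples = [
    ("ex:alice", "ex:worksAt",    "ex:acme"),
    ("ex:stmt1", "rdf:subject",   "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:worksAt"),
    ("ex:stmt1", "rdf:object",    "ex:acme"),
    ("ex:stmt1", "ex:since",      "2020"),
]

print(len(rdf_triples))  # 5 triples for one fact with one edge property
```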

Examples of data that are well-suited to graphs include: 

  • Road networks
  • Communications networks
  • Social networks
  • Web pages and links
  • Financial transaction data


Techniques for graph analytics include: path analytics, connectivity analytics, community analytics, and centrality analytics. 
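Two of these techniques can be sketched on a small undirected network; the graph below is illustrative data:

```python
from collections import deque

# A small undirected social-style network (illustrative data).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E"],
    "E": ["D"],
}

# Centrality analytics: degree centrality ranks nodes by connections.
degree = {v: len(nbrs) for v, nbrs in graph.items()}

# Path analytics: shortest path between two nodes via breadth-first search.
def shortest_path(g, start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in g[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(max(degree, key=degree.get))     # 'B' - the most connected node
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Connectivity and community analytics build on the same traversals, grouping nodes by reachability or by the density of links among them.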

Graph processing frameworks include: Apache Giraph, Spark GraphX, and GraphLab.



- Graph Processing in Big Data

Graph processing in big data refers to the process of storing data in graph databases and executing queries on that data. A graph processing framework (GPF) is a set of tools that process graphs. Graphs are non-linear data structures that consist of vertices and edges. Vertices are also called nodes. Edges are lines or arcs that connect two nodes in the graph. 

Graph analytics uses algorithms to explore the relationships between entries in a graph database. This can include connections between people, transactions, or organizations. Use cases include contact tracing, cybersecurity, drug interaction, recommendation engines, social networks, and supply chains. 

Processing extremely large graphs has been a challenge, but recent advances in big data technologies have made this task more practical.
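Frameworks such as Apache Giraph follow a vertex-centric ("think like a vertex") model: in each superstep, every vertex reads messages from its neighbors, updates its value, and sends new messages. The sketch below imitates that model in plain Python with a few PageRank iterations on an illustrative three-node graph; real frameworks distribute the same loop across a cluster:

```python
# Vertex-centric superstep loop, single-process sketch (illustrative).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = {v: 1.0 / len(graph) for v in graph}
DAMPING, SUPERSTEPS = 0.85, 20

for _ in range(SUPERSTEPS):
    # Message passing: each vertex sends rank/out_degree to its neighbors.
    incoming = {v: 0.0 for v in graph}
    for v, out in graph.items():
        share = rank[v] / len(out)
        for w in out:
            incoming[w] += share
    # Vertex update: each vertex combines the messages it received.
    rank = {v: (1 - DAMPING) / len(graph) + DAMPING * incoming[v]
            for v in graph}

print(sorted(rank, key=rank.get, reverse=True))  # ['c', 'a', 'b']
```

Vertex 'c' ranks highest because it receives links from both 'a' and 'b', while 'b' receives only half of 'a''s share.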


- The Main Families of Big Data Analytics

Big data analytics is the process of collecting, processing, and analyzing large amounts of data to find insights and patterns. These data sets can come from many sources, including web, mobile, email, social media, and networked smart devices.

Big data analytics uses methods, tools, and applications to uncover trends, patterns, and correlations in data. These processes use familiar statistical analysis techniques, like clustering and regression, and apply them to more extensive datasets with the help of newer tools.

The four main types of big data analytics are: 

  • Descriptive analytics: Summarizes what happened, helping businesses understand changes in data over a specific period
  • Diagnostic analytics: Examines why it happened, helping companies understand the causes behind business performance
  • Predictive analytics: Makes predictions about events and risks that are uncertain
  • Prescriptive analytics: Recommends what action to take based on predicted outcomes


Each type of analytics has a different purpose and offers varying levels of insight. Together, they help businesses to better understand their big data and make decisions to drive improved performance. 

Here are some examples of big data analytics:

  • Predictive analytics: Uses data analysis, machine learning, artificial intelligence, and statistical models to find patterns that might predict future behavior. For example, it can predict customer trends and market trends.
  • Data mining: Helps you examine large amounts of data to discover patterns in the data. This information can be used for further analysis to help answer complex business questions.
  • Data cleansing: The process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
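Data cleansing, as described above, can be sketched over a handful of hypothetical customer records (all names and addresses below are made up):

```python
import re

# Cleanse records: fix formatting, then drop corrupt, incomplete,
# and duplicate rows.
raw = [
    {"name": "  Alice Smith ", "email": "ALICE@EXAMPLE.COM"},
    {"name": "alice smith",    "email": "alice@example.com"},  # duplicate
    {"name": "Bob Jones",      "email": "not-an-email"},       # corrupt
    {"name": "",               "email": "carol@example.com"},  # incomplete
    {"name": "Dan Lee",        "email": "dan@example.com"},
]

EMAIL = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

def clean(records):
    seen, out = set(), []
    for r in records:
        name = " ".join(r["name"].split()).title()  # normalize spacing/case
        email = r["email"].strip().lower()
        if not name or not EMAIL.match(email):      # incomplete or corrupt
            continue
        if email in seen:                           # duplicate
            continue
        seen.add(email)
        out.append({"name": name, "email": email})
    return out

print(len(clean(raw)))  # 2 valid, deduplicated records remain
```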

[More to come ...]



