Programming Models for Big Data

[Washington State - Forbes]


- Overview

Big data projects require coding. Programming languages commonly used for big data include Java, R, C++, and Python. Python is a particularly popular choice among developers.

A programming model is an abstraction of an existing machine or infrastructure. It is a set of abstract runtime libraries and programming languages that together form a model of computation. The level of abstraction can be low, as in machine language, or very high, as in a programming language like Java.

So if the infrastructure supporting big data analysis is a distributed file system, then the programming model for big data should support the programmability of operations on that distributed file system. By this we mean being able to write computer programs for big data that work efficiently on top of a distributed file system and handle potential problems with ease.

The big data programming model represents a programming style and provides an interface paradigm for developers to write big data applications and programs. The programming model is usually the core feature of the big data framework, which implicitly affects the execution model of the big data processing engine, and also drives the way users express and build big data applications.
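As a rough illustration of how a programming model shapes both the interface and the execution, the sketch below defines a toy dataset type. The class PartitionedDataset and its methods are hypothetical names invented for this example, not the API of any real framework. The user writes map, filter, and reduce calls; the class decides on its own that the work happens partition by partition.

```python
from functools import reduce
from typing import Any, Callable, Iterable, List


class PartitionedDataset:
    """Hypothetical toy dataset: callers see map/filter/reduce, never partitions."""

    def __init__(self, partitions: List[List[Any]]) -> None:
        self._partitions = partitions

    @classmethod
    def from_items(cls, items: Iterable[Any], num_partitions: int = 2) -> "PartitionedDataset":
        items = list(items)
        # Round-robin split; a real engine would follow the file system's block layout.
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn: Callable[[Any], Any]) -> "PartitionedDataset":
        # Each partition is transformed independently, so this step could run in parallel.
        return PartitionedDataset([[fn(x) for x in part] for part in self._partitions])

    def filter(self, keep: Callable[[Any], bool]) -> "PartitionedDataset":
        return PartitionedDataset([[x for x in part if keep(x)] for part in self._partitions])

    def reduce(self, fn: Callable[[Any, Any], Any], identity: Any) -> Any:
        # Reduce every partition on its own, then combine the partial results.
        # This is only correct when fn is associative and identity is its neutral element.
        partials = [reduce(fn, part, identity) for part in self._partitions]
        return reduce(fn, partials, identity)


# The application code never mentions partitions or nodes.
readings = PartitionedDataset.from_items(range(1, 11), num_partitions=3)
total_even_squares = (
    readings.map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .reduce(lambda a, b: a + b, 0)
)
```

The design point is that the same three calls would stay unchanged if the partitions lived on different machines; only the class internals would need to change.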


- Programming Languages for Big Data

Big data refers to collections of large and complex data sets, and programmers need advanced data processing software tools to analyze this data. With the help of big data, businesses can gain valuable information about customers or market trends to make profitable business decisions. Understanding big data programming languages can help you understand how technologists use them to retrieve, organize, store, and update vast amounts of data in databases.

Programming languages, like spoken languages, have their own unique structure, format, and flow. While spoken language is often determined by geography, programming language use is more dependent on coder preference, IT culture, and business goals.  

There are many, many programming languages used for various purposes today, but when it comes to big data, you'll see four languages that stand out the most: Python, R, Java, and Scala. Some of these languages are better suited for large-scale analytical tasks, while others are good at dealing with big data and the Internet of Things.  
  • Computer programming language R: R is a free software environment for statistical computing and graphics. It compiles and runs on various UNIX platforms, Windows, and macOS.
  • Computer programming language Scala: Scala combines object-oriented and functional programming in one concise high-level language. Scala's static typing helps avoid bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to a vast ecosystem of libraries.

- Big Data Programming Models

Today, data flows are abundant thanks to emerging technologies such as cloud computing, edge computing, and the Internet of Things (IoT). Many industries, including healthcare, government, media and entertainment, and manufacturing, generate vast amounts of data every day.

These industries require analytical models to improve the efficiency of their operations. The data they generate is called big data because it is not only large in volume, but also produced at high speed and in a variety of formats. Organizations such as McDonald's, Amazon, and Walmart are investing in big data applications that examine large data sets to reveal hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.

In big data programming, data-driven parallel programs are written by users for execution in large-scale and distributed environments. There are many programming models for big data, with different focuses and advantages. Software developers use programming models to build applications.

Regardless of the programming language and supported application programming interface (API), the programming model bridges the underlying hardware architecture with the software. IoT applications such as smart homes, wearable devices, and smart cities generate large amounts of data to process. Analyzing such a large amount of data is a major challenge. 


[Grindelwald, Switzerland]

- The Requirements for Programming Models

Disruptive technologies such as cloud computing, blockchain, distributed machine learning, artificial intelligence, and deep learning are used in almost every application field of today's computing systems. These technologies generate large volumes of data with varying computational requirements.

What are the requirements for a big data programming model?

  • First, such a big data programming model should support common big data operations, such as splitting large amounts of data. This means partitioning data and placing it in and out of computer memory, together with a model for synchronizing the datasets afterward.
  • Access to data should be fast. The model should allow fast distribution to nodes within a rack, which may be the data nodes to which we move the computation. This means scheduling many parallel tasks at once.
  • It should also enable reliable, fault-tolerant computation. This means enabling programmable replication and file recovery when needed.
  • It should scale easily to the distributed nodes where the data is produced. It should also allow new resources to be added, to take advantage of distributed computers and scale to more or faster data without loss of performance. This is called scaling out.
  • Since there are many different types of data, such as documents, graphs, tables, key-values, etc., the programming model should be able to operate on specific collections of these types. Not every type of data can be supported by a particular model, but a model should be optimized for at least one type.
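The first three requirements above (splitting data, scheduling parallel tasks, and tolerating failures) can be sketched on a single machine. This is a minimal sketch under stated assumptions: threads stand in for cluster nodes, a simple retry stands in for recovery on another node, and all function names (split_into_partitions, run_with_retry, process_in_parallel, flaky_sum) are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_partitions(data: Sequence[int], num_partitions: int) -> List[List[int]]:
    """Split data into roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [list(data[i:i + size]) for i in range(0, len(data), size)]


def run_with_retry(task: Callable[[List[int]], int], partition: List[int],
                   max_attempts: int = 3) -> int:
    """Re-run a failed task; a stand-in for rescheduling work on a healthy node."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(partition)
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def process_in_parallel(data: Sequence[int], task: Callable[[List[int]], int],
                        num_partitions: int = 4) -> List[int]:
    """Partition the data and schedule one task per partition at once."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map returns results in the same order as the partitions.
        return list(pool.map(lambda p: run_with_retry(task, p), partitions))


failed_once = set()


def flaky_sum(partition: List[int]) -> int:
    # Simulate a one-off failure for the partition that starts with 0.
    key = partition[0]
    if key == 0 and key not in failed_once:
        failed_once.add(key)
        raise RuntimeError("simulated node failure")
    return sum(partition)


partial_sums = process_in_parallel(list(range(100)), flaky_sum, num_partitions=4)
total = sum(partial_sums)
```

Scaling out and support for different data types are not covered here; in a real system they depend on the cluster manager and storage layer rather than on application code.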


- Programming Models and Systems for Big Data Analysis

Big data analytics refers to advanced and efficient data mining and machine learning techniques applied to large amounts of data. Research work and results in the field of big data analysis are constantly emerging, and more and more new efficient architectures, programming models, systems, and data mining algorithms have been proposed.

The Big Data programming model represents a programming style that provides an interface paradigm for developers writing Big Data applications and programs. Programming models are often a core feature of big data frameworks because they implicitly influence the execution model of big data processing engines and also drive the way users express and build big data applications and programs.

Currently the most popular big data analysis programming models are:  

  • MapReduce
  • Functional Programming
  • Actor Model
  • Statistical and Analytical
  • Dataflow-Based
  • Bulk Synchronous Parallel
  • High Level DSL

These systems are compared using four taxonomy criteria (namely, level of abstraction, type of parallelism, infrastructure size, and application category) to help developers and users identify and select the best solution based on their skills, hardware availability, productivity, and application needs. 
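To make the first model in the list above concrete, here is a minimal single-process sketch of the MapReduce style, using word counting as the example. It only illustrates the three phases (map, shuffle, reduce); it is not how Hadoop or any other framework implements them, and the function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data needs big models", "data models"])
```

In a distributed setting, each document (or file block) would be mapped on a different node and each key would be reduced on the node that receives it after the shuffle; the user-written map and reduce functions would stay the same.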


- Programming Models and Algorithms for Big Data

Efficient big data management is the grand vision of modern computing as it empowers millions of intelligent, connected devices that can communicate with each other and increasingly control our world. 

Big data analytics is not a single computing paradigm; rather, it serves as an enabling computing technology for various industries such as smart cities, transportation, intelligent systems, energy management systems, healthcare applications, and more. Electronic Health Records (EHR) and eHealth applications, for example, are considered potential big data applications: they generate large amounts of data per second and, if that data is processed efficiently, can control the full functionality of electronic health services.

Therefore, to cope with the growing demand for big data innovation, global data scientists should start to pay attention to advanced big data programming models and algorithms that can quickly learn and automate the big data analysis process in real-time data-intensive applications.

In addition, these models and algorithms should facilitate effective communication, forecasting, and decision-making. Although existing big data analysis methods can perform reasonable computations, the growing efficiency demands of current applications have greatly reduced the practicality of traditional big data programming models, and their security functions also need to be improved.

Exploring high-level programming models for big data is the only way to properly handle these massive amounts of data. Here are some topics on innovative programming models and algorithms for big data: 

  • Energy-efficient programming and computing models for IoT-related big data applications
  • Programming Model and Algorithm Development of Big Data Open Platform
  • Big Data Innovative Programming Models and Algorithms Beyond Hadoop/MapReduce
  • Efficient Big Data Programming Models and Algorithms for Big Data Search
  • Programming model and algorithm for big data visualization analysis and application
  • A Programming Model for Big Data Assisted Linking and Graph Mining
  • Programming Model and Algorithm for Big Data Mining Based on Semantics
  • Secure big data analytics and algorithmic models for privacy-preserving data-intensive applications
  • Algorithm and efficient programming model of multimedia big data analysis and management process
  • New and innovative big data computing model
  • A high-performance/parallel computing-assisted programming model for big data



