
Machine Learning Workflow

[Machine Learning Workflow - MLOps]

 

- Overview

Machine learning (ML) enables computers to find patterns in data and then use those to make decisions rather than being explicitly programmed to carry out a certain task. 

ML is a branch of artificial intelligence (AI) and computer science. It uses data and algorithms to imitate the way humans learn, gradually improving its accuracy. ML is also an important part of the growing field of data science.

Statistical methods are used to train algorithms to generate predictions and insights in data mining projects. These insights then influence application and business decisions and, ideally, create business value.

The ML workflow is pretty simple: 

  • You have data which contains patterns.
  • You supply it to an ML algorithm, which finds the patterns and generates a model.
  • The model recognises these patterns when presented with new data.
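
The three steps above can be sketched in a few lines of plain Python. This is only a toy illustration: the sample data and the choice of a least-squares line fit as the "ML algorithm" are assumptions made for the example, not something the workflow prescribes.

```python
# Toy illustration of the three steps: data -> algorithm -> model -> prediction.
# The data and the algorithm (a least-squares line fit) are assumptions.

def fit_line(xs, ys):
    """The 'ML algorithm': find the slope and intercept that best fit the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the 'model' is just these two numbers


def predict(model, x):
    """Apply the learned pattern to a new input."""
    slope, intercept = model
    return slope * x + intercept


# 1. Data that contains a pattern (here: y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# 2. The algorithm finds the pattern and produces a model.
model = fit_line(xs, ys)

# 3. The model applies the pattern to data it has not seen before.
prediction = predict(model, 10.0)
```

Real projects typically replace `fit_line` with an algorithm from a framework, but the shape of the workflow stays the same.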
 

In recent years, data has become an important currency. Much valuable intelligence can be gleaned from large captured data sets and used to make critical business decisions.

But ML goes far beyond simply storing data. It's about capturing, preserving, accessing and transforming data to interpret it and find its meaning - and ultimately its value. 

ML algorithms are often developed using frameworks such as TensorFlow and PyTorch.

 

- The Machine Learning Workflow

A machine learning workflow defines the stages implemented during a machine learning project. The core of the ML workflow is writing and executing machine learning algorithms to obtain ML models. 

The ML workflow describes the steps of ML implementation. Typically, these stages include data collection, data preprocessing, data set construction, model training and evaluation, and finally deployment to production. 
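
As a rough sketch, the stages listed above can be written as one small function each. Everything here is hypothetical: the records, the threshold "model", and the use of a JSON string as a stand-in for deployment are assumptions chosen to keep the example self-contained.

```python
import json


def collect():
    # Data collection: hypothetical raw records (hours studied, passed exam).
    return [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (4.0, 1),
            (5.0, 1), (6.0, 1), (2.5, 0), (5.5, 1)]


def preprocess(records):
    # Data preprocessing: drop records with a missing feature value.
    return [(x, y) for x, y in records if x is not None]


def build_datasets(records, test_every=4):
    # Data set construction: hold out every 4th record for evaluation.
    train = [r for i, r in enumerate(records) if i % test_every != 0]
    test = [r for i, r in enumerate(records) if i % test_every == 0]
    return train, test


def train_model(train):
    # Model training: put the threshold midway between the two class means.
    neg = [x for x, y in train if y == 0]
    pos = [x for x, y in train if y == 1]
    return {"threshold": (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2}


def evaluate(model, test):
    # Evaluation: fraction of held-out records classified correctly.
    correct = sum((x > model["threshold"]) == bool(y) for x, y in test)
    return correct / len(test)


def deploy(model):
    # Deployment stand-in: serialise the model so another process could load it.
    return json.dumps(model)


records = preprocess(collect())
train, test = build_datasets(records)
model = train_model(train)
accuracy = evaluate(model, test)
artifact = deploy(model)
```

Keeping each stage as a separate function makes it easier to rerun or replace one stage without touching the others.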

Machine learning (ML) requires experimenting with a wide range of datasets, data preparation steps, and algorithms to build a model that maximizes some target metric. 
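
The experimentation described above often comes down to a loop: try several candidate settings, score each on held-out data, and keep the best. The sketch below assumes a tiny one-dimensional k-nearest-neighbours classifier, made-up data, and validation accuracy as the target metric; all of these are illustrative choices.

```python
def knn_predict(train, x, k):
    # Majority vote among the k training points closest to x.
    nearest = sorted(train, key=lambda r: abs(r[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    # Target metric: fraction of validation points predicted correctly.
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# Hypothetical (feature, label) pairs; (2.0, 1) is deliberately noisy.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 1),
         (2.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
validation = [(0.8, 0), (1.9, 0), (2.8, 1), (3.8, 1)]

# Try each candidate value of k and keep the one with the highest score.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

In practice the same loop would also range over datasets and preparation steps, which is why tracking what went into each result becomes important.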

Once you have built a model, you also need to deploy it to a production system, monitor its performance, continuously retrain it on new data, and compare it with alternative models.

Being productive with machine learning can therefore be challenging for several reasons:  

  • It’s difficult to keep track of experiments. When you are just working with files on your laptop, or with an interactive notebook, how do you tell which data, code and parameters went into getting a particular result?  
  • It’s difficult to reproduce code. Even if you have meticulously tracked the code versions and parameters, you need to capture the whole environment (for example, library dependencies) to get the same result again. This is especially challenging if you want another data scientist to use your code, or if you want to run the same code at scale on another platform (for example, in the cloud).  
  • There’s no standard way to package and deploy models. Every data science team comes up with its own approach for each ML library that it uses, and the link between a model and the code and parameters that produced it is often lost.  
  • There’s no central store to manage models (their versions and stage transitions). A data science team creates many models. In the absence of a central place to collaborate and manage the model lifecycle, data science teams struggle to manage model stages, from development to staging and finally to production or archiving, along with the respective versions, annotations, and history.
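
To make the tracking and model-store problems concrete, here is a minimal sketch using only the Python standard library. The `record_run` helper, the `ModelRegistry` class, and the stage names are hypothetical and exist only for this illustration; they are not the API of any particular tool.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path

ALLOWED_STAGES = ("development", "staging", "production", "archived")


def record_run(log_path, params, metrics, code_text):
    """Append one experiment record: parameters, metrics, code hash, environment."""
    entry = {
        "params": params,
        "metrics": metrics,
        "code_sha256": hashlib.sha256(code_text.encode()).hexdigest(),
        "python_version": platform.python_version(),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


class ModelRegistry:
    """In-memory stand-in for a central model store with versions and stages."""

    def __init__(self):
        self._versions = {}  # model name -> list of version records

    def register(self, name, run_entry):
        # Each new version starts in 'development' and keeps a link to its run.
        versions = self._versions.setdefault(name, [])
        record = {"version": len(versions) + 1, "stage": "development",
                  "run": run_entry}
        versions.append(record)
        return record["version"]

    def transition(self, name, version, stage):
        if stage not in ALLOWED_STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._versions[name][version - 1]["stage"] = stage

    def stage_of(self, name, version):
        return self._versions[name][version - 1]["stage"]


log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
run = record_run(log_path, {"k": 3}, {"accuracy": 1.0}, "def train(): ...")
registry = ModelRegistry()
version = registry.register("demo-model", run)
registry.transition("demo-model", version, "staging")
```

Because every registered version keeps a reference to its run record, the link between a model and the code and parameters that produced it is not lost.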
 

Moreover, although individual ML libraries provide solutions to some of these problems (for example, model serving), to get the best result you usually want to try multiple ML libraries. 

MLflow is an open source platform for managing the end-to-end ML lifecycle. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a “black box,” without even having to know which library you are using.
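
As a hedged illustration of the tracking part of MLflow, the sketch below records the parameters and metrics of one run. It assumes the `mlflow` Python package is installed; the function name `log_run_with_mlflow` and the experiment name `demo-experiment` are made up for this example.

```python
def log_run_with_mlflow(params, metrics, experiment="demo-experiment"):
    """Record one training run with MLflow's tracking API and return its run ID."""
    import mlflow  # imported lazily so this file loads even without MLflow

    mlflow.set_experiment(experiment)   # create or select the experiment
    with mlflow.start_run() as run:     # everything logged here belongs to one run
        mlflow.log_params(params)       # e.g. {"k": 3}
        mlflow.log_metrics(metrics)     # e.g. {"accuracy": 0.9}
        return run.info.run_id
```

Calling `log_run_with_mlflow({"k": 3}, {"accuracy": 0.9})` with default settings should record the run in a local tracking store, which can then be browsed with the `mlflow ui` command.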

 

 

[More to come ...]

