
Machine Learning Workflow

[Machine Learning Workflow - MLOps]

 

- Overview

A machine learning (ML) workflow defines the stages implemented during an ML project. The core of the ML workflow is writing and executing ML algorithms to obtain ML models.

An ML workflow is a systematic process that guides practitioners through the lifecycle of an ML project, from problem definition to solution deployment. It defines the phases implemented during the project, which typically include data collection, data preprocessing, model selection, model training, performance evaluation, hyperparameter tuning, and finally deployment of the model to make predictions.

In other words, the workflow consists of gathering relevant data, preparing it for analysis, selecting an appropriate model, training it on the data, assessing its accuracy, optimizing its settings, and then putting the model into use to make predictions on new data.
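To make the stages above concrete, here is a minimal, standard-library-only sketch of the whole loop on synthetic data. It assumes a toy two-cluster dataset, uses k-nearest neighbours as the chosen model, and treats `k` as the single hyperparameter tuned on a validation split; all function names are hypothetical, and preprocessing is skipped because the generated data is already clean.

```python
import math
import random
from collections import Counter

def collect_data(n_per_class=50, seed=0):
    """Data collection stand-in: two noisy 2-D clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            point = (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            data.append((point, label))
    rng.shuffle(data)
    return data

def split(data, train_frac=0.6, val_frac=0.2):
    """Train / validation / test split; the test part is used only once."""
    a = round(len(data) * train_frac)
    b = a + round(len(data) * val_frac)
    return data[:a], data[a:b], data[b:]

def knn_predict(train, x, k):
    """Model: k-nearest neighbours; 'training' just stores labelled examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, examples, k):
    """Evaluation metric: fraction of examples predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in examples)
    return correct / len(examples)

def run_workflow(seed=0, k_grid=(1, 3, 5, 7)):
    data = collect_data(seed=seed)                  # collect
    # preprocess: skipped, the synthetic data has no missing or invalid values
    train, val, test = split(data)
    # choose, train and tune: pick k by validation accuracy only
    best_k = max(k_grid, key=lambda k: accuracy(train, val, k))
    test_acc = accuracy(train, test, best_k)        # evaluate once on held-out data
    def predict(x):                                 # "deploy": a bound prediction function
        return knn_predict(train, x, best_k)
    return predict, best_k, test_acc
```

Keeping the test split untouched until the end means the reported accuracy is not inflated by the tuning step.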

ML requires experimenting with a wide range of datasets, data preparation steps, and algorithms to build a model that maximizes some target metric. 

Once you have built a model, you also need to deploy it to a production system, monitor its performance, continuously retrain it on new data, and compare it with alternative models.
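Comparing a retrained model with the one already in production is often framed as a champion/challenger check. The sketch below assumes both models are plain prediction functions evaluated on the same fresh holdout set, and that a challenger must beat the champion by a margin (`min_gain`, a hypothetical default of 0.01) before replacing it; accuracy stands in for whatever target metric the project actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]
Example = Tuple[Features, int]

@dataclass
class Candidate:
    name: str
    predict: Callable[[Features], int]

def holdout_accuracy(model: Candidate, holdout: Sequence[Example]) -> float:
    """Fraction of holdout examples the model gets right."""
    return sum(model.predict(x) == y for x, y in holdout) / len(holdout)

def choose_model(champion: Candidate, challenger: Candidate,
                 holdout: Sequence[Example],
                 min_gain: float = 0.01) -> Tuple[Candidate, float]:
    """Promote the challenger only if it beats the champion by at least min_gain."""
    champ_acc = holdout_accuracy(champion, holdout)
    chall_acc = holdout_accuracy(challenger, holdout)
    if chall_acc >= champ_acc + min_gain:
        return challenger, chall_acc
    return champion, champ_acc

if __name__ == "__main__":
    holdout = [((0.2,), 0), ((0.9,), 1), ((0.7,), 1), ((0.1,), 0)]
    current = Candidate("v1", lambda x: 0)                 # always predicts 0
    retrained = Candidate("v2", lambda x: int(x[0] > 0.5))
    winner, acc = choose_model(current, retrained, holdout)
    print(winner.name, acc)  # v2 1.0
```

Requiring a minimum gain avoids swapping models on differences that are within noise.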

 

- Why are ML Projects so Hard to Manage?

Being productive with ML can therefore be challenging for several reasons:  

  • It’s difficult to keep track of experiments. When you are just working with files on your laptop, or with an interactive notebook, how do you tell which data, code and parameters went into getting a particular result?  
  • It’s difficult to reproduce code. Even if you have meticulously tracked the code versions and parameters, you need to capture the whole environment (for example, library dependencies) to get the same result again. This is especially challenging if you want another data scientist to use your code, or if you want to run the same code at scale on another platform (for example, in the cloud).  
  • There’s no standard way to package and deploy models. Every data science team comes up with its own approach for each ML library that it uses, and the link between a model and the code and parameters that produced it is often lost.  
  • There’s no central store to manage models (their versions and stage transitions). A data science team creates many models. In the absence of a central place to collaborate and manage the model lifecycle, data science teams face challenges in managing model stages: from development to staging, and finally to archiving or production, with their respective versions, annotations, and history.
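Tracking and reproducing experiments both come down to recording what went into each run. As a minimal sketch using only the Python standard library, each run could be stored as a JSON record holding a hash of the training data, the parameters, a code version string (assumed to be supplied by the caller, for example a git commit), basic environment details, and the resulting metrics. The function names and record layout are hypothetical and do not reproduce any particular tracking tool's API.

```python
import hashlib
import json
import platform
import time
from importlib import metadata

def fingerprint_data(rows):
    """SHA-256 over the rows, so a run can be tied to the exact data it used."""
    h = hashlib.sha256()
    for row in rows:
        # Canonical JSON per row plus a separator keeps the hash stable and unambiguous.
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()

def capture_environment(packages=()):
    """Record interpreter and OS versions, plus versions of any named packages."""
    env = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = None  # package not installed in this environment
    return env

def make_run_record(*, code_version, params, data_rows, metrics, packages=()):
    """Bundle everything needed to explain (and rerun) one experiment."""
    return {
        "timestamp": time.time(),
        "code_version": code_version,  # e.g. a commit hash supplied by the caller
        "params": params,
        "data_sha256": fingerprint_data(data_rows),
        "environment": capture_environment(packages),
        "metrics": metrics,
    }

def append_run(path, record):
    """Append one run as a JSON line; the file becomes a simple experiment log."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

Capturing package versions covers only part of the environment; a full lock file or container image is usually needed to reproduce a run on another platform.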
 
Moreover, although individual ML libraries provide solutions to some of these problems (for example, model serving), to get the best result you usually want to try multiple ML libraries. 
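A central model registry, the last problem in the list above, can be pictured as a store of numbered versions per model, each with a stage and a history of transitions. The in-memory sketch below is hypothetical: the stage names follow the text (development, staging, production, archived), but the allowed transitions, the rule that only one version is in production at a time, and all class and method names are assumptions for illustration, not the API of any registry product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed lifecycle: which stage each stage may move to.
ALLOWED = {
    "development": {"staging", "archived"},
    "staging": {"production", "development", "archived"},
    "production": {"archived"},
    "archived": set(),
}

@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str
    stage: str = "development"
    history: List[str] = field(default_factory=list)

class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        """Add a new version (numbered from 1) in the development stage."""
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, artifact_uri)
        mv.history.append("registered in development")
        versions.append(mv)
        return mv

    def transition(self, name: str, version: int, new_stage: str, note: str = "") -> ModelVersion:
        """Move a version to a new stage, enforcing the allowed transitions."""
        mv = self._models[name][version - 1]
        if new_stage not in ALLOWED[mv.stage]:
            raise ValueError(f"cannot move {name} v{version} from {mv.stage} to {new_stage}")
        if new_stage == "production":
            # Keep a single production version by archiving the current one.
            for other in self._models[name]:
                if other.stage == "production":
                    other.stage = "archived"
                    other.history.append(f"archived when v{version} was promoted")
        mv.history.append(f"{mv.stage} -> {new_stage} {note}".strip())
        mv.stage = new_stage
        return mv

    def latest(self, name: str, stage: str) -> Optional[ModelVersion]:
        """Most recently registered version currently in the given stage."""
        matches = [mv for mv in self._models.get(name, []) if mv.stage == stage]
        return matches[-1] if matches else None
```

Because the registry stores an artifact location rather than the model object, it stays independent of whichever ML library produced the model.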
 

- Challenges of ML Workflows

An ML workflow is a systematic process that defines the phases of an ML project, including developing, training, evaluating, and deploying ML models.

The ML workflow can face many challenges, including:

  • Data quality and quantity: The amount and quality of data required can be a major challenge, especially for deep learning models that need large amounts of labeled or implicit feedback data.
  • Data collection: Gathering data from multiple sources, such as social media, web scraping tools, and enterprise databases, can be difficult, and the difficulty grows with the size of the dataset.
  • Model interpretability: Understanding how a model makes predictions is important, especially in applications with real-world consequences, like healthcare, finance, and autonomous vehicles.
  • Model selection: Choosing the right model can be difficult, but understanding each model's strengths and weaknesses can help make the best decision.
  • Data complexity: Data can be complex, with imbalanced classes, unexpected noise, and redundancy. Well-developed approaches for curating datasets are needed to extract useful information.
  • Concept drift: The relationship between a model's inputs and the target it predicts can change over time, which erodes the value of a deployed model. Detecting and addressing drift after deployment is important to keep models accurate and reliable.
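Drift is usually detected by comparing recent production data against a reference sample, such as the training data. As one illustration, the population stability index (PSI) compares binned distributions of a single numeric feature. Note that PSI measures a shift in the input distribution (data drift); it is often used as an early warning, but confirming true concept drift requires ground-truth labels. The equal-width binning over the reference range, the `eps` floor for empty bins, and the function name are assumptions of this sketch. Commonly quoted rules of thumb treat a PSI around 0.1 as a moderate shift and above 0.25 as a significant one, but thresholds should be set per use case.

```python
import math
from typing import List, Sequence

def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum over bins of (a - e) * ln(a / e), using bin proportions e and a."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values into the edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)  # reference (e.g. training) distribution
    a = proportions(actual)    # recent production distribution
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Both inputs must be non-empty; in practice the PSI is computed per feature on a schedule and alerts fire when it crosses the chosen threshold.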

Other challenges, along with practical advice for handling them, include:  
  • Pay close attention to the training data: Look at how the algorithm misclassifies the training data. These errors are often mislabels or unusual edge cases; either way, you want to understand them. Have everyone involved in building the model review the training data and label some of it themselves. For many use cases, a model is unlikely to perform better than the level at which two independent people agree on the labels.
  • Get something working end-to-end immediately, then improve one thing at a time: Start with the simplest thing that might work, and deploy it. You will learn a lot by doing this. Additional complexity at any stage of the process often improves models in research papers but rarely improves them in the real world, so justify every addition. Putting something into the hands of end users early helps you understand how well the model is working and can surface critical issues, such as a mismatch between what the model optimizes for and what the end user wants. It may also cause you to re-evaluate the type of training data you are collecting. It is much better to catch these problems quickly.
  • Find elegant ways to handle inevitable algorithm failures: Almost all ML models fail on some fraction of their inputs, and how you handle those failures is critical. Many models produce confidence scores you can use, provided the scores are reasonably well calibrated. For batch predictions, you can build human-in-the-loop systems that route low-confidence predictions to human operators; this keeps the system working reliably end-to-end and collects high-quality training data. For other use cases, you might present low-confidence predictions in a way that flags potential errors or reduces end-user annoyance.
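The routing of low-confidence predictions to human operators can be sketched by comparing each prediction's top-class probability with a threshold. This is a minimal sketch: the input format (an item ID paired with a dictionary of class probabilities), the `route_batch` name, and the default threshold of 0.9 are all hypothetical, and the approach only makes sense if the model's scores are reasonably calibrated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

@dataclass
class Routed:
    # (item_id, predicted_label) pairs accepted without review
    auto: List[Tuple[str, str]] = field(default_factory=list)
    # (item_id, predicted_label, confidence) triples queued for a human operator
    review: List[Tuple[str, str, float]] = field(default_factory=list)

def route_batch(predictions: Sequence[Tuple[str, Dict[str, float]]],
                threshold: float = 0.9) -> Routed:
    """Split a batch by the confidence of each item's most likely class."""
    routed = Routed()
    for item_id, probs in predictions:
        label, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            routed.auto.append((item_id, label))
        else:
            routed.review.append((item_id, label, conf))
    return routed
```

Labels that operators assign to the review queue can be fed back as new training examples, which are often the most informative ones because the model found them hard.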

 


- Best Practices for ML Workflows

Here are some best practices for machine learning (ML) workflows:

  • Define the project: Before starting, clearly define your project goals to ensure your models add value. Consider your current process, its goals, and what success looks like.
  • Data preparation: Collect relevant data from various sources, such as customer demographics, transactional data, website interactions, or social media data. Preprocess the data to ensure its quality and suitability for ML models, for example by cleaning it, handling missing values, and transforming it into a format suitable for analysis.
  • Model development: Train an ML model on your data, evaluate its accuracy, and tune its hyperparameters, for example with grid search or random search, to improve model performance.
  • Model monitoring: Monitor predictions on an ongoing basis. You can use skew and drift detection, fine-tune alert thresholds, and use feature attributions to detect data drift or skew. You can also monitor dataset query times and storage capacity, and track the performance and resource usage of your model endpoints.
  • Resource efficiency: Use computing platforms and cloud services for resource management to increase the efficiency of ML workflows. You can right-size CPU and GPU resources for performance and cost efficiency, and turn on automatic scaling.
  • Automation: Automate hyperparameter tuning and parameter value selection to maintain quality and gain deeper insights. You can also automate pipeline steps such as training, evaluation, testing, and deployment.
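The data-preparation step can be sketched as two phases: learn statistics from the training data only, then apply them to any data, which avoids leaking information from evaluation data into preprocessing. This sketch assumes rows are dictionaries of numeric features in which `None` marks a missing value; median imputation and min-max scaling are just two common choices, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]

def fit_preprocessor(train_rows: Sequence[Row], columns: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Learn per-column median, min and max from non-missing training values."""
    stats = {}
    for col in columns:
        present = [r[col] for r in train_rows if r.get(col) is not None]
        # statistics.median raises StatisticsError if a column has no values at all.
        stats[col] = {
            "median": statistics.median(present),
            "min": min(present),
            "max": max(present),
        }
    return stats

def transform(rows: Sequence[Row], stats: Dict[str, Dict[str, float]]) -> List[Dict[str, float]]:
    """Impute missing values with the training median, then min-max scale."""
    out = []
    for r in rows:
        new = {}
        for col, s in stats.items():
            v = r.get(col)
            if v is None:
                v = s["median"]  # impute missing value
            span = s["max"] - s["min"]
            # Values outside the training range land below 0 or above 1.
            new[col] = (v - s["min"]) / span if span else 0.0
        out.append(new)
    return out
```

The same fitted `stats` would be saved alongside the model so that production data is transformed exactly as the training data was.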
 
 

