
Building ML Models


- Overview

Building a machine learning (ML) model is a multi-step process involving data collection and preparation, training, evaluation, and continuous iteration. 

Even for those with ML experience, building ML models can be complex and requires diligence, experimentation, and creativity.

At a high level, however, the process of designing, deploying, and managing ML models often follows a common pattern. Following these steps will help you understand the modeling process and the best practices that guide your projects.


- Understand the Business Problem and Define Success Criteria

The first stage of any ML project is understanding the business needs: you need to know what the problem is before you try to solve it. 

First, work with the project owner to determine the goals and requirements of the project. The aim is to translate this knowledge into a problem definition suitable for an ML project and to develop a preliminary plan for achieving the project goals.

Key questions to answer include: 

  • What are the business goals? What parts of achieving this goal require machine learning methods?
  • What are the heuristic options - in other words, quick and dirty methods that don't require machine learning - and how much better does the model need to be than the heuristic?
  • What type of task is this (classification, regression, or clustering), and which type of algorithm is best suited to it?
  • Have the relevant teams addressed all necessary technical, business, and deployment issues?
  • What are the success criteria for the project? How will the organization measure the benefits of the model?
  • How will the team phase in the project across iteration sprints?
  • Are there requirements for transparency, explainability, or bias reduction?
  • What are the ethical considerations?
  • What are the acceptable thresholds for accuracy, precision, and confusion matrix values?
  • What are the expected inputs and outputs?
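
The heuristic question in the list above can be made concrete with a baseline check. The sketch below is only an illustration, assuming a classification task: the heuristic always predicts the most common label, and the model must beat it by a chosen margin. The function names, the 0.05 margin, and the labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of a heuristic that always predicts the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return hits / len(test_labels)


def beats_baseline(model_accuracy, baseline_accuracy, min_gain=0.05):
    """True if the model clears the heuristic by at least min_gain (assumed margin)."""
    return model_accuracy - baseline_accuracy >= min_gain


# Hypothetical labels, for illustration only.
train_labels = ["stay", "stay", "stay", "churn"]
test_labels = ["stay", "churn", "stay", "stay"]

baseline = majority_baseline_accuracy(train_labels, test_labels)
print(f"baseline accuracy: {baseline:.2f}")
print("model at 0.85 is worth pursuing:", beats_baseline(0.85, baseline))
```

If a model cannot clear a margin like this, the simpler heuristic may be the better business choice.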


Setting specific, quantifiable goals will help you achieve measurable ROI from your ML projects, rather than implementing a proof-of-concept that will later be discarded. 

[Figure: "Is Your Machine Learning Project Feasible or Not Feasible?" The chart lists three criteria: business feasibility, data feasibility, and implementation feasibility.]

These goals should tie to business outcomes, not just ML performance. While you can include typical ML metrics such as precision, accuracy, recall, and mean squared error, it's critical to prioritize specific KPIs that are relevant to your business.
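
For reference, the ML metrics named above can be computed by hand. This is a minimal sketch, assuming binary labels where 1 is the positive class; the function names and sample values are illustrative assumptions, and libraries such as scikit-learn provide equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Of everything predicted positive, how much really was positive?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything really positive, how much did the model find?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predictions, for illustration only.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
print(metrics, mse)
```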


- Training an ML Model

Training a machine learning (ML) model typically involves the following steps:

  • Data preparation: Collect, clean, and organize data before using it to train the model. The quality of the data affects the accuracy of the model's predictions.
  • Choose a model: Select the right model architecture and algorithms to solve the problem.
  • Training: Model training is a key step in the development process for ML algorithms. Data scientists use optimization tools to find the weights and biases that minimize the algorithm's loss function.
  • Iteration: Train the model iteratively on a data set. In each iteration, the model makes a prediction, checks whether it is correct, and adjusts itself after wrong predictions.
  • Evaluation: Model evaluation is a key step in ML. It assesses the quality of the model and helps users trust it on a particular dataset.
  • Test the model: Select the data sets to use and specify the percentage to use as training data, test data, and blind data. Explore the performance metrics to identify ways to improve the model.
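
The predict, check, and adjust loop described above can be sketched with a perceptron, one of the simplest learning rules. It is used here only as an illustration, not as a recommended algorithm; the function names and the toy AND data set are assumptions.

```python
def predict(weights, bias, x):
    """Predict 1 if the weighted sum is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=10, learning_rate=1.0):
    """Repeatedly predict, compare with the label, and correct the weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                # Wrong prediction: nudge the weights toward the right answer.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# Toy data: the logical AND of two inputs (linearly separable).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```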

- Building an ML Model

Building an ML model also involves the following steps:

  • Data collection: Gather and measure information on targeted variables in an established system.
  • Data preparation: Transform raw data so a ML algorithm can learn, discover insights, and make predictions from the datasets. Data preparation involves six steps: accessing, ingesting, cleansing, formatting, combining, and then analyzing the data.
  • Model evaluation: Provides an unbiased estimate of the model's ability to generalize to new, unseen data. The choice of evaluation metrics depends on the specific problem type.
  • Parameter tuning: Test additional parameter values and settings to improve the training results.
  • Data preprocessing: An important step before applying ML methods, for example for energy or load prediction. Common steps include data imputation, data resolution processing, data normalization, outlier detection, and data smoothing.
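
Three of the preprocessing steps listed above (imputation, normalization, and smoothing) can be sketched in plain Python. This is a minimal illustration under stated assumptions: missing values are marked as None, the median is used as the fill value, and a trailing moving average is used for smoothing. The function names and readings are hypothetical.

```python
from statistics import mean, median


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


# Hypothetical sensor readings with one missing value.
readings = [10.0, None, 14.0, 12.0, 20.0]
filled = impute_missing(readings)
normalized = min_max_normalize(filled)
smoothed = moving_average(filled)
print(filled, normalized, smoothed)
```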


Other steps for building an ML model include:

  • Contextualizing ML in your organization
  • Exploring the data and choosing the type of algorithm
  • Preparing and cleaning the dataset
  • Splitting the prepared dataset and performing cross-validation
  • Performing ML optimization
  • Deploying the model
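
To illustrate the splitting and cross-validation step in the list above, the sketch below generates k-fold index splits by hand. It assumes contiguous, unshuffled folds for readability; the function name is hypothetical, and in practice the data is usually shuffled first or a library splitter is used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each sample is used for validation exactly once, so averaging a score over the folds gives a steadier estimate than a single split.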



- Training and Evaluating an ML Model in Python

The following steps show how to train and evaluate a model in Python:

Step 1. Load the data
The first step is to load the data that you want to train the model on. You can use the pandas library to load the data into a DataFrame.
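
A minimal sketch of this step, assuming the data is in CSV form. The column names and values are hypothetical, and the text is read from memory so the example is self-contained; in a real project you would pass a file path to read_csv().

```python
import io

import pandas as pd

# Hypothetical housing data, inlined so the example runs without a file.
csv_text = """sqft,bedrooms,price
850,2,200000
1200,3,310000
1500,3,365000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Separate the input features from the value to predict.
features = df[["sqft", "bedrooms"]]
target = df["price"]

print(df.head())
```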

Step 2. Split the data into training and test sets
Once the data is loaded, you need to split it into training and test sets. The training set will be used to train the model, and the test set will be used to evaluate the model's performance. You can use the train_test_split() function from the scikit-learn library to split the data.
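
As a sketch of this step, the example below splits scikit-learn's bundled iris dataset. The dataset, the 80/20 ratio, and the random_state value are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris is used here only as a convenient stand-in dataset (150 samples).
X, y = load_iris(return_X_y=True)

# Hold out 20% for testing; stratify keeps class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), "training samples,", len(X_test), "test samples")
```

Fixing random_state makes the split reproducible between runs.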

Step 3. Choose a model
Next, you need to choose a model to train. Many models are available, so choose one that suits the task, such as a classifier for predicting categories or a regressor for predicting numeric values.

Step 4. Train the model
Once you have chosen a model, you need to train it on the training data. You can use the fit() method to train the model.
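
A minimal sketch of training with fit(), assuming a decision tree classifier and the iris dataset. Both are arbitrary choices for illustration; any scikit-learn estimator is trained the same way.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# fit() learns the model's parameters from the training data only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# The trained model can now predict labels for unseen samples.
predictions = model.predict(X_test)
print(predictions[:5])
```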

Step 5. Evaluate the model
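Once the model is trained, you need to evaluate its performance on the test set. You can use the score() method to evaluate the model. For scikit-learn estimators, score() returns the mean accuracy for classifiers and the R² coefficient of determination for regressors.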
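
The sketch below evaluates a classifier with score() and confirms that, for a scikit-learn classifier, the result matches accuracy_score() on the same predictions. The logistic regression model and the iris dataset are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# For classifiers, score() is the fraction of test samples predicted correctly.
accuracy = model.score(X_test, y_test)
same_value = accuracy_score(y_test, model.predict(X_test))

print(f"test accuracy: {accuracy:.3f}")
```

Scoring on the held-out test set, rather than on the training set, is what gives an estimate of how the model generalizes.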

Step 6. Deploy the model
Once the model is trained and evaluated, you can deploy it to production. This means making the model available to users so that they can use it to make predictions.
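
Deployment details vary widely, but a common first step is saving the trained model so that a separate service can load it and make predictions. The sketch below is one simple approach, assuming Python's pickle module; the file name and sample values are hypothetical. Only unpickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name

    # Training side: persist the fitted model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Serving side: load the model back, as a prediction service would.
    with open(model_path, "rb") as f:
        served_model = pickle.load(f)

# One new sample with the four iris measurements.
new_sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = served_model.predict(new_sample)
print(prediction)
```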


