
Supervised ML Methods

Supervised Learning_022023A
[Supervised Learning - NVIDIA]

- Overview

Supervised learning, also known as supervised machine learning, is a subcategory of machine learning (ML) and artificial intelligence (AI). It is defined as using a labeled dataset to train an algorithm to accurately classify data or predict an outcome. 

Supervised learning is a method used in AI to teach computers how to understand and analyze data. It focuses on finding meaning in data by addressing specific questions.

In other words, instead of relying on pure logic, the computer algorithm learns from labeled data, which means data that has already been tagged with the correct answer or outcome.

The goal is to enable the algorithm to identify patterns and relationships within the data to accurately label new, unseen data.

This approach is especially effective for tasks like classifying things or predicting future values. For example, it can determine the category of a news article, automatically separate spam emails from your inbox, or estimate the sales volume for a specific future date.
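The spam-filtering example above can be sketched in a few lines of plain Python. This is a toy illustration only: the messages, labels, and word-counting rule are invented for the sketch, and a real filter would use a proper statistical model rather than raw word counts.

```python
# Toy supervised spam filter: count which words appear in labeled
# spam vs. ham messages, then score a new, unseen message.
from collections import Counter

training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("lunch with the team tomorrow", "ham"),
]

# Count how often each word appears under each label.
word_counts = {"spam": Counter(), "ham": Counter()}
for text, label in training_data:
    word_counts[label].update(text.split())

def classify(text):
    """Label a new message by which class its words occur in more often."""
    scores = {label: sum(counts[w] for w in text.split())
              for label, counts in word_counts.items()}
    return max(scores, key=scores.get)

print(classify("free prize money"))        # spam
print(classify("agenda for the meeting"))  # ham
```

The key point is that the rule for labeling new messages is learned entirely from the labeled examples, not hand-coded.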

Supervised ML methods are used when you want to predict or explain the data you possess. Unlike unsupervised techniques, which work from input data alone, supervised learning uses a training set of input-output pairs to teach models to yield the desired output.

Supervised learning algorithms are used to make predictions and gain insights from data using labeled datasets. They are used in many fields, including healthcare, finance, marketing, and image recognition. 



- How Supervised ML Works

Supervised learning is a ML technique that is widely used in various fields such as finance, healthcare, marketing, and more. It is a form of ML in which the algorithm is trained on labeled data to make predictions or decisions based on the data inputs. 

In supervised learning, the algorithm learns a mapping between the input and output data. This mapping is learned from a labeled dataset, which consists of pairs of input and output data. The algorithm tries to learn the relationship between the input and output data so that it can make accurate predictions on new, unseen data.

Supervised learning uses a training set to teach the model to produce the desired output. This training dataset includes inputs and correct outputs, enabling the model to learn over time. The algorithm measures its accuracy through a loss function and adjusts until the error is sufficiently minimized. 
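The loss-minimization loop described above can be shown with a minimal gradient-descent sketch. The data and learning rate are invented for illustration; the model is a single weight fitting y = w * x by repeatedly measuring squared-error loss and adjusting.

```python
# Minimal training loop: fit y = w * x to labeled pairs by repeatedly
# measuring the mean squared error and nudging the weight downhill.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true relationship: y = 2x

w = 0.0          # initial weight
lr = 0.01        # learning rate
for _ in range(1000):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges near 2.0
```

After enough iterations the error is "sufficiently minimized" and the weight settles close to the true slope of 2.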

As input data is fed into the model, it adjusts its weights until the model is properly fitted as part of the cross-validation process. Supervised learning helps organizations solve various real-world problems at scale, such as sorting spam into folders separate from their inboxes.

If you learn a task under supervision, someone is there to judge whether your answers are correct. Likewise, in supervised learning, this means having a fully labeled dataset when training the algorithm.


- Data Labeling and Fully Labeled Training Data

Data labeling for ML is the process of creating datasets for training ML models. In ML, data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that ML models can learn from it. 

For example, tags might indicate whether a photo contains a bird or a car, which words were said in an audio recording, or whether an X-ray contained a tumor. Data labels are required for a variety of use cases including computer vision, natural language processing, and speech recognition.

Fully labeled means that every example in the training dataset is labeled with the answer the algorithm should come up with on its own. So a dataset of labeled flower images tells the model which photos are roses, daisies, and daffodils. When shown a new image, the model compares it to the training examples to predict the correct label. 
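Comparing a new example to labeled training examples, as in the flower-photo description above, is exactly what a 1-nearest-neighbor classifier does. A minimal sketch follows; the feature values (standing in for measurements such as petal length and width) are made up for illustration.

```python
# 1-nearest-neighbor sketch: label a new example by finding the most
# similar labeled training example (smallest Euclidean distance).
import math

labeled = [
    ((1.4, 0.2), "daisy"),
    ((4.7, 1.4), "rose"),
    ((5.9, 2.1), "daffodil"),
]

def predict(features):
    """Return the label of the closest training example."""
    return min(labeled,
               key=lambda item: math.dist(features, item[0]))[1]

print(predict((1.5, 0.3)))  # closest to the daisy example
```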

In supervised learning, the machine is taught by examples. An operator provides a ML algorithm with a known dataset containing desired inputs and outputs, and the algorithm must work out how to map those inputs to the corresponding outputs.

When the operator knows the correct answer to a question, the algorithm identifies patterns in the data, learns from observations and makes predictions. The algorithm makes predictions and the operator makes corrections - a process that continues until the algorithm reaches a high level of accuracy/performance.
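The predict-and-correct cycle described above is the classic perceptron learning rule: whenever the prediction disagrees with the known label, the weights are corrected. The sketch below uses toy AND-gate data; the learning rate and epoch count are arbitrary choices for illustration.

```python
# Predict-and-correct loop: a perceptron adjusts its weights whenever
# its prediction disagrees with the known label (toy AND-gate data).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w = [0.0, 0.0]
b = 0.0
lr = 0.1

for _ in range(20):                     # repeat until accurate
    for (x1, x2), label in data:
        pred = 1 if w[0]*x1 + w[1]*x2 + b > 0 else 0
        error = label - pred            # the "correction"
        w[0] += lr * error * x1
        w[1] += lr * error * x2
        b += lr * error

# verify every training example is now predicted correctly
print(all((1 if w[0]*x1 + w[1]*x2 + b > 0 else 0) == label
          for (x1, x2), label in data))  # True
```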


- Data Labeling, ML Models, and Model Training

Most practical ML models today use supervised learning, which applies an algorithm to map an input to an output. For supervised learning to work, you need a set of labeled data from which the model can learn to make correct decisions. 

Data labeling often begins by asking humans to make judgments given unlabeled data. For example, a tagger might be asked to tag all images in a dataset where "does the photo contain a bird" is true. Markings can be as coarse as a simple yes/no, or as fine as identifying specific pixels in an image that are associated with birds. 

ML models use human-supplied labels to learn latent patterns in a process called "model training." The result is a trained model that can be used to make predictions on new data.

In ML, the correctly labeled dataset that you use as an objective standard for training and evaluating a given model is often called the "ground truth". The accuracy of the trained model will depend on the accuracy of the ground truth, so it is critical to spend time and resources ensuring highly accurate data labeling.
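Evaluating a model against the ground truth usually reduces to a simple comparison: accuracy is the fraction of predictions that match the known labels. The labels below are invented for the sketch.

```python
# Evaluating predictions against ground-truth labels.
ground_truth = ["cat", "dog", "cat", "cat", "dog"]
predictions  = ["cat", "dog", "dog", "cat", "dog"]

correct = sum(p == t for p, t in zip(predictions, ground_truth))
accuracy = correct / len(ground_truth)
print(accuracy)  # 0.8
```

Note that if the ground-truth labels themselves are wrong, this number is meaningless, which is why the text stresses accurate data labeling.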


- The Supervised ML Methods

Supervised learning methods are ML algorithms that use features to predict a target variable.

The most popular supervised techniques are classification, regression, and forecasting. For instance, supervised ML techniques can be used to predict the number of new users who will sign up for the newsletter next month.

When it comes to data mining, supervised learning can be divided into two main types of problems - classification and regression - with forecasting as a closely related application:

  • Classification uses an algorithm to accurately assign test data into specific classes. It identifies specific entities in a dataset and tries to draw some conclusions about how to label or define those entities. In classification tasks, machine learning programs must draw conclusions from observations and determine what category a new observation falls into. For example, when filtering emails as "spam" or "not spam," the program must look at existing observation data and filter emails accordingly. Common classification algorithms are linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbors, and random forests, which are described in more detail below.
  • Regression is used to understand the relationship between dependent and independent variables. It is often used to make predictions, such as the sales revenue of a given business. In regression tasks, machine learning programs must estimate and understand the relationship between variables. Regression analysis focuses on one dependent variable and a range of other variables that vary, making it especially useful for prediction and forecasting. Linear regression, logistic regression, and polynomial regression are popular regression algorithms.
  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data, often used to analyze trends.
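The classification/regression split above can be made concrete by framing the same toy inputs with two kinds of targets: a number (regression) and a category (classification). The data below (study hours, exam scores, pass/fail) is invented for the sketch; the regression uses the closed-form least-squares line, and the classification uses a simple nearest-neighbors vote.

```python
# Same labeled inputs, two kinds of targets: a numeric target
# (regression) and a categorical target (classification). Toy data.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]   # regression target
passed = [0, 0, 1, 1, 1]                  # classification target

# Regression: closed-form least-squares line, score = a*hours + b.
n = len(hours)
mx = sum(hours) / n
my = sum(scores) / n
a = (sum((x - mx) * (y - my) for x, y in zip(hours, scores))
     / sum((x - mx) ** 2 for x in hours))
b = my - a * mx
print(a * 6.0 + b)   # predicted score for 6 hours of study: 62.0

# Classification: majority vote among the k nearest inputs.
def classify(x, k=3):
    nearest = sorted(range(n), key=lambda i: abs(hours[i] - x))[:k]
    votes = sum(passed[i] for i in nearest)
    return 1 if votes > k / 2 else 0

print(classify(4.5))  # 1 (pass)
```

The inputs are identical in both cases; only the type of answer the model is asked to produce differs.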


University of Chicago_050222A
[University of Chicago]

- Supervised Learning Algorithms

Various algorithms and computing techniques are used in the supervised machine learning process. Below is a brief description of some of the most common learning methods, typically implemented in languages such as R or Python:

  • Decision tree: A predictive model that uses input variables to predict the value of a target. The tree's branches represent decision rules applied to the inputs, and its leaves represent conclusions about the target.
  • Linear regression: Predicts a continuous response variable based on one or more predictor variables. The algorithm estimates a linear relationship between the variables and uses it to make predictions.
  • Logistic regression: A classification algorithm that predicts the probability of a target variable that has only two possible classes.
  • Random forest: Combines multiple algorithms of the same type, such as multiple simple trees, to make a "forest". This algorithm can help prevent over-fitting.
  • Naive Bayes: Also known as Naive Bayes Classifier, this algorithm is used for classification tasks and assumes that features are independent of each other.
  • Neural network: Inspired by how the brain functions, this algorithm takes input, passes it through a function, and produces an output.
  • Classification and regression trees (CART): A tree-building approach that covers both supervised tasks: classification trees predict categorical targets, while regression trees predict continuous ones.


In supervised learning, the algorithm learns the relationship between input and output. The inputs are known as features or "X variables" and output is generally referred to as the target or "y variable". 
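Using the X/y naming above, the simplest possible decision tree (a one-split "stump") can be sketched as follows. The feature values and labels are toy data invented for the example; the stump just searches for the threshold on X that misclassifies the fewest labels in y.

```python
# A one-split decision tree (a "stump"): choose the threshold on the
# single feature X that best separates the labels y. Toy data.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # feature ("X variable")
y = [0, 0, 0, 1, 1, 1]               # target ("y variable")

def stump_threshold(X, y):
    """Pick the split point with the fewest misclassified examples."""
    best_t, best_errors = None, len(y) + 1
    for t in X:
        errors = sum((x >= t) != bool(label) for x, label in zip(X, y))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

t = stump_threshold(X, y)
print(t)                 # 4.0 separates the two classes perfectly
print(int(5.5 >= t))     # predicts class 1 for a new input
```

Real decision-tree learners recurse on each side of the split and use criteria such as Gini impurity, but the idea of learning a rule from X and y is the same.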

Supervised learning is different from reinforcement learning. Reinforcement learning describes a class of problems where an agent operates in an environment and must learn to operate using feedback. It does not require labeled data or a training set.

Supervised learning uses labeled input and output data, while unsupervised learning algorithms do not. In supervised learning, an algorithm "learns" from a training dataset by iteratively making predictions on the data and being corrected toward the right answer.


- Challenges of Supervised Learning Models

Although supervised learning can bring advantages to enterprises, such as deep data insights and improved automation, there are still some challenges in building sustainable supervised learning models. 

Supervised learning models can face many challenges, including:
  • Data quality: Poor quality data can make it difficult to choose the right algorithm.
  • Training data: Inconsistent, unclean, or insufficient training data can make machine learning algorithms less effective.
  • Time: Training supervised learning models can be time-consuming.
  • Human error: Datasets can contain human error, which can lead to incorrect algorithms.
  • Clustering: Unlike unsupervised methods, supervised learning models cannot discover groupings in data on their own; every class must be labeled in advance.
  • Expertise: Structuring supervised learning models accurately can require a certain level of expertise.


[More to come ...]
