
Supervised ML Methods

[Supervised Learning - NVIDIA]

- Overview

Supervised learning, also known as supervised machine learning, is a subcategory of machine learning (ML) and artificial intelligence (AI). It is defined as using a labeled dataset to train an algorithm to accurately classify data or predict an outcome. 

Supervised ML methods are used when you want to predict or explain the data you possess. Unlike unsupervised techniques, which group and interpret data based only on the inputs, supervised learning uses a labeled training set to teach models to yield the desired output.

Here are some examples of supervised learning algorithms: 

  • Naive Bayes: A supervised learning algorithm used for classification tasks. It applies Bayes' theorem, assuming the features are conditionally independent of one another, to predict a target class.
  • Random forest: A flexible algorithm used for both classification and regression problems. It combines many decision trees to improve on the performance of any single tree. (A short code sketch of both algorithms follows below.)

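As a concrete illustration, here is a minimal sketch that trains both algorithms on a small labeled dataset. It assumes scikit-learn and its bundled iris data, which are not named in the source, and is meant only to show the fit-then-predict workflow.

  # Minimal sketch: train Naive Bayes and a random forest on labeled data.
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.naive_bayes import GaussianNB
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)                # features (X) and labels (y)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  for model in (GaussianNB(), RandomForestClassifier(n_estimators=100)):
      model.fit(X_train, y_train)                  # learn from labeled examples
      print(type(model).__name__, model.score(X_test, y_test))  # accuracy on unseen data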

In supervised learning, the algorithm learns the relationship between input and output. The inputs are known as features or "X variables", and the output is generally referred to as the target or "y variable".
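
For example, in a hypothetical house-price task the features (X) and target (y) could be laid out as follows; the values and column meanings are assumptions made purely for illustration.

  import numpy as np

  # Hypothetical toy data: each row of X holds the features for one house,
  # and the matching entry of y is the target value the model should predict.
  X = np.array([[1200, 3],                    # [square_feet, bedrooms]
                [1500, 4],
                [900,  2]])
  y = np.array([250_000, 310_000, 180_000])   # sale prices (targets)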

Supervised learning is different from reinforcement learning. Reinforcement learning describes a class of problems in which an agent operates in an environment and must learn how to act from feedback; it does not require labeled data or a training set.

Supervised learning uses labeled input and output data, while unsupervised learning algorithms do not. In supervised learning, an algorithm "learns" from a training dataset by iteratively making predictions on the data and adjusting for the correct answers.

Please refer to Wikipedia: Supervised Learning for more details.

 

- How Supervised ML Works

Supervised learning is a ML technique that is widely used in various fields such as finance, healthcare, marketing, and more. It is a form of ML in which the algorithm is trained on labeled data to make predictions or decisions based on the data inputs. 

In supervised learning, the algorithm learns a mapping between the input and output data. This mapping is learned from a labeled dataset, which consists of pairs of input and output data. The algorithm tries to learn the relationship between the input and output data so that it can make accurate predictions on new, unseen data.

Supervised learning uses a training set to teach the model to produce the desired output. This training dataset includes inputs and correct outputs, enabling the model to learn over time. The algorithm measures its accuracy through a loss function and adjusts until the error is sufficiently minimized. 
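
To make the loss-and-adjustment loop concrete, below is a minimal gradient-descent sketch on hypothetical one-dimensional data: each iteration measures a mean-squared-error loss and nudges the weights to reduce it. The data, learning rate, and iteration count are illustrative choices, not values from the source.

  import numpy as np

  # Hypothetical data: y is roughly 2*x + 1 plus noise.
  rng = np.random.default_rng(0)
  x = rng.uniform(0, 10, size=100)
  y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

  w, b = 0.0, 0.0                       # model weights, initially untrained
  lr = 0.01                             # learning rate

  for step in range(1000):
      y_pred = w * x + b                # current predictions
      error = y_pred - y
      loss = np.mean(error ** 2)        # mean-squared-error loss
      w -= lr * 2 * np.mean(error * x)  # gradient step on w
      b -= lr * 2 * np.mean(error)      # gradient step on b

  print(f"learned w={w:.2f}, b={b:.2f}, final loss={loss:.3f}")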

As input data is fed into the model, the model adjusts its weights until it is properly fitted, which is typically checked as part of the cross-validation process. Supervised learning helps organizations solve a variety of real-world problems at scale, such as sorting spam into a folder separate from the inbox.
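
One common way to check the fit is k-fold cross-validation. The sketch below uses scikit-learn for this; the library, model, and dataset are assumptions, since the source does not prescribe particular tools.

  from sklearn.datasets import load_iris
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score

  X, y = load_iris(return_X_y=True)
  model = LogisticRegression(max_iter=1000)

  # 5-fold cross-validation: fit on four folds, score on the held-out fold.
  scores = cross_val_score(model, X, y, cv=5)
  print("fold accuracies:", scores, "mean:", scores.mean())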

If you learn a task under supervision, someone is there to judge whether you got the answer right. Similarly, in supervised learning, this means having a fully labeled dataset when training the algorithm.

 

- Data Labeling and Fully Labeled Training Data

Data labeling is the process of creating datasets for training ML models: identifying raw data (images, text files, videos, etc.) and adding one or more meaningful, informative labels to provide context so that ML models can learn from it.

For example, labels might indicate whether a photo contains a bird or a car, which words were said in an audio recording, or whether an X-ray contains a tumor. Data labels are required for a variety of use cases, including computer vision, natural language processing, and speech recognition.

Fully labeled means that every example in the training dataset is labeled with the answer the algorithm should come up with on its own. So a dataset of labeled flower images tells the model which photos are roses, daisies, and daffodils. When shown a new image, the model compares it to the training examples to predict the correct label. 
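
In code, a fully labeled dataset can be as simple as a list of (example, label) pairs. The file names below are hypothetical placeholders used only to illustrate the idea.

  # Hypothetical fully labeled training set: every image has its answer attached.
  labeled_flowers = [
      ("img_001.jpg", "rose"),
      ("img_002.jpg", "daisy"),
      ("img_003.jpg", "daffodil"),
  ]

  # The labels define the set of classes the model can predict for a new image.
  classes = sorted({label for _, label in labeled_flowers})
  print(classes)  # ['daffodil', 'daisy', 'rose']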

In supervised learning, the machine is taught by example. An operator provides the ML algorithm with a known dataset containing desired inputs and outputs, and the algorithm must work out how to arrive at those outputs from the inputs.

Because the operator knows the correct answers, the algorithm can identify patterns in the data, learn from observations, and make predictions. The algorithm makes predictions, the operator corrects them, and this process continues until the algorithm reaches a high level of accuracy and performance.

 

- Data Labeling, ML Models, and Model Training

Most practical ML models today use supervised learning, which applies an algorithm to map an input to an output. For supervised learning to work, you need a set of labeled data from which the model can learn to make correct decisions. 

Data labeling often begins by asking humans to make judgments about unlabeled data. For example, a tagger might be asked to mark every image in a dataset for which "does the photo contain a bird" is true. Labels can be as coarse as a simple yes/no, or as fine as identifying the specific pixels in an image that are associated with birds.

ML models use human-supplied labels to learn latent patterns in a process called "model training." The result is a trained model that can be used to make predictions on new data.

In ML, the correctly labeled dataset that you use as an objective standard for training and evaluating a given model is often called the "ground truth". The accuracy of the trained model will depend on the accuracy of the ground truth, so it is critical to spend time and resources ensuring highly accurate data labeling.
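
Evaluating a trained model against the ground truth often comes down to comparing its predictions with the human-supplied labels, as in this small sketch; the labels are hypothetical.

  # Hypothetical ground-truth labels vs. a model's predictions.
  ground_truth = ["bird", "bird", "no_bird", "bird", "no_bird"]
  predictions  = ["bird", "no_bird", "no_bird", "bird", "no_bird"]

  correct = sum(t == p for t, p in zip(ground_truth, predictions))
  accuracy = correct / len(ground_truth)
  print(f"accuracy against ground truth: {accuracy:.0%}")  # 80%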

 

- The Supervised ML Methods

The most popular supervised techniques are classification, regression, and forecasting. For instance, supervised ML techniques can be used to predict the number of new users who will sign up for a newsletter next month.

When it comes to data mining, supervised learning can be divided into two main types of problems, classification and regression, with forecasting as a closely related application:

  • Classification uses an algorithm to accurately assign test data into specific classes. It identifies specific entities in a dataset and tries to draw conclusions about how those entities should be labeled or defined. In classification tasks, machine learning programs must draw conclusions from observations and determine what category a new observation falls into. For example, when filtering emails as "spam" or "not spam," the program must look at existing observation data and filter the emails accordingly. Common classification algorithms are linear classifiers, support vector machines (SVM), decision trees, k-nearest neighbors, and random forests, which are described in more detail below.
  • Regression is used to understand the relationship between dependent and independent variables. It is often used to make predictions, such as the sales revenue of a given business. In regression tasks, machine learning programs must estimate and understand the relationship between variables. Regression analysis focuses on one dependent variable and a range of other changing variables, making it especially useful for prediction and forecasting. Linear regression, logistic regression, and polynomial regression are popular regression algorithms. (A brief code sketch contrasting classification and regression follows this list.)
  • Forecasting: Forecasting is the process of making predictions about the future based on past and present data, often used to analyze trends.
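
As a brief sketch of the contrast, the snippet below fits a classifier to predict a discrete class and a regressor to predict a continuous value. The choice of scikit-learn and its bundled datasets is an assumption made for illustration.

  from sklearn.datasets import load_iris, load_diabetes
  from sklearn.linear_model import LogisticRegression, LinearRegression

  # Classification: predict a discrete class label.
  X_c, y_c = load_iris(return_X_y=True)
  clf = LogisticRegression(max_iter=1000).fit(X_c, y_c)
  print("predicted class:", clf.predict(X_c[:1]))

  # Regression: predict a continuous value.
  X_r, y_r = load_diabetes(return_X_y=True)
  reg = LinearRegression().fit(X_r, y_r)
  print("predicted value:", reg.predict(X_r[:1]))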

 

- Supervised Learning Algorithms

Various algorithms and computing techniques are used in the supervised machine learning process. Below is a list of some of the most common learning methods, typically implemented in languages such as R or Python (a brief sketch instantiating several of them follows the list):

  • Neural networks
  • Linear regression
  • Logistic regression
  • K-nearest neighbors
  • Random forest

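As a minimal sketch, the snippet below instantiates and evaluates several of the listed methods with scikit-learn on its bundled iris data; the library, dataset, and hyperparameters are illustrative assumptions, and linear regression is omitted because it predicts continuous values rather than classes.

  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.neural_network import MLPClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.neighbors import KNeighborsClassifier
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # One estimator per classification method listed above.
  models = {
      "Neural network":      MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000),
      "Logistic regression": LogisticRegression(max_iter=1000),
      "K-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
      "Random forest":       RandomForestClassifier(n_estimators=100),
  }

  for name, model in models.items():
      model.fit(X_train, y_train)
      print(f"{name}: test accuracy {model.score(X_test, y_test):.2f}")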

- Supervised Learning Challenges

Although supervised learning can bring advantages to enterprises, such as deep data insights and improved automation, there are still some challenges in building sustainable supervised learning models. 

Here are some of those challenges:

  • Supervised learning models may require a certain level of expertise to build accurately.
  • Training supervised learning models can be time-consuming.
  • Labeled datasets are prone to human error, which can cause the algorithm to learn incorrectly.
  • Unlike unsupervised learning models, supervised learning cannot cluster or classify data on its own.
 
 
 

[More to come ...]

