ML Techniques (Supervised Learning, etc.)
Machine Learning:
Algorithms that parse data, learn from that data,
and then apply what they’ve learned to make informed decisions
- Overview
Machine intelligence has been described as the last invention that humanity will ever need to make. Compare the state of AI a few years ago with its current state and the pace of growth is striking. The field has branched out into a variety of domains such as machine learning (ML), expert systems, natural language processing (NLP), and many more.
While the idea behind AI is to build smarter systems that think and act on their own, those systems still need to be trained. ML is the domain of AI created for exactly this purpose: it brings together a range of algorithms that let systems learn from data, allowing for smoother data processing and decision-making.
- How Does Machine Learning Work?
The “learning” in ML refers to a process in which machines review existing data and learn new skills and knowledge from that data. ML systems use algorithms to find patterns in datasets, which might include structured data, unstructured textual data, numeric data, or even rich media like audio files, images and videos.
ML algorithms can be computationally intensive and often require specialized infrastructure to run at large scale.
Suppose we have to train a model that can recognize a given image as a cat or a dog. We start with labeled (tagged) input: thousands of photos of cats and dogs, each tagged with the correct animal.
After that, we perform feature extraction. This means extracting features (e.g. color, eyes, nose, ears) from the raw input that can distinguish between cats and dogs.
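As a rough illustration of feature extraction, the sketch below turns a tiny image into a fixed-length feature vector. The function name `extract_features`, the nested-list image format, and the choice of features (mean color channels and overall brightness) are assumptions made for this example; a real cat/dog classifier would use far richer features or learn them from the data.

```python
def extract_features(image):
    """Turn an image (rows of (r, g, b) pixels) into a small feature vector."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    brightness = (mean_r + mean_g + mean_b) / 3
    return [mean_r, mean_g, mean_b, brightness]

# A made-up 2x2 "image"; real photos would have many more pixels.
toy_image = [
    [(200, 100, 0), (100, 100, 0)],
    [(100, 100, 0), (0, 100, 0)],
]
features = extract_features(toy_image)
```

Whatever the image size, the output has the same four entries, which is what lets a learning algorithm treat every photo as a point in the same feature space.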
- Machine Learning Algorithms and Models
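Machine learning (ML) involves the use of ML algorithms and models. The two terms are often used interchangeably, but they are distinct: an algorithm is the procedure that learns from data, and a model is what that procedure produces, a mathematical expression that represents data in the context of a problem, often a business problem. The aim is to go from data to insight.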
- ML algorithms are procedures that are implemented in code and are run on data.
- ML models are the output of algorithms and consist of model data and a prediction algorithm.
- ML algorithms provide a type of automatic programming where ML models represent the program.
For example, if an online retailer wants to anticipate sales for the next quarter, they might use an ML algorithm that predicts those sales based on past sales and other relevant data.
Similarly, a windmill manufacturer might visually monitor important equipment and feed the video data through algorithms trained to identify dangerous cracks.
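The split between algorithm and model can be sketched in a few lines. Here `fit_line` plays the role of the algorithm (ordinary least squares for a single input), the dictionary it returns is the model data, and `predict` is the prediction algorithm. The function names and the quarterly sales figures are invented for illustration only.

```python
def fit_line(xs, ys):
    """The 'algorithm': ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}  # the 'model data'

def predict(model, x):
    """The 'prediction algorithm' that applies the model data to a new input."""
    return model["slope"] * x + model["intercept"]

quarters = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]  # made-up past sales
model = fit_line(quarters, sales)
next_quarter_sales = predict(model, 5)
```

Running the same algorithm on different data would yield a different model, which is the sense in which the algorithm "writes the program".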
- Four Types of ML Algorithms
Machine learning (ML) algorithms are classified into four types: Supervised Learning, Unsupervised Learning, Semi-supervised Learning, and Reinforcement Learning.
- Supervised Learning Algorithm: This type of algorithm works with a target/outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.
- Unsupervised Learning Algorithm: With this type of algorithm, there is no target or outcome variable to predict or estimate. It is used for clustering a population into different groups, for example segmenting customers into groups for specific interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.
- Semi-supervised Learning Algorithm: Semi-supervised learning (SSL) is a learning problem that involves a small number of labeled examples and a large number of unlabeled examples. It sits between supervised and unsupervised learning: in addition to unlabeled data, the algorithm is given some supervision information, but not necessarily for all examples. Often this information is the target associated with some of the samples. This type of problem is challenging because neither supervised nor unsupervised learning algorithms can make effective use of the mixture of labeled and unlabeled data, so specialized semi-supervised learning algorithms are required. We need SSL algorithms when working with data where labeling examples is challenging or expensive.
- Reinforcement Learning Algorithm: Using this type of algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Reinforcement learning problems are commonly formalized as a Markov Decision Process.
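The semi-supervised case in the list above can be illustrated with self-training, one simple SSL strategy chosen here for illustration (it is not the only one): fit on the few labeled points, pseudo-label the unlabeled ones with the current model, and refit on everything. The one-dimensional data, the nearest-centroid classifier, and all function names are assumptions for this sketch.

```python
def fit_centroids(xs, ys):
    """Return the mean x value for each class label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    """Refit repeatedly on labeled data plus pseudo-labeled unlabeled data."""
    centroids = fit_centroids(labeled_x, labeled_y)
    for _ in range(rounds):
        pseudo_labels = [predict(centroids, x) for x in unlabeled_x]
        centroids = fit_centroids(labeled_x + unlabeled_x,
                                  labeled_y + pseudo_labels)
    return centroids

labeled_x, labeled_y = [1.0, 9.0], ["low", "high"]   # only two labeled points
unlabeled_x = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]         # many unlabeled points
centroids = self_train(labeled_x, labeled_y, unlabeled_x)
```

The unlabeled points shift each class centroid away from the single labeled example, which is how the extra data contributes even though nobody labeled it.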
Other machine learning techniques include:
- Clustering: A machine learning technique that groups similar data points into clusters. This helps machine learning engineers detect patterns and structures when working with unlabeled data.
- Decision tree: A popular machine learning method that learns a tree of if-then rules over the input features to predict outcomes or classify data into categories.
- Dimensionality reduction: A technique used to reduce the number of features or dimensions in a dataset, while retaining as much information as possible.
- Ensemble methods: A technique that combines several base models to produce one optimal predictive model.
- Convolutional neural network: A type of neural network widely used in machine learning, mainly in vision-related applications.
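Of the techniques listed above, ensemble methods are the easiest to show in a few lines. The sketch below combines three deliberately weak base models by majority vote, which is only one of several ways to build an ensemble. The threshold rules and the name `majority_vote` are hypothetical.

```python
from collections import Counter

def majority_vote(models, x):
    """Return the label predicted by most of the base models."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical base models, each a simple threshold rule on one number.
base_models = [
    lambda x: "positive" if x > 4 else "negative",
    lambda x: "positive" if x > 5 else "negative",
    lambda x: "positive" if x > 6 else "negative",
]

high_case = majority_vote(base_models, 5.5)  # two of three vote "positive"
low_case = majority_vote(base_models, 4.5)   # two of three vote "negative"
```

Using an odd number of base models on a two-class problem avoids tied votes.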
- The Different Purposes of ML Algorithms and Models
There are different kinds of machine learning (ML), including supervised learning, unsupervised learning, deep learning, and reinforcement learning. They are used for different purposes.
The purpose of supervised learning is to establish a relationship between inputs and known outputs, and to use the inputs to predict the outputs. The purpose of unsupervised learning is to understand the structure of data and to identify the main drivers behind it. Deep learning uses multi-layered neural networks to learn patterns from data, while reinforcement learning encourages algorithms to explore and discover the actions that yield the best results.
In supervised learning, the examples in the data have labels; in unsupervised learning, we have only the features for those examples, but no labels. Reinforcement learning is characterized by an agent that continuously interacts with, and learns from, its stochastic environment: the agent learns its behavior from the feedback it receives from the environment in the form of a reward.
So in reinforcement learning, the agent can keep adapting its behavior over time, based on its environment, to maximize this reward. Reinforcement learning is often described as learning from delayed reward: the feedback may come several steps after the decisions that produced it.
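Delayed reward can be made concrete with a small sketch of tabular Q-learning, one common reinforcement learning algorithm chosen here for illustration. The environment is an invented five-state chain in which the only reward arrives on reaching the last state, so earlier moves are credited only indirectly. The environment, the hyperparameters, and all names are assumptions for this example.

```python
import random

N_STATES = 5            # states 0..4; state 4 is the goal and ends the episode
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic chain: the only reward is 1.0 on reaching the goal."""
    if action == RIGHT:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with random actions; Q-learning is off-policy, so it
            # can still learn the value of acting greedily.
            action = rng.choice([LEFT, RIGHT])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
# Greedy policy for the non-terminal states.
policy = [RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT
          for s in range(N_STATES - 1)]
```

Although state 0 never receives a reward directly, its value for moving right ends up positive because the discounted reward is propagated back one step at a time through the update rule.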
- Loss Functions
Common machine learning methods fall into three types:
- Supervised learning, where the learning system learns latent mappings based on labeled examples,
- Unsupervised learning, where the learning system models the data distribution based on unlabeled examples,
- Reinforcement learning, where the decision system is trained to make the best decisions.
From the designer's perspective, all kinds of learning are supervised by loss functions. The source of supervision must be defined by humans, and one way to do so is with a loss function.
Loss functions in machine learning measure the difference between the actual output and the predicted output. This difference is also called a cost or loss. The loss function is an important part of the machine learning algorithm and helps optimize model performance.
The loss function tells the machine learning algorithm how well the trained system is currently performing. The goal of learning is to reduce the value of this loss function, that is, to make our machine perform better.
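As a minimal sketch of "learning as reducing a loss", the code below uses mean squared error as the loss for a one-parameter model y_hat = w * x and lowers it with plain gradient descent. The data, the learning rate of 0.05, and the function names are assumptions for illustration; other losses and optimizers follow the same pattern.

```python
def mse(w, xs, ys):
    """Mean squared error between predictions w * x and actual outputs y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]    # made-up data generated by y = 2x

w = 0.0                 # initial guess for the parameter
losses = []
for _ in range(50):
    losses.append(mse(w, xs, ys))
    w -= 0.05 * mse_gradient(w, xs, ys)  # step against the gradient
```

Each recorded loss is smaller than the one before it, and `w` approaches 2, the value that generated the data.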
- Supervised and unsupervised learning
There is a key difference between supervised and unsupervised learning. Supervised learning uses labeled datasets while unsupervised learning uses unlabeled datasets. By "labeled", we mean that each example in the data has been tagged with the correct answer. Classification problems, for instance, use algorithms to assign data to specific categories.
In supervised learning, you use well-labeled data to train a machine; in unsupervised learning, you do not need to supervise the model. Supervised learning allows you to produce outputs for new data based on previous experience, while unsupervised learning can help you discover unknown patterns in your data. Regression and classification are two types of supervised machine learning techniques. Clustering and association are two types of unsupervised learning. In a supervised learning model, input and output variables are given, whereas in an unsupervised learning model, only input data is given.
We apply supervised ML techniques when we have a piece of data that we want to predict or explain. We do so by using previous data of inputs and outputs to predict an output based on a new input.
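A small sketch of this idea, using k-nearest neighbors (KNN) as the supervised technique: previous inputs with known outputs are stored, and a new input gets the label most common among its closest stored neighbors. The two-feature data points, the labels, and the function name `knn_predict` are made up for the example.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Previous inputs (two numeric features each) and their known outputs.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

prediction = knn_predict(train_x, train_y, (1.1, 1.0))
```

KNN has no separate training step; the labeled examples themselves act as the model.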
For example, you could use supervised ML techniques to help a service business that wants to predict the number of new users who will sign up for the service next month. By contrast, unsupervised ML looks at ways to relate and group data points without the use of a target variable to predict.
In other words, it evaluates data in terms of traits and uses the traits to form clusters of items that are similar to one another. For example, you could use unsupervised learning techniques to help a retailer that wants to segment products with similar characteristics — without having to specify in advance which characteristics to use.
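The product-segmentation idea can be sketched with k-means, a common clustering algorithm. No target variable is supplied; the algorithm only sees the traits. The product data (price and weight), the choice of two clusters, and the starting centers are assumptions for this illustration.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points from given start centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical products described by (price, weight_kg); no labels are given.
products = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
            (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, clusters = kmeans(products, centers=[products[0], products[3]])
```

The result depends on the number of clusters and on the starting centers, so in practice k-means is usually run several times with different initializations.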