Regression in Machine Learning

[Image: The University of Chicago - Alvin Wei-Cheng Wong]


- Overview

Machine learning (ML) regression is a technique for studying the relationship between one or more independent variables (or features) and a dependent variable (or outcome). It is used as a method of predictive modeling in machine learning, where algorithms are used to predict continuous outcomes. 

Here are some steps for solving ML regression problems:

  • Data cleaning
  • Separate qualitative and quantitative attributes
  • Feature selection on quantitative attributes
  • Encode qualitative attributes and perform feature selection
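The steps above can be sketched with pandas. This is a minimal sketch on made-up data: the column names (`sqft`, `city`, `price`) and the correlation threshold are illustrative assumptions, not taken from any particular dataset.

```python
import pandas as pd

# Hypothetical raw data: one quantitative and one qualitative attribute,
# plus a missing value to clean.
df = pd.DataFrame({
    "sqft": [1400, 1600, None, 1700],    # quantitative
    "city": ["NYC", "LA", "NYC", "LA"],  # qualitative
    "price": [300, 340, 310, 360],       # target
})

# 1. Data cleaning: drop rows with missing values.
df = df.dropna()

# 2. Separate qualitative and quantitative attributes.
quant = df.select_dtypes(include="number").drop(columns="price")
qual = df.select_dtypes(exclude="number")

# 3. Feature selection on quantitative attributes:
#    keep columns strongly correlated with the target (threshold is arbitrary).
selected = quant.columns[quant.corrwith(df["price"]).abs() > 0.5]

# 4. Encode qualitative attributes (one-hot); the same selection
#    logic could then be applied to the encoded columns.
encoded = pd.get_dummies(qual)

features = pd.concat([quant[selected], encoded], axis=1)
```

Each numbered comment corresponds to one of the bullets above; real pipelines would use more robust cleaning and selection criteria.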


Regression algorithms help predict continuous variables like:

  • House prices
  • Market trends
  • Weather patterns
  • Oil and gas prices


Linear regression is a popular form of regression analysis because it's easy to use for predicting and forecasting. Polynomial regression is an extension of linear regression that models the relationship between variables using a polynomial equation. This allows for more flexibility in capturing nonlinear relationships. 
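As a minimal sketch of the difference (synthetic data, NumPy only): a degree-2 polynomial fit captures a curve that a straight line cannot.

```python
import numpy as np

# Synthetic nonlinear data: y = x^2, which no straight line can match exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2

# Linear regression: fit a degree-1 polynomial (a straight line).
line = np.polyfit(x, y, deg=1)
# Polynomial regression: fit a degree-2 polynomial.
curve = np.polyfit(x, y, deg=2)

# Sum of squared errors for each fit.
line_err = np.sum((np.polyval(line, x) - y) ** 2)
curve_err = np.sum((np.polyval(curve, x) - y) ** 2)
# The quadratic fit reproduces the data almost exactly; the line does not.
```

Polynomial regression is still "linear" in its coefficients, which is why the same least-squares machinery fits both models.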

Here's an example of a regression task:

  • Predicting the price of a stock in the coming days, given the company's and the market's past history

 

- ML Regression

Regression finds correlations between dependent and independent variables. The task of the regression algorithm is to find the mapping function that maps the input variable "x" to the continuous output variable "y". 

Therefore, regression algorithms help in predicting continuous variables such as house prices, market trends, weather patterns, oil and gas prices, etc. 

ML regression is a technique that studies the relationship between independent variables and a dependent variable. It's used to predict continuous outcomes. 

Solving regression problems is one of the most common applications of ML models, especially in supervised ML. Algorithms are trained to understand the relationship between independent variables and an outcome or dependent variable. The model can then be leveraged to predict outcomes for new and unseen input data, or to fill gaps in missing data. 
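A compact illustration of that supervised workflow (labeled training data, fit, then predict on unseen input), using NumPy's least-squares solver; all the numbers here are made up for the sketch.

```python
import numpy as np

# Labeled training data: inputs x with known outcomes y (roughly y = 3x + 1).
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([4.1, 6.9, 10.1, 12.9])

# Learn the mapping f(x) = w*x + b by least squares.
A = np.column_stack([x_train, np.ones_like(x_train)])
(w, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Predict a continuous outcome for a new, unseen input.
x_new = 5.0
y_pred = w * x_new + b
```

The same fitted mapping could be applied to fill gaps in missing data, as the paragraph above notes.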

Regression analysis is an integral part of any forecasting or predictive model and is therefore a common approach in ML-driven predictive analytics. Besides classification, regression is a common use of supervised ML models. This method of training a model requires labeled input and output training data. ML regression models require knowledge of the relationship between features and outcome variables, so accurately labeled training data is critical.

Regression is a key element of predictive modeling and as such can be found in many different applications of ML. Whether driving financial forecasts or projecting healthcare trends, regression analysis can bring organizations critical insights for decision making. It has been used in different fields to predict housing prices, stock or share prices, or to map salary changes.

 

- Regression Analysis

In statistical modeling, regression analysis is a set of statistical procedures for estimating the relationships between a dependent variable (often called an "outcome" or "response" variable, or a "label" in machine learning terminology) and one or more independent variables (often referred to as "predictors", "covariates", "explanatory variables", or "features").

The most common form of regression analysis is linear regression, in which one finds the straight line (or more complex linear combination) that most closely fits the data according to a specific mathematical criterion.

For example, ordinary least squares computes the unique line (or hyperplane) that minimizes the sum of squared differences between the real data and that line (or hyperplane).
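The ordinary-least-squares criterion has a well-known closed-form solution, beta = (X^T X)^(-1) X^T y. A direct NumPy check, on arbitrary illustrative data, that this beta indeed minimizes the sum of squared differences:

```python
import numpy as np

# Small dataset: 4 observations, an intercept column plus one feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.0, 2.5, 4.0, 4.5])

# Closed-form OLS estimate: beta = (X^T X)^(-1) X^T y.
beta = np.linalg.inv(X.T @ X) @ X.T @ y

def sse(b):
    """Sum of squared differences between the data and the fitted line."""
    return np.sum((y - X @ b) ** 2)

# Perturbing beta in any direction can only increase the error.
perturbed = sse(beta + np.array([0.1, 0.0]))
```

In practice `np.linalg.lstsq` is preferred over explicitly inverting X^T X, which can be numerically unstable; the explicit form is shown here only to mirror the formula.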

For specific mathematical reasons (see linear regression), this allows researchers to estimate the conditional expectation (or population mean) of the dependent variable when the independent variables take on a given set of values.

Less common forms of regression use slightly different procedures to estimate alternative location parameters (for example, quantile regression or necessary condition analysis) or to estimate the conditional expectation across a broader collection of nonlinear models (for example, nonparametric regression).

Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for prediction and forecasting, and its uses overlap considerably with the field of machine learning. Second, in some cases, regression analysis can be used to infer causal relationships between independent and dependent variables. Importantly, regression by itself can only reveal the relationship between the dependent variable and the set of independent variables in a fixed dataset. 

To use regression to predict or infer causal relationships, respectively, researchers must carefully demonstrate why an existing relationship has predictive power for a new setting or why a relationship between two variables has a causal explanation. The latter is especially important when researchers wish to estimate causality using observational data. 

 

[Image: Barcelona, Spain]

- Types of Regression Models in ML

Here are the types of Regression algorithms commonly found in the ML field:

  • Decision Tree Regression: The primary purpose of this regression is to divide the dataset into smaller and smaller subsets by splitting on feature values; the prediction for a data point is then derived from the subset (leaf) it falls into.
  • Principal Components Regression: This technique is widely used when there are many independent variables or multicollinearity exists in the data.
  • Polynomial Regression: This type fits a non-linear equation by using the polynomial functions of an independent variable.
  • Random Forest Regression: Random Forest regression is heavily used in Machine Learning. It uses multiple decision trees to predict the output. Random data points are chosen from the given dataset and used to build a decision tree via this algorithm.
  • Simple Linear Regression: This type is the least complicated form of regression, in which a continuous dependent variable is modeled as a linear function of a single independent variable.
  • Support Vector Regression: This regression type solves both linear and non-linear models. It uses non-linear kernel functions, like polynomials, to find an optimal solution for non-linear models.
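A from-scratch sketch of the decision-tree idea above, reduced to its core step: a single split (a "stump") that divides the data into the two subsets with the smallest total squared error, predicting each subset's mean. Real decision tree regressors grow many such splits recursively; this toy version and its data are only illustrative.

```python
import numpy as np

def stump_fit(x, y):
    """Find the split threshold minimizing the summed squared error
    of the two resulting subsets (x is assumed sorted)."""
    best_err, best_t = np.inf, None
    for t in (x[:-1] + x[1:]) / 2:  # candidate thresholds between points
        left, right = y[x <= t], y[x > t]
        err = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if err < best_err:
            best_err, best_t = err, t
    return best_t, y[x <= best_t].mean(), y[x > best_t].mean()

# Two clusters of points; the best split should land between them.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.5, 4.5, 20.0, 21.0, 19.0])
t, left_mean, right_mean = stump_fit(x, y)
# Predictions: left_mean for x <= t, right_mean for x > t.
```

Random Forest Regression, listed above, averages many such trees, each built on random subsets of the data.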

 

[More to come ...]

