Personal tools

Target Variables and Independent Variables

Princeton University_081921A
[Princeton University]



- Overview

Independent and dependent variables are important concepts in machine learning (ML), as they represent the target variable and input variables used to make predictions or explain the variation in the target variable. 

To build trustworthy and accurate prediction models, it is essential to first comprehend the interplay between these factors and then choose the most useful independent variables.

A target variable is a variable or metric that a supervised ML model attempts to predict. It's also known as the dependent variable, the response variable, the "y" variable, or the model output.  

  • In a decision tree analysis, there must be one and only one target variable. 
  • In ML, a target variable is the variable that is or should be the output. For example, it could be binary 0 or 1 if you are classifying or it could be a continuous variable if you are doing a regression. 
  • In statistics, a target variable is also referred to as the response variable. 
  • In economics, a target variable may be a price, a price index, a rate of change of prices, a quantity index, or a rate of change of quantities. 
 

The target variable is the variable whose values are modeled and predicted by other variables. A predictor variable is a variable whose values will be used to predict the value of the target variable.

 

- Target Variables (or Independent Variables)

In supervised learning algorithms, the target variable is the dependent variable. The dependent variable is the variable that is being predicted or explained. 

Supervised learning algorithms use features, or independent variables, to predict a target. The dependent variable can be a discrete target variable (classification) or a continuous target variable (regression).

The dependent variable must be known and numeric. This means that the data must be labeled and categorized. The labeled data is used to train the model, which then uses it to make predictions on new, unlabeled data. The model's accuracy is evaluated based on how well it can classify or predict the dependent variable on the new data. 

The dependent and independent variables are also known as response and explanatory variables, respectively. The dependent variable is the one being trained on, whereas the independent variables are those being used to train the model. 

How it works: This algorithm consists of a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, we generate a function that maps input data to desired outputs. 

 

- Independent Variables vs. Dependent Variables

The dependent variable is the variable that is sought to predict or explain. Supervised learning involves making predictions about a target variable based on the values ​​of a set of input variables. 

In a regression problem, the dependent variable might be the cost of the house, and the independent variables might be the number of bedrooms, lot size, neighborhood, etc. 

In contrast, independent variables (sometimes called predictor variables) are variables used to generate predictions about changes in the dependent variable (target) or to explain changes in the dependent variable (target). 

Variables can be quantitative or qualitative, and can have a continuous or categorical structure. Data may be altered or scaled to increase its predictive value.

- The Relationship between the Dependent and Independent Variables

The relationship between the dependent and independent variables is often modeled using statistical or ML techniques. Methods that try to capture the connections between the variables include linear regression, logistic regression, decision trees, and others. 

As the accuracy and dependability of an ML model’s predictions are highly reliant on the quality and relevance of the independent variables included in the model, it is crucial to choose these variables with care. 

Domain expertise may guide the choice of independent variables; alternatively, feature selection or feature engineering methods can be employed to zero in on the most informative variables.

 

 

[More to come ...]


Document Actions