
Training Data and Testing Data

[University of New South Wales, Australia]


- Overview

Machine learning (ML) algorithms learn from data in datasets. They find patterns in the data, use those patterns to make predictions or decisions, and are then evaluated on how accurate those predictions are.

In ML, datasets are typically split into two subsets: training and testing data. The training data is used to train the ML algorithm. The testing data is used to evaluate the accuracy of the trained algorithm.

Understanding the difference between these two subsets is essential for building ML models that are reliable and accurate, and for measuring honestly how well they perform.

In ML, training data and testing data are subsets of a dataset: 

  • Training data: The subset of the original dataset used to fit the model. It is typically the larger of the two subsets (splits such as 80/20 or 70/30 are common) and can consist of photos, videos, text, or audio files. In supervised learning, each example is labeled with a class or tag so the algorithm can learn to map inputs to the correct outputs.
  • Testing data: A separate subset of the original dataset, held out from training, used to measure the model's performance. It must not overlap with the training data. In supervised learning the testing data is also labeled, but the labels are hidden from the model and used only to score its predictions. Because the model has never seen these examples, the results indicate how well it generalizes to new data. Tuning or optimizing a model is best done against a third held-out subset (a validation set), so that the testing data remains an unbiased final check.
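As a minimal sketch of how the two subsets can be produced, the Python snippet below shuffles a small labeled dataset and holds out a fraction of it for testing. The function name `train_test_split`, its parameters, and the toy data are assumptions made for illustration and use only the standard library; they are not taken from any particular ML package.

```python
import random


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle paired samples/labels and split them into train and test subsets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Shuffle indices (not the data itself) so samples and labels stay paired.
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)

    # Hold out the first n_test shuffled indices for testing.
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical toy dataset: ten integer samples labeled by parity.
samples = list(range(10))
labels = [x % 2 for x in samples]
x_train, x_test, y_train, y_test = train_test_split(
    samples, labels, test_fraction=0.2, seed=42
)
print(len(x_train), len(x_test))
```

Fixing the seed makes the split reproducible, and shuffling before splitting avoids a test set that contains only the last records collected.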

A typical ML workflow that uses training and testing data involves several steps:

  • Data collection
  • Data preprocessing
  • Data splitting: Train-test split
  • Data augmentation (optional)
  • Model training
  • Model evaluation: testing
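The steps above can be sketched end to end in a few lines of Python. This is an illustrative toy under stated assumptions, not a recommended pipeline: the data is synthetic, preprocessing is reduced to dropping records with missing values, the optional augmentation step is skipped, and the model is a simple nearest-centroid classifier. All names (`fit_centroids`, `predict`, and so on) are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# 1. Data collection: synthetic 1-D measurements for two classes (made up).
raw = [(rng.gauss(2.0, 0.5), 0) for _ in range(50)]
raw += [(rng.gauss(6.0, 0.5), 1) for _ in range(50)]
raw += [(None, 0), (None, 1)]  # simulate two records with missing values

# 2. Data preprocessing: drop records whose feature value is missing.
clean = [(x, y) for x, y in raw if x is not None]

# 3. Data splitting: shuffle, then hold out 20% as testing data.
rng.shuffle(clean)
n_test = len(clean) // 5
test, train = clean[:n_test], clean[n_test:]

# 4. Data augmentation (optional) is skipped in this sketch.


# 5. Model training: compute one centroid (mean) per class, from training data only.
def fit_centroids(data):
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


centroids = fit_centroids(train)

# 6. Model evaluation: score predictions on the held-out testing data.
correct = sum(predict(centroids, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The model never sees the testing records during step 5, so the accuracy printed in step 6 is an estimate of performance on new data rather than a measure of how well the training data was fitted.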

By keeping training and testing data separate, we can estimate how accurately the ML model will predict on new data it has not seen before, instead of only measuring how well it fits the data it was trained on.
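To illustrate why a held-out test set matters, the sketch below uses a deliberately poor "model" that only memorizes its training examples. It is a contrived example with hypothetical names and a made-up dataset: the model scores perfectly on the data it was trained on, and only the testing data reveals that it has learned nothing general.

```python
def fit_memorizer(examples):
    """'Train' by storing every (input, label) pair verbatim."""
    return dict(examples)


def predict(table, x, default=0):
    """Return the stored label, or a fixed default for inputs never seen."""
    return table.get(x, default)


def accuracy(table, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(table, x) == y for x, y in examples)
    return correct / len(examples)


# Hypothetical toy dataset: integers labeled by parity (0 = even, 1 = odd).
dataset = [(x, x % 2) for x in range(100)]

# Unshuffled 80/20 split, kept simple on purpose for this illustration.
train, test = dataset[:80], dataset[80:]

table = fit_memorizer(train)
train_accuracy = accuracy(table, train)  # every training input is in the table
test_accuracy = accuracy(table, test)    # unseen inputs all fall back to the default

print(train_accuracy, test_accuracy)  # prints: 1.0 0.5
```

Evaluating on the training data alone would suggest a perfect model. The testing data shows it does no better than always guessing "even" on unseen inputs.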

 

[More to come ...]

 

 
