
EITC


Training Data

[Image: Old Nassau, Princeton University - Office of Communication]

- Overview

In machine learning (ML), training data is the dataset, often large, used to fit a model or algorithm. It teaches prediction models how to extract the features that are relevant to the task at hand, such as a business goal.

Training data can include labeled images, text documents, audio recordings, and sensor data.

Training data is used in three main types of ML: supervised learning, unsupervised learning, and semi-supervised learning.

In supervised learning, the training data must be labeled. The labels allow the model to learn a mapping from each example's features to its associated label. In general, more training data leads to better predictions, provided the data is representative of the problem and accurately labeled.
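As a minimal sketch of how labeled training data defines a mapping from features to labels, the example below uses a one-nearest-neighbor rule written with the Python standard library only. The dataset, the feature meanings, and the `predict` helper are hypothetical and chosen purely for illustration; they are not taken from any particular library.

```python
import math

# Hypothetical labeled training data: (features, label) pairs.
# Features are [height_cm, weight_kg]; the labels are made up for illustration.
training_data = [
    ([150.0, 50.0], "small"),
    ([155.0, 55.0], "small"),
    ([180.0, 85.0], "large"),
    ([185.0, 90.0], "large"),
]

def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]

print(predict([152.0, 52.0], training_data))
print(predict([183.0, 89.0], training_data))
```

The "model" here is simply the stored examples: adding more representative labeled examples gives the rule more reference points to compare new inputs against.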

Here are some steps for preparing data for ML:

Transform all the data files into a common format.
Explore the dataset using a data exploration or preparation tool such as Tableau or the Python pandas library.
Clean the data, for example by using mathematical operations to fill in missing values or rescale numeric columns.
Pick feature variables from the dataset using feature selection methods.
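The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than a recommended pipeline: it uses only the Python standard library (in practice a tool such as pandas would be typical), the CSV content and column names are hypothetical, filling missing values with the column mean is just one possible cleaning choice, and dropping zero-variance columns stands in for a real feature selection method.

```python
import csv
import io
import statistics

# Hypothetical raw data, already converted to one common format (CSV).
RAW_CSV = """age,income,constant_flag,label
25,40000,1,0
32,,1,1
47,82000,1,1
51,91000,1,0
"""

# Step 1: load the common format into memory as a list of dicts.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Step 2: explore the data by counting missing values per column.
missing = {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# Step 3: clean by converting to floats and filling gaps with the column mean.
feature_cols = [c for c in rows[0] if c != "label"]
for col in feature_cols:
    present = [float(r[col]) for r in rows if r[col] != ""]
    mean = statistics.mean(present)
    for r in rows:
        r[col] = float(r[col]) if r[col] != "" else mean

# Step 4: select features by dropping columns that never vary.
selected = [c for c in feature_cols
            if statistics.pvariance([r[c] for r in rows]) > 0]

print(missing)
print(selected)
```

Here `constant_flag` is dropped because it has the same value in every row and so carries no information for prediction.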

Data transformation includes dimensionality reduction, feature selection, and the creation of new features. These steps help reduce noise in the data and can improve the ML model's ability to make accurate predictions.
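As a small illustration of creating a new feature, the sketch below derives a single body-mass-index value from two raw columns. The records, the field names, and the `add_bmi` helper are hypothetical. Replacing two raw columns with one derived column also lowers the feature count, although this is not a general dimensionality reduction method such as principal component analysis.

```python
# Hypothetical records; field names and values are made up for illustration.
records = [
    {"height_m": 1.60, "weight_kg": 64.0},
    {"height_m": 1.80, "weight_kg": 72.9},
]

def add_bmi(record):
    """Return a new record with one derived feature replacing two raw ones."""
    bmi = record["weight_kg"] / record["height_m"] ** 2
    return {"bmi": round(bmi, 1)}

transformed = [add_bmi(r) for r in records]
print(transformed)
```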

[More to come ...]
