
Foundation of Pattern Recognition


- Overview

Pattern recognition (PR) is the automatic identification of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics, and machine learning (ML). 

PR has its roots in statistics and engineering. With the increased availability of big data and abundant processing power, many modern PR methods rely on ML. PR and ML can be viewed as two facets of the same field, and both have undergone substantial development over the past few decades.

PR systems are typically trained on labeled "training" data (supervised learning). When no labeled data is available, other algorithms can be used to discover previously unknown patterns (unsupervised learning). Knowledge discovery in databases (KDD) and data mining focus more on unsupervised methods and have a stronger connection to business use.


- Pattern Recognition and Classifications

In ML, PR is the assignment of a label to a given input value. In statistics, discriminant analysis was introduced for the same purpose in 1936. An example of PR is classification, which attempts to assign each input value to one of a given set of categories (e.g., to determine whether a given email is "spam" or "not spam").
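
The spam example can be sketched with a very small naive Bayes classifier. This is only an illustration under stated assumptions: the four training messages are made up, the names `train` and `classify` are hypothetical, and naive Bayes is one of many possible classifiers, not a method prescribed by the text above.

```python
import math
from collections import Counter

# Tiny hand-made training set (illustrative data, not from the article).
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "not spam"),
    ("lunch meeting tomorrow", "not spam"),
]


def train(examples):
    """Count words per label and how often each label occurs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability.

    Laplace (add-one) smoothing keeps unseen words from zeroing a score.
    """
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("cheap money", model))       # spam
print(classify("agenda for lunch", model))  # not spam
```

Every input receives one of the known labels, even if it contains words that never appeared in training. That is the behavior a classifier is expected to have.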

PR is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (e.g., part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.
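
Regression, the first of these other output types, can be illustrated with a least-squares line fit. The sketch below is a minimal example: the data points and the names `fit_line` and `predict` are assumptions made for illustration, and a straight line is only the simplest of many regression models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Assign a real-valued output to a new input."""
    return slope * x + intercept


# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)               # 2.0 1.0
print(predict(5, slope, intercept))   # 11.0
```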


- Pattern Recognition Algorithms

PR algorithms generally aim to provide reasonable answers for all possible inputs, and perform a "most likely" match of the inputs taking into account their statistical variation. This is in contrast to pattern-matching algorithms, which look for matches in the input that exactly match pre-existing patterns. 

A common example of a pattern matching algorithm is regular expression matching, which finds patterns of a given type in text data and is included in the search capabilities of many text editors and word processors.  
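
The difference between the two approaches can be shown in a few lines. The sketch below uses Python's standard `re` module for exact pattern matching and `difflib.get_close_matches` as a simple stand-in for a "most likely" match. The word list and function names are assumptions made for illustration, and string similarity is only a rough proxy for a statistical PR method.

```python
import difflib
import re

# Hypothetical vocabulary of known patterns.
KNOWN_WORDS = ["pattern", "recognition", "regression"]


def exact_match(text):
    """Pattern matching: return only exact occurrences of the word 'pattern'."""
    return re.findall(r"\bpattern\b", text)


def closest_match(word):
    """'Most likely' match: return the most similar known word, or None."""
    matches = difflib.get_close_matches(word, KNOWN_WORDS, n=1)
    return matches[0] if matches else None


print(exact_match("a pattern in the data"))  # ['pattern']
print(exact_match("a patern in the data"))   # [] because the typo defeats the regex
print(closest_match("patern"))               # pattern
```

The regular expression fails on the misspelled input, while the similarity-based lookup still returns a reasonable answer.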


- Mathematical Patterns

In math, a number pattern is a sequence of numbers that follows a specific rule. Recognizing a mathematical pattern means identifying that rule from the given data.

Solving a mathematical pattern means using the recognized rule to predict or extend the sequence.

Here are some types of number patterns: 

  • Geometric pattern: A sequence in which each term is obtained by multiplying or dividing the previous term by the same number. For example, in the sequence 2, 6, 18, 54, the common ratio is 3.
  • Arithmetic number pattern: A sequence in which the same amount is added or subtracted every time. For example, in the sequence 9, 18, 27, 36, 45, 54, the common difference is 9.
  • Triangular numbers: A sequence of numbers that can each be arranged as a triangle of dots: 1, 3, 6, 10, 15, ... The term-to-term rule is to add one more than was added the previous time (+2, +3, +4, ...).
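
The pattern types listed above can be recognized mechanically. The following sketch is one possible check, with a hypothetical function name `classify_sequence`. It assumes integer sequences of at least three terms, and treats the triangular numbers as starting at 1.

```python
def classify_sequence(seq):
    """Label a sequence as arithmetic, geometric, triangular, or unknown."""
    if len(seq) < 3:
        raise ValueError("need at least three terms to recognize a pattern")

    # Arithmetic: every consecutive difference is the same.
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    if len(set(diffs)) == 1:
        return "arithmetic"

    # Geometric: constant ratio, tested as b*b == a*c to avoid float division.
    if all(x != 0 for x in seq) and all(
        b * b == a * c for a, b, c in zip(seq, seq[1:], seq[2:])
    ):
        return "geometric"

    # Triangular: the n-th term equals n * (n + 1) / 2.
    if all(term == n * (n + 1) // 2 for n, term in enumerate(seq, start=1)):
        return "triangular"

    return "unknown"


print(classify_sequence([9, 18, 27, 36, 45, 54]))  # arithmetic
print(classify_sequence([2, 6, 18, 54]))           # geometric
print(classify_sequence([1, 3, 6, 10, 15]))        # triangular
```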


- The Process in Pattern Recognition

Pattern recognition (PR) is the process of using an ML algorithm to classify data based on statistical information or previously acquired knowledge. It is an application of ML that uses data analysis to recognize regularities in incoming data.

The PR process can be structured as follows:

  • Collect digital data
  • Remove noise from the data
  • Examine information for important features or familiar elements
  • Group the elements into segments
  • Analyze data sets for insights
  • Implement the extracted insights

The PR process typically has four main phases: preprocessing, training, testing, and deployment. Together, these phases cover the activities needed to develop and evaluate a PR system.
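
The four phases can be walked through with a toy example. The sketch below is built on assumptions: the data are made up, min-max scaling stands in for preprocessing, a nearest-centroid classifier stands in for the trained model, and all function names are hypothetical. A real system would use far more data and a more careful evaluation.

```python
import math

# Illustrative labeled data: (height_cm, weight_kg) -> label. Made up for this sketch.
DATA = [
    ((150.0, 50.0), "small"), ((155.0, 55.0), "small"), ((160.0, 58.0), "small"),
    ((180.0, 85.0), "large"), ((185.0, 90.0), "large"), ((190.0, 95.0), "large"),
]
HELD_OUT = [((152.0, 52.0), "small"), ((188.0, 92.0), "large")]


def fit_scaler(points):
    """Preprocessing: record the per-feature minimum and maximum."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return lows, highs


def scale(point, scaler):
    """Map each feature to roughly [0, 1] using the recorded range."""
    lows, highs = scaler
    return tuple((x - lo) / ((hi - lo) or 1.0) for x, lo, hi in zip(point, lows, highs))


def train(samples, scaler):
    """Training: compute the mean (centroid) of the scaled points per label."""
    sums, counts = {}, {}
    for point, label in samples:
        scaled = scale(point, scaler)
        acc = sums.setdefault(label, [0.0] * len(scaled))
        for i, x in enumerate(scaled):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}


def predict(point, scaler, centroids):
    """Deployment: label a new raw input by its nearest centroid."""
    scaled = scale(point, scaler)
    return min(centroids, key=lambda label: math.dist(scaled, centroids[label]))


def accuracy(samples, scaler, centroids):
    """Testing: fraction of held-out samples labeled correctly."""
    hits = sum(predict(p, scaler, centroids) == label for p, label in samples)
    return hits / len(samples)


scaler = fit_scaler([point for point, _ in DATA])
centroids = train(DATA, scaler)
print(accuracy(HELD_OUT, scaler, centroids))        # 1.0
print(predict((158.0, 54.0), scaler, centroids))    # small
```

The scaler is fitted on the training data only and then reused unchanged for testing and deployment, so the held-out samples do not influence the model.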

Feature extraction is a crucial step in pattern recognition. It involves selecting and representing the most relevant information or attributes from the raw data. 
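
As a small illustration of feature extraction, the sketch below reduces a raw numeric signal to three summary attributes. The choice of features (mean, spread, and number of mean crossings) and the name `extract_features` are assumptions made for this example; which features are relevant depends on the application.

```python
import statistics


def extract_features(signal):
    """Summarize a raw signal as a small dictionary of features."""
    mean = statistics.fmean(signal)
    spread = statistics.pstdev(signal)  # population standard deviation
    centered = [x - mean for x in signal]
    # Count sign changes around the mean, a rough measure of oscillation.
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {"mean": mean, "spread": spread, "mean_crossings": crossings}


oscillating = extract_features([1, -1, 1, -1])
flat = extract_features([2, 2, 2, 2])
print(oscillating)  # {'mean': 0.0, 'spread': 1.0, 'mean_crossings': 3}
print(flat)         # {'mean': 2.0, 'spread': 0.0, 'mean_crossings': 0}
```

A classifier would then operate on these compact feature values instead of the raw samples.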

Template matching involves defining a measure or a cost to find the “similarity” between the (known) reference patterns and the (unknown) test pattern. 
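
A minimal sketch of this idea for one-dimensional data follows. It assumes the sum of squared differences (SSD) as the cost, where a lower cost means higher similarity. The names `ssd` and `best_match` and the sample values are hypothetical, and other measures such as correlation are equally valid choices.

```python
def ssd(a, b):
    """Cost: sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(signal, template):
    """Slide the template over the signal and return (offset, cost) of the best fit."""
    window = len(template)
    if window == 0 or window > len(signal):
        raise ValueError("template must be non-empty and no longer than the signal")
    costs = [
        ssd(signal[i:i + window], template)
        for i in range(len(signal) - window + 1)
    ]
    offset = min(range(len(costs)), key=costs.__getitem__)
    return offset, costs[offset]


# Reference pattern (template) and an unknown test signal, both made up.
template = [5, 10, 5]
signal = [0, 1, 5, 9, 5, 1, 0]
print(best_match(signal, template))  # (2, 1)
```

The best offset is 2, where the window [5, 9, 5] differs from the template by a cost of only 1. An exact pattern matcher would report no match at all, because the peak is 9 instead of 10.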






