Implementing ANNs Training Process
- (Harvard University - Harvard Taiwan Student Association)
- Overview
Neural networks are general-purpose models that can solve problems without being programmed with task-specific rules and conditions. They are inspired by biological neural networks and are most commonly trained with supervised machine learning. The goal of an artificial neural network (ANN) is to map inputs to outputs, and ANNs can be used to solve both regression and classification problems.
Neural networks typically have different layers, including:
- Input layer: Picks up input signals and passes them to the next layer
- Hidden layer: Performs calculations and feature extractions
- Output layer: Delivers the final result
Common types of neural networks include feedforward neural networks (such as the multilayer perceptron), convolutional neural networks, and recurrent neural networks; backpropagation is not a network type but the algorithm most commonly used to train them.
- Steps for Building a Neural Network
Artificial neural networks (ANNs), like humans, learn by example. Through a learning process, ANNs are configured for specific applications, such as pattern recognition or data classification. Learning primarily involves adjustments to the synaptic connections between neurons.
The brain is made up of tens of billions of cells called neurons (roughly 86 billion in humans). These neurons are connected by synapses, junctions through which one neuron sends impulses to another.
When one neuron sends an excitation signal to another neuron, that signal is added to all the other inputs arriving at that neuron. If the combined signal exceeds a given threshold, the target neuron fires an action signal forward; this chain of firings is, at a high level, how signals propagate through the brain.
In computer science, we model this process by creating "networks" on computers using matrices. These networks can be understood as an abstraction of neurons without all the biological complexity.
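As a minimal sketch of this idea (all numbers here are made up purely for illustration), a single artificial neuron can be modeled as a weighted sum of its inputs followed by a threshold check:
import numpy as np
# Hypothetical inputs and weights for one neuron (values chosen for illustration)
inputs = np.array([0.5, 0.8, 0.2])
weights = np.array([0.4, 0.3, 0.9])
bias = -0.5
threshold = 0.0
# Weighted sum of all incoming signals, plus the bias
activation = np.dot(inputs, weights) + bias
# The neuron "fires" only if the combined signal exceeds the threshold
fires = activation > threshold
print(f"activation = {activation:.2f}, neuron fires: {fires}")
A full network is simply many such neurons arranged in layers, which is why the matrix operations used below scale the same computation up.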
Here are some steps for building a neural network:
- Create an approximation model
- Configure data set
- Set network architecture
- Train neural network
- Improve generalization performance
- Test results
- Deploy model
- Training an ANN
To train an Artificial Neural Network (ANN), you can use a step-by-step approach, starting with defining the network architecture, loading and preprocessing the data, and then training the model using forward and backward propagation.
A common example is image classification using the MNIST dataset, where the goal is to build a neural network that can accurately classify handwritten digits (0-9).
Here's a breakdown of the process:
1. Define the ANN Architecture:
- Input Layer: Determine the number of input nodes based on the dimensions of your data. For MNIST, each image is 28x28 pixels, so you'd likely use 784 input nodes (28 * 28).
- Hidden Layer(s): Decide on the number of hidden layers and neurons in each layer. A common approach is to start with a small number of hidden layers (e.g., 1-3) and adjust as needed.
- Output Layer: The number of output nodes depends on the number of classes you're trying to predict. For MNIST, you'd need 10 output nodes, one for each digit (0-9).
- Activation Functions: Choose activation functions for each layer (e.g., sigmoid, ReLU). ReLU is common for hidden layers; sigmoid is often used for output layers in binary classification, while multi-class problems like MNIST typically use a softmax output layer. A code sketch of such an architecture follows below.
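As a rough sketch of what this architecture amounts to in code, the hidden layer size of 64 below is an illustrative assumption, not a value fixed by the text:
import numpy as np
# Illustrative sizes: 784 input nodes (28 * 28 pixels), one hidden layer of
# 64 neurons (an assumption, not a prescribed value), and 10 output nodes
n_input, n_hidden, n_output = 784, 64, 10
# The architecture is fully described by the shapes of its weights and biases
shapes = {
    "W1": (n_input, n_hidden), "b1": (n_hidden,),    # input -> hidden
    "W2": (n_hidden, n_output), "b2": (n_output,),   # hidden -> output
}
total_params = sum(np.prod(s) for s in shapes.values())
print(f"trainable parameters: {total_params}")  # 784*64 + 64 + 64*10 + 10 = 50890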
2. Load and Preprocess the Data:
- Dataset: Obtain your training and testing data (e.g., MNIST dataset).
- Preprocessing: Prepare the data by converting it into a suitable format for the ANN. For MNIST, this might involve scaling pixel values to a range between 0 and 1.
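A minimal sketch of this preprocessing, using randomly generated stand-ins for the real MNIST arrays (the variables images and labels are hypothetical placeholders):
import numpy as np
# `images` and `labels` stand in for the real MNIST arrays:
# (num_samples, 28, 28) pixel values 0-255, and integer digits 0-9
images = np.random.randint(0, 256, size=(100, 28, 28))
labels = np.random.randint(0, 10, size=100)
# Flatten each 28x28 image into a 784-dimensional input vector
X = images.reshape(len(images), 784).astype(np.float64)
# Scale pixel values into the range [0, 1]
X /= 255.0
# One-hot encode the labels so they match the 10 output nodes:
# the digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
Y = np.eye(10)[labels]
print(X.shape, Y.shape)  # (100, 784) (100, 10)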
3. Initialize Weights and Biases:
- Random Initialization: Randomly initialize the weights of the connections between layers, along with each neuron's bias. Small random values are typical so that neurons start out different from one another, as sketched below.
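A sketch of random initialization in NumPy; the small standard deviation is a common heuristic rather than something the text prescribes:
import numpy as np
rng = np.random.default_rng(0)
n_input, n_hidden, n_output = 784, 64, 10  # illustrative sizes from step 1
# Small random weights break the symmetry between neurons so they can
# learn different features; biases are commonly started at zero
W1 = rng.normal(0, 0.01, size=(n_input, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.01, size=(n_hidden, n_output))
b2 = np.zeros(n_output)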
4. Forward Propagation:
- Input: Feed the preprocessed input data into the network.
- Calculations: Calculate the weighted sum of inputs for each neuron in the hidden layers, apply the activation function, and pass the result to the next layer.
- Output: Obtain the predicted output from the output layer.
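Putting these three sub-steps together, a minimal forward pass for one hidden layer might look like the following self-contained sketch (random data stands in for real inputs, and sigmoid is used throughout for simplicity):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((5, 784))                          # 5 stand-in input samples
W1, b1 = rng.normal(0, 0.01, (784, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (64, 10)), np.zeros(10)
# Hidden layer: weighted sum of the inputs, then the activation function
H = sigmoid(X @ W1 + b1)
# Output layer: weighted sum of the hidden activations, then the activation
Y_hat = sigmoid(H @ W2 + b2)
print(Y_hat.shape)  # (5, 10): one score per digit class for each sample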
5. Backward Propagation:
- Error Calculation: Compare the predicted output with the actual output (target) to calculate the error.
- Weight Adjustment: Adjust the weights and biases using the calculated error and a learning rate. This is done using a process called backpropagation.
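A sketch of one backpropagation update for the same two-layer network, assuming a squared-error loss and sigmoid activations (one common formulation; the text does not fix a specific loss):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((5, 784))                          # stand-in inputs
Y = np.eye(10)[rng.integers(0, 10, 5)]            # stand-in one-hot targets
W1, b1 = rng.normal(0, 0.01, (784, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (64, 10)), np.zeros(10)
lr = 0.1                                          # learning rate (illustrative)
# Forward pass: the activations are needed again during the backward pass
H = sigmoid(X @ W1 + b1)
Y_hat = sigmoid(H @ W2 + b2)
# Error at the output, scaled by the sigmoid derivative y * (1 - y)
delta_out = (Y_hat - Y) * Y_hat * (1 - Y_hat)
# Propagate the error backward through W2 to the hidden layer
delta_hidden = (delta_out @ W2.T) * H * (1 - H)
# Gradient step: move each weight and bias against its share of the error
W2 -= lr * (H.T @ delta_out)
b2 -= lr * delta_out.sum(axis=0)
W1 -= lr * (X.T @ delta_hidden)
b1 -= lr * delta_hidden.sum(axis=0)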
6. Training:
- Iterate: Repeat steps 4 and 5 over the training data; each full pass through the training set is called an epoch. Continue for multiple epochs until the model achieves satisfactory accuracy.
7. Evaluation:
- Test Data: Evaluate the trained model on the test data to assess its performance.
- Metrics: Use metrics like accuracy, precision, recall, and F1-score to evaluate the model's performance.
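Steps 4 through 7 combine into a simple training loop. The sketch below runs on random stand-in data purely to show the structure, and it evaluates on the training set for brevity; a real evaluation would use held-out test data as described above:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((200, 784))                        # stand-in training inputs
Y = np.eye(10)[rng.integers(0, 10, 200)]          # stand-in one-hot targets
W1, b1 = rng.normal(0, 0.01, (784, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.01, (64, 10)), np.zeros(10)
lr, epochs = 0.1, 20

for epoch in range(epochs):
    # Forward pass (step 4)
    H = sigmoid(X @ W1 + b1)
    Y_hat = sigmoid(H @ W2 + b2)
    # Backward pass (step 5): squared-error gradient with sigmoid derivative
    delta_out = (Y_hat - Y) * Y_hat * (1 - Y_hat)
    delta_hidden = (delta_out @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ delta_out)
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * (X.T @ delta_hidden)
    b1 -= lr * delta_hidden.sum(axis=0)

# Evaluation: recompute outputs with the final weights; the predicted class
# is the output node with the highest activation
H = sigmoid(X @ W1 + b1)
Y_hat = sigmoid(H @ W2 + b2)
accuracy = (Y_hat.argmax(axis=1) == Y.argmax(axis=1)).mean()
print(f"accuracy: {accuracy:.2f}")  # near chance (0.1) on random labels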
- Step-by-Step Manual Calculation of an ANN
Consider a basic ANN with:
- 1 input layer: 2 input features (x1, x2)
- 1 hidden layer: 2 neurons (h1, h2)
- 1 output layer: 1 output neuron (o1)
The input data is processed through the following steps:
1. Input Features:
x1 = 0.5, x2 = 0.8
These are the input features fed into the model.
2. Weights and Biases:
- Weights for connections from the input to the hidden layer:
w11 = 0.1, w12 = 0.3, w21 = 0.2, w22 = 0.4
- Weights for connections from the hidden layer to the output:
w_o1 = 0.6, w_o2 = 0.5
- Biases for the hidden layer and output neurons:
b_h1 = 0.1, b_h2 = 0.2, b_o = 0.3
3. Step 1: Hidden Layer Computations: The hidden layer neurons apply their weights and biases, followed by an activation function such as the sigmoid.
- For hidden neuron h1:
z_h1 = (x1 * w11) + (x2 * w21) + b_h1 = (0.5 * 0.1) + (0.8 * 0.2) + 0.1 = 0.31
Now, apply the sigmoid activation function:
h1 = sigmoid(0.31) = 1 / (1 + e^(-0.31)) ≈ 0.577
- For hidden neuron h2:
z_h2 = (x1 * w12) + (x2 * w22) + b_h2 = (0.5 * 0.3) + (0.8 * 0.4) + 0.2 = 0.67
Apply the sigmoid activation function:
h2 = sigmoid(0.67) = 1 / (1 + e^(-0.67)) ≈ 0.661
4. Step 2: Output Layer Computation: The output neuron takes inputs from the hidden neurons and applies its own weights and bias, followed by the activation function.
- For the output neuron o1:
z_o1 = (h1 * w_o1) + (h2 * w_o2) + b_o = (0.577 * 0.6) + (0.661 * 0.5) + 0.3 ≈ 0.977
Apply the sigmoid activation function:
o1 = sigmoid(0.977) = 1 / (1 + e^(-0.977)) ≈ 0.726
Thus, the output of this ANN is approximately 0.726.
- Python Implementation of the ANN Example
Now, the same steps will be implemented in Python for better understanding.
import numpy as np
# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Input features
x1, x2 = 0.5, 0.8
# Weights for input to hidden layer
w11, w12 = 0.1, 0.3
w21, w22 = 0.2, 0.4
# Weights for hidden to output layer
w_o1, w_o2 = 0.6, 0.5
# Biases
b_h1, b_h2 = 0.1, 0.2 # Biases for hidden layer neurons
b_o = 0.3 # Bias for output layer neuron
# Step 1: Hidden layer computations
z_h1 = (x1 * w11) + (x2 * w21) + b_h1
z_h2 = (x1 * w12) + (x2 * w22) + b_h2
# Applying the sigmoid activation function
h1 = sigmoid(z_h1)
h2 = sigmoid(z_h2)
# Step 2: Output layer computation
z_o1 = (h1 * w_o1) + (h2 * w_o2) + b_o
# Applying the sigmoid activation function
o1 = sigmoid(z_o1)
# Output the final result
print(f"Output of the ANN: {o1}")
- Explanation of Python Code
- Sigmoid function: It is used as the activation function to map any real-valued number into the range of (0, 1).
- Weights and biases: Initialized for both the input to hidden layer and hidden to output layer.
- Step 1: The hidden layer’s neurons calculate their weighted inputs and apply the activation function.
- Step 2: The output neuron receives the outputs of the hidden layer neurons, applies weights and biases, and finally computes the output using the activation function.
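As a small extension of this example (not part of the walkthrough above), a single backpropagation step on the same toy network would adjust the output-layer weights as follows; the target output of 1.0, the squared-error loss, and the learning rate of 0.5 are all illustrative assumptions:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward pass, reusing the values from the example above
x1, x2 = 0.5, 0.8
w11, w12, w21, w22 = 0.1, 0.3, 0.2, 0.4
w_o1, w_o2 = 0.6, 0.5
b_h1, b_h2, b_o = 0.1, 0.2, 0.3
h1 = sigmoid(x1 * w11 + x2 * w21 + b_h1)
h2 = sigmoid(x1 * w12 + x2 * w22 + b_h2)
o1 = sigmoid(h1 * w_o1 + h2 * w_o2 + b_o)
# One backward step; target and learning rate are illustrative assumptions
target, lr = 1.0, 0.5
delta_o = (o1 - target) * o1 * (1 - o1)  # error times sigmoid derivative
# Adjust the hidden-to-output weights and bias against the error
w_o1 -= lr * delta_o * h1
w_o2 -= lr * delta_o * h2
b_o -= lr * delta_o
print(f"updated output weights: {w_o1:.4f}, {w_o2:.4f}")
Repeating this update over many examples and epochs, and extending it back to the input-to-hidden weights as described in the training section, is exactly what full backpropagation does.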