Notes summarizing COMP3010 Lecture 5, the labs, and online resources.

Biological Motivation

Neural networks are inspired by the structure and function of biological neurons.

  • Dendrites: Receive input signals
  • Cell body (soma, containing the nucleus): Integrates the incoming weighted signals (loosely analogous to a CPU)
  • Axon: Transmits signals
  • Axon Terminals: Send signals to other neurons

Simplified computational model (see the sketch after this list):

  • Inputs $x_i$ are weighted by $w_i$
  • Compute $z = \sum_i w_i x_i$
  • Apply activation function: $g(z)$
  • Output $g(z)$ is transmitted forward
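
As a minimal sketch of that computation (using NumPy; the input and weight values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """One possible activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs and weights (arbitrary values)
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4,  0.1, -0.7])  # weights w_i

z = np.dot(w, x)       # z = sum_i w_i * x_i
output = sigmoid(z)    # g(z), passed forward to the next neuron
print(z, output)
```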

Biological vs. Artificial Systems:

  • The brain operates with massive parallelism and energy efficiency
  • Computers are faster at serial computation but far less energy-efficient

Neural Network Basics

Historical Context

  • 1940s: McCulloch-Pitts neuron
  • 1980s–90s: Backpropagation and initial rise
  • 2000s: Decline due to compute limits; rise of alternatives (SVMs, decision trees)
  • 2010s+: Resurgence due to GPUs, big data, and new architectures (e.g., Transformers)

Neuron Model

A neuron can be modeled as a logistic unit, similar to logistic regression:

  • $f_{w,b}(x) = g(w^T x + b)$
  • $g(z)$ is the activation function

Examples of activation functions (sketched in code below):

  • Sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$
  • ReLU: $g(z) = \max(0, z)$
  • Tanh: $g(z) = \tanh(z)$
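
A direct NumPy sketch of these three activations (not tied to any particular library's naming):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def relu(z):
    return np.maximum(0, z)           # keeps positives, zeroes out negatives

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)
```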

Neural Network Representation

Architecture

  • Composed of layers: Input → Hidden → Output
  • $a_i^{(l)}$: Activation of neuron $i$ in layer $l$
  • $w_{ij}^{(l)}$: Weight from neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$ (so $W^{(l)}$ maps layer $l-1$ to layer $l$, matching the forward propagation below)

Forward Propagation

  • $z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}$
  • $a^{(l)} = g(z^{(l)})$
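
Under the row-vector convention used above, a forward pass can be sketched as follows (layer sizes and weights here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, weights, biases):
    """Forward propagation: repeatedly apply a = g(a W + b)."""
    for W, b in zip(weights, biases):
        z = a @ W + b     # z^(l) = a^(l-1) W^(l) + b^(l)
        a = sigmoid(z)    # a^(l) = g(z^(l))
    return a

# Illustrative 3-2-1 network: 3 inputs, one hidden layer of 2 neurons, 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 1))]
biases = [np.zeros(2), np.zeros(1)]
x = np.array([[0.5, -1.2, 3.0]])  # one example as a row vector
print(forward(x, weights, biases))
```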

Network Design

  • Binary classification: 1 output (sigmoid)
  • Multi-class: $K$ outputs (softmax)
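
For the multi-class case, the softmax turns the $K$ output scores into a probability distribution; a minimal, numerically stable sketch:

```python
import numpy as np

def softmax(z):
    """Map K raw scores to K probabilities that sum to 1."""
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx [0.66, 0.24, 0.10]
```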

Boolean Function Representation

  • Neural nets can represent logic gates like AND, OR, XOR
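
AND and OR are linearly separable, so a single sigmoid neuron suffices for them, while XOR requires a hidden layer. A sketch of an AND unit (these specific weights and bias are one common illustrative choice, not values from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def and_gate(x1, x2):
    """Single neuron: large positive weights plus a strongly negative bias."""
    w, b = np.array([20.0, 20.0]), -30.0
    return sigmoid(w @ np.array([x1, x2]) + b)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(float(and_gate(x1, x2))))  # outputs 0, 0, 0, 1
```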

Universal Approximation Theorem (UAT)

  • A single hidden layer with enough neurons can approximate any continuous function on a bounded domain to arbitrary accuracy
  • Deep networks provide hierarchical feature learning and handle complex structures more efficiently

Neural Network Learning

Supervised Learning

  • Train on input-output pairs to minimize a loss function

Backpropagation

  • Use chain rule to compute gradients
  • Update weights using gradient descent: $w := w - \alpha \frac{\partial L}{\partial w}$
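
A minimal sketch of both steps for a single logistic unit with binary cross-entropy loss, where the chain rule gives $\frac{\partial L}{\partial w} = \frac{1}{m} X^T (\hat{y} - y)$ (the toy data below are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (illustrative): 4 examples, 2 features, labels for AND
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

w, b, alpha = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    y_hat = sigmoid(X @ w + b)            # forward pass
    grad_w = X.T @ (y_hat - y) / len(y)   # chain rule: dL/dw
    grad_b = np.mean(y_hat - y)           # chain rule: dL/db
    w -= alpha * grad_w                   # w := w - alpha * dL/dw
    b -= alpha * grad_b
print(np.round(sigmoid(X @ w + b), 2))    # moves toward [0, 0, 0, 1]
```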

Loss Functions

  • Binary classification: Binary Cross-Entropy $L = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
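
Written out directly (a sketch; the small clipping constant is a standard numerical safeguard, not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """L = -(1/m) * sum( y*log(y_hat) + (1-y)*log(1-y_hat) )."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8])))
```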

Modern libraries use auto-differentiation to perform backpropagation efficiently.
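
For example (assuming PyTorch is available; this snippet is a generic illustration, not code from the lecture), the framework records the forward computation and derives the gradients automatically:

```python
import torch

# One logistic unit on a single example, with gradient tracking enabled
x = torch.tensor([0.5, -1.2, 3.0])
y = torch.tensor([1.0])
w = torch.zeros(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

y_hat = torch.sigmoid(w @ x + b)
loss = torch.nn.functional.binary_cross_entropy(y_hat, y)
loss.backward()           # backpropagation via automatic differentiation
print(w.grad, b.grad)     # dL/dw and dL/db, no manual chain rule needed
```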

Summary

  • Neural networks are loosely inspired by biological neurons and can learn complex tasks from data
  • Learning occurs through weight updates based on error gradients
  • Deep networks can model hierarchical features and complex functions

Additional Concepts

Activation Functions

  • ReLU: $g(z) = \max(0, z)$
  • Tanh: $g(z) = \tanh(z)$
  • Softmax: Used for multi-class output

Loss Functions

  • MSE: For regression
  • Cross-Entropy: For classification
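
For the regression case, mean squared error averages the squared differences between predictions and targets; a short sketch:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/m) * sum_i (y_i - y_hat_i)^2."""
    return np.mean((y - y_hat) ** 2)

print(mse(np.array([3.0, -0.5, 2.0]), np.array([2.5, 0.0, 2.0])))  # ~0.167
```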

Optimization Algorithms

  • SGD: Stochastic Gradient Descent
  • Adam: Adaptive optimizer with momentum
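
The two differ only in how a gradient becomes an update: plain SGD steps against the raw gradient, while Adam keeps running averages of the gradient and its square. A sketch of the update rules (defaults follow the commonly cited Adam hyperparameters):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step directly against the (mini-batch) gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum (m) and squared-gradient (v) running averages."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction, t = step count from 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```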

Regularization

  • Prevent overfitting with (both sketched in code below):

    • L1/L2 penalties
    • Dropout
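
Both can be sketched as small additions to a training loop; the penalty strength and keep probability below are illustrative hyperparameters:

```python
import numpy as np

def l2_penalty(weights, lam=1e-3):
    """Add lam * sum of squared weights to the loss to discourage large weights."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(a, keep_prob=0.8):
    """Inverted dropout: randomly zero activations at train time, rescale the rest."""
    mask = np.random.rand(*a.shape) < keep_prob
    return a * mask / keep_prob
```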

Types of Neural Networks

  • CNNs: Convolutional layers for image tasks
  • RNNs: Handle sequences (e.g., time-series, text)
  • Transformers: Attention-based models for language and beyond

Discussion Topics

Why “Deep Learning”?

  • Refers to networks with many stacked layers, each building more abstract features from the previous layer's output

Shallow vs Deep

  • Shallow: Simple patterns, limited expressiveness
  • Deep: Hierarchical learning of complex, abstract patterns

Layer Roles

  • Input Layer: Raw data
  • Hidden Layers: Feature transformations
  • Output Layer: Final predictions

Importance of Activation Functions

  • Provide non-linearity
  • Without them, any stack of layers collapses into a single linear model (see the sketch below)
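
A quick way to see this: stacking two linear layers with no activation in between is equivalent to a single linear layer, as the sketch below checks numerically (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
x = rng.normal(size=(1, 4))

two_layers = (x @ W1) @ W2      # "deep" network with no activation functions
one_layer = x @ (W1 @ W2)       # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: no added expressive power
```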

Expressive Power

  • Neural nets are universal approximators
  • More layers/neurons = higher capacity, but higher risk of overfitting

Current Challenges

  • Data and compute demands
  • Interpretability and bias
  • Generalization and robustness
  • Vanishing/exploding gradients
  • Sustainability and efficiency
