Neural Network Basics
Notes summarizing COMP3010 Lecture 5, the labs, and online resources.
Biological Motivation
Neural networks are inspired by the structure and function of biological neurons.
- Dendrites: Receive input signals
- Cell body (soma): Integrates the weighted input signals (like a CPU)
- Axon: Transmits signals
- Axon Terminals: Send signals to other neurons
Simplified computational model:
- Inputs $x_i$ are weighted by $w_i$
- Compute $z = \sum_i w_i x_i$
- Apply activation function: $g(z)$
- Output $g(z)$ is transmitted forward
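The four steps above can be sketched in a few lines of NumPy (the input, weight, and bias values below are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # Logistic activation g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs and weights for a single neuron
x = np.array([0.5, -1.0, 2.0])   # inputs x_i
w = np.array([0.4, 0.3, 0.1])    # weights w_i
b = 0.0                          # bias

z = np.dot(w, x) + b             # weighted sum z = sum_i w_i x_i + b
a = sigmoid(z)                   # activation g(z), transmitted forward
print(round(float(a), 4))        # 0.525
```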
Biological vs. Artificial Systems:
- The brain operates with massive parallelism and energy efficiency
- Computers are faster in serial computation but less efficient
Historical Context
- 1940s: McCulloch-Pitts neuron
- 1980s–90s: Backpropagation and initial rise
- 2000s: Decline due to compute limits; rise of alternatives (SVM, trees)
- 2010s+: Resurgence due to GPUs, big data, and new architectures (e.g., Transformers)
Neuron Model
A neuron can be modeled as a logistic unit, similar to logistic regression:
- $f_{w,b}(x) = g(w^T x + b)$
- $g(z)$ is the activation function
Examples of activation functions:
- Sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$
- ReLU: $g(z) = \max(0, z)$
- Tanh: $g(z) = \tanh(z)$
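The three activation functions above can be written directly in NumPy (a minimal sketch; the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes into (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # zeroes out negative values

def tanh(z):
    return np.tanh(z)                 # squashes into (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
print(tanh(z))
```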
Neural Network Representation
Architecture
- Composed of layers: Input → Hidden → Output
- $a_i^{(l)}$: Activation of neuron $i$ in layer $l$
- $w_{ij}^{(l)}$: Weight from neuron $i$ in layer $l$ to neuron $j$ in layer $l+1$
Forward Propagation
- $z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}$ (with $a^{(l-1)}$ as a row vector of the previous layer's activations)
- $a^{(l)} = g(z^{(l)})$
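Forward propagation through a tiny network, following the row-vector convention $z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}$ above (the layer sizes and random weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Tiny 3 -> 4 -> 1 network; W^{(l)} has shape (in, out)
W1 = rng.normal(size=(3, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

a0 = np.array([[0.5, -1.0, 2.0]])  # one example as a row vector
z1 = a0 @ W1 + b1                  # hidden pre-activation
a1 = sigmoid(z1)                   # hidden activations
z2 = a1 @ W2 + b2                  # output pre-activation
a2 = sigmoid(z2)                   # final output, in (0, 1)
print(a2.shape)                    # (1, 1)
```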
Network Design
- Binary classification: 1 output (sigmoid)
- Multi-class: $K$ outputs (softmax)
Boolean Function Representation
- A single neuron can represent linearly separable gates such as AND and OR; XOR is not linearly separable and requires a hidden layer
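As an example, AND is computed by a single neuron with hand-picked weights and a step activation (the weights here are one possible choice, not canonical):

```python
import numpy as np

def step(z):
    # Threshold activation: fires when the weighted sum is non-negative
    return (z >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # all boolean inputs
w = np.array([1.0, 1.0])   # illustrative weights
b = -1.5                   # threshold so only (1, 1) fires

print(step(X @ w + b))     # [0 0 0 1] -- the AND truth table
```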
Universal Approximation Theorem (UAT)
- A network with a single hidden layer of sufficient width can approximate any continuous function on a compact domain to arbitrary accuracy
- Deep networks provide hierarchical feature learning and handle complex structures more efficiently
Neural Network Learning
Supervised Learning
- Train on input-output pairs to minimize a loss function
Backpropagation
- Use chain rule to compute gradients
- Update weights using gradient descent: $w := w - \alpha \frac{\partial L}{\partial w}$
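A minimal sketch of this update rule, training one logistic unit on toy data (the dataset and learning rate are made up; for a sigmoid output with cross-entropy loss the chain rule reduces to the simple gradients shown in the comments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data: label is 1 exactly when x > 0
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, alpha = 0.0, 0.0, 0.5
for _ in range(200):
    y_hat = sigmoid(w * X + b)
    # For sigmoid + cross-entropy the chain rule collapses to:
    #   dL/dw = mean((y_hat - y) * x),  dL/db = mean(y_hat - y)
    dw = np.mean((y_hat - y) * X)
    db = np.mean(y_hat - y)
    w -= alpha * dw        # gradient descent update w := w - alpha dL/dw
    b -= alpha * db
print(w > 0)               # True: the learned weight is positive
```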
Loss Functions
- Binary classification: Binary Cross-Entropy $L = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
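The loss above can be computed directly (a sketch; the clipping constant `eps` is an implementation detail added to avoid `log(0)`):

```python
import numpy as np

def bce(y, y_hat, eps=1e-12):
    # Binary cross-entropy, averaged over m examples
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y    = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])  # confident and mostly correct
bad  = np.array([0.1, 0.9, 0.2])  # confident and mostly wrong
print(bce(y, good) < bce(y, bad))  # True: better predictions, lower loss
```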
Modern libraries use auto-differentiation to perform backpropagation efficiently.
Summary
- Neural networks are loosely inspired by the brain and can perform intelligent tasks
- Learning occurs through weight updates based on error gradients
- Deep networks can model hierarchical features and complex functions
Additional Concepts
Activation Functions
- ReLU: $g(z) = \max(0, z)$
- Tanh: $g(z) = \tanh(z)$
- Softmax: Used for multi-class output
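A common numerically stable way to compute softmax subtracts the maximum logit before exponentiating, which leaves the result unchanged (a sketch; the sample logits are arbitrary):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow in exp without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(np.round(p, 3))   # probabilities over 3 classes
print(int(p.argmax()))  # 0: the largest logit gets the largest probability
```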
Loss Functions
- MSE: For regression
- Cross-Entropy: For classification
Optimization Algorithms
- SGD: Stochastic Gradient Descent
- Adam: Adaptive optimizer with momentum
Regularization
Prevent overfitting with:
- L1/L2 penalties
- Dropout
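Both techniques are easy to sketch in NumPy (the penalty strength and dropout rate below are illustrative; the dropout shown is the common "inverted" variant, which rescales at training time so the expected activation is unchanged):

```python
import numpy as np

rng = np.random.default_rng(0)

# L2 penalty: add lambda * ||w||^2 to the loss to discourage large weights
w = np.array([0.5, -2.0, 1.0])
l2_penalty = 0.01 * np.sum(w ** 2)

def dropout(a, p=0.5):
    # Zero each activation with probability p, rescale survivors by 1/(1-p)
    mask = rng.random(a.shape) >= p
    return a * mask / (1 - p)

a = np.ones(1000)
d = dropout(a)
print(round(float(l2_penalty), 4))   # 0.0525
print(round(float(d.mean()), 2))     # close to 1: expectation preserved
```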
Types of Neural Networks
- CNNs: Convolutional layers for image tasks
- RNNs: Handle sequences (e.g., time-series, text)
- Transformers: Attention-based models for language and beyond
Discussion Topics
Why “Deep Learning”?
- Refers to networks with many layers enabling deep feature extraction
Shallow vs Deep
- Shallow: Simple patterns, limited expressiveness
- Deep: Hierarchical learning of complex, abstract patterns
Layer Roles
- Input Layer: Raw data
- Hidden Layers: Feature transformations
- Output Layer: Final predictions
Importance of Activation Functions
- Provide non-linearity
- Without them, networks behave like simple linear models
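This collapse can be verified numerically: stacking two linear layers without an activation equals a single linear map with the merged weight matrix (the random matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(1, 3))

# Two linear layers with no activation between them ...
deep = (x @ W1) @ W2
# ... equal one linear layer with the merged weights W1 @ W2
shallow = x @ (W1 @ W2)
print(np.allclose(deep, shallow))  # True: depth adds nothing without g(z)
```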
Expressive Power
- Neural nets are universal approximators
- More layers/neurons = higher capacity, but higher risk of overfitting
Current Challenges
- Data and compute demands
- Interpretability and bias
- Generalization and robustness
- Vanishing/exploding gradients
- Sustainability and efficiency