Notes summarizing COMP3010 Lecture 5, the labs, and online resources.

Biological Motivation

Neural networks are inspired by the structure and function of biological neurons.

  • Dendrites: Receive input signals
  • Cell body (soma, containing the nucleus): Integrates the incoming weighted signals (loosely analogous to a CPU)
  • Axon: Transmits signals
  • Axon Terminals: Send signals to other neurons

Simplified computational model (see the sketch after this list):

  • Inputs $x_i$ are weighted by $w_i$
  • Compute $z = \sum_i w_i x_i$
  • Apply activation function: $g(z)$
  • Output $g(z)$ is transmitted forward
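
As a minimal sketch of that computation (using NumPy; the input and weight values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    """One possible activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs and weights (arbitrary values)
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4,  0.1, -0.7])  # weights w_i

z = np.dot(w, x)       # z = sum_i w_i * x_i
output = sigmoid(z)    # g(z), passed forward to the next neuron
print(z, output)
```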

Biological vs. Artificial Systems:

  • The brain operates with massive parallelism and energy efficiency
  • Computers are faster at serial computation but far less energy-efficient

Neural Network Basics

Historical Context

  • 1940s: McCulloch-Pitts neuron
  • 1980s–90s: Backpropagation and initial rise
  • 2000s: Decline due to compute limits; rise of alternatives (SVMs, decision trees)
  • 2010s+: Resurgence due to GPUs, big data, and new architectures (e.g., Transformers)

Neuron Model

A neuron can be modeled as a logistic unit, similar to logistic regression:

  • $f_{w,b}(x) = g(w^T x + b)$
  • $g(z)$ is the activation function

Examples of activation functions (sketched in code below):

  • Sigmoid: $g(z) = \frac{1}{1 + e^{-z}}$
  • ReLU: $g(z) = \max(0, z)$
  • Tanh: $g(z) = \tanh(z)$
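
A direct NumPy sketch of these three activations (not tied to any particular library's naming):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def relu(z):
    return np.maximum(0, z)           # keeps positives, zeroes out negatives

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)
```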

Neural Network Representation

Architecture

  • Composed of layers: Input → Hidden → Output
  • $a_i^{(l)}$: Activation of neuron $i$ in layer $l$
  • $w_{ij}^{(l)}$: Weight from neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$ (so $W^{(l)}$ maps layer $l-1$ to layer $l$, matching the forward propagation below)

Forward Propagation

  • $z^{(l)} = a^{(l-1)} W^{(l)} + b^{(l)}$
  • $a^{(l)} = g(z^{(l)})$
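
Under the row-vector convention used above, a forward pass can be sketched as follows (layer sizes and weights here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, weights, biases):
    """Forward propagation: repeatedly apply a = g(a W + b)."""
    for W, b in zip(weights, biases):
        z = a @ W + b     # z^(l) = a^(l-1) W^(l) + b^(l)
        a = sigmoid(z)    # a^(l) = g(z^(l))
    return a

# Illustrative 3-2-1 network: 3 inputs, one hidden layer of 2 neurons, 1 output
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 1))]
biases = [np.zeros(2), np.zeros(1)]
x = np.array([[0.5, -1.2, 3.0]])  # one example as a row vector
print(forward(x, weights, biases))
```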

Network Design

  • Binary classification: 1 output (sigmoid)
  • Multi-class: $K$ outputs (softmax)
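
For the multi-class case, the softmax turns the $K$ output scores into a probability distribution; a minimal, numerically stable sketch:

```python
import numpy as np

def softmax(z):
    """Map K raw scores to K probabilities that sum to 1."""
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx [0.66, 0.24, 0.10]
```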

Boolean Function Representation

  • Neural nets can represent logic gates like AND, OR, XOR
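
AND and OR are linearly separable, so a single sigmoid neuron suffices for them, while XOR requires a hidden layer. A sketch of an AND unit (these specific weights and bias are one common illustrative choice, not values from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def and_gate(x1, x2):
    """Single neuron: large positive weights plus a strongly negative bias."""
    w, b = np.array([20.0, 20.0]), -30.0
    return sigmoid(w @ np.array([x1, x2]) + b)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(float(and_gate(x1, x2))))  # outputs 0, 0, 0, 1
```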

Universal Approximation Theorem (UAT)

  • A single hidden layer with enough neurons can approximate any continuous function on a bounded domain to arbitrary accuracy
  • Deep networks provide hierarchical feature learning and handle complex structures more efficiently

Neural Network Learning

Supervised Learning

  • Train on input-output pairs to minimize a loss function

Backpropagation

  • Use chain rule to compute gradients
  • Update weights using gradient descent: $w := w - \alpha \frac{\partial L}{\partial w}$
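
A minimal sketch of both steps for a single logistic unit with binary cross-entropy loss, where the chain rule gives $\frac{\partial L}{\partial w} = \frac{1}{m} X^T (\hat{y} - y)$ (the toy data below are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (illustrative): 4 examples, 2 features, labels for AND
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 0., 0., 1.])

w, b, alpha = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    y_hat = sigmoid(X @ w + b)            # forward pass
    grad_w = X.T @ (y_hat - y) / len(y)   # chain rule: dL/dw
    grad_b = np.mean(y_hat - y)           # chain rule: dL/db
    w -= alpha * grad_w                   # w := w - alpha * dL/dw
    b -= alpha * grad_b
print(np.round(sigmoid(X @ w + b), 2))    # moves toward [0, 0, 0, 1]
```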

Loss Functions

  • Binary classification: Binary Cross-Entropy $L = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$
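
Written out directly (a sketch; the small clipping constant is a standard numerical safeguard, not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """L = -(1/m) * sum( y*log(y_hat) + (1-y)*log(1-y_hat) )."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.1, 0.8])))
```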

Modern libraries use auto-differentiation to perform backpropagation efficiently.
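
For example (assuming PyTorch is available; this snippet is a generic illustration, not code from the lecture), the framework records the forward computation and derives the gradients automatically:

```python
import torch

# One logistic unit on a single example, with gradient tracking enabled
x = torch.tensor([0.5, -1.2, 3.0])
y = torch.tensor([1.0])
w = torch.zeros(3, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

y_hat = torch.sigmoid(w @ x + b)
loss = torch.nn.functional.binary_cross_entropy(y_hat, y)
loss.backward()           # backpropagation via automatic differentiation
print(w.grad, b.grad)     # dL/dw and dL/db, no manual chain rule needed
```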

Summary

  • Neural networks are loosely inspired by biological neurons and can learn complex tasks from data
  • Learning occurs through weight updates based on error gradients
  • Deep networks can model hierarchical features and complex functions

Additional Concepts

Activation Functions

  • ReLU: $g(z) = \max(0, z)$
  • Tanh: $g(z) = \tanh(z)$
  • Softmax: Used for multi-class output

Loss Functions

  • MSE: For regression
  • Cross-Entropy: For classification
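
For the regression case, mean squared error averages the squared differences between predictions and targets; a short sketch:

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: (1/m) * sum_i (y_i - y_hat_i)^2."""
    return np.mean((y - y_hat) ** 2)

print(mse(np.array([3.0, -0.5, 2.0]), np.array([2.5, 0.0, 2.0])))  # ~0.167
```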

Optimization Algorithms

  • SGD: Stochastic Gradient Descent
  • Adam: Adaptive optimizer with momentum
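
The two differ only in how a gradient becomes an update: plain SGD steps against the raw gradient, while Adam keeps running averages of the gradient and its square. A sketch of the update rules (defaults follow the commonly cited Adam hyperparameters):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step directly against the (mini-batch) gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum (m) and squared-gradient (v) running averages."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction, t = step count from 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```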

Regularization

  • Prevent overfitting with (both sketched in code below):

    • L1/L2 penalties
    • Dropout
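
Both can be sketched as small additions to a training loop; the penalty strength and keep probability below are illustrative hyperparameters:

```python
import numpy as np

def l2_penalty(weights, lam=1e-3):
    """Add lam * sum of squared weights to the loss to discourage large weights."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(a, keep_prob=0.8):
    """Inverted dropout: randomly zero activations at train time, rescale the rest."""
    mask = np.random.rand(*a.shape) < keep_prob
    return a * mask / keep_prob
```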

Types of Neural Networks

  • CNNs: Convolutional layers for image tasks
  • RNNs: Handle sequences (e.g., time-series, text)
  • Transformers: Attention-based models for language and beyond

Discussion Topics

Why “Deep Learning”?

  • Refers to networks with many stacked layers, each building more abstract features from the previous layer's output

Shallow vs Deep

  • Shallow: Simple patterns, limited expressiveness
  • Deep: Hierarchical learning of complex, abstract patterns

Layer Roles

  • Input Layer: Raw data
  • Hidden Layers: Feature transformations
  • Output Layer: Final predictions

Importance of Activation Functions

  • Provide non-linearity
  • Without them, any stack of layers collapses into a single linear model (see the sketch below)
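
A quick way to see this: stacking two linear layers with no activation in between is equivalent to a single linear layer, as the sketch below checks numerically (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))
x = rng.normal(size=(1, 4))

two_layers = (x @ W1) @ W2      # "deep" network with no activation functions
one_layer = x @ (W1 @ W2)       # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: no added expressive power
```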

Expressive Power

  • Neural nets are universal approximators
  • More layers/neurons = higher capacity, but higher risk of overfitting

Current Challenges

  • Data and compute demands
  • Interpretability and bias
  • Generalization and robustness
  • Vanishing/exploding gradients
  • Sustainability and efficiency
