📢 Notice 📢

3 minute read

Notes from the summary of COMP3010 Lecture 7, labs, and online resources.

Convolutional Neural Networks (CNN) Overview

MLP vs. CNN

  • MLP (Fully Connected Layers): Best suited for tabular data. Each input feature is treated independently, and spatial structure is lost.
  • CNN: Designed for image data with spatial hierarchies. Preserves spatial relationships between pixels.

Fully Connected Layers (MLP)

  • A 32×32×3 RGB image is flattened to 3072×1 to be input into a fully connected layer.
  • Every pixel is treated as an independent value, connected to every neuron.
  • Issue: Flattening discards spatial information—neighboring pixels lose context.

Convolutional Layers

  • Apply filters (e.g., 5×5×3) that slide over the image.
  • Preserve spatial relationships and detect local patterns like edges, textures, and shapes.
  • Filters span the full depth of the input (e.g., 3 channels for RGB).
  • Fewer parameters due to weight sharing across the image—same filters scan different regions.

Padding

  • Zero-padding is often added to input borders to maintain spatial dimensions.
  • Prevents the feature map from shrinking too quickly and helps retain edge information.
  • Important for deeper layers to process meaningful spatial data.

Parameter Efficiency in CNNs

  • Parameters reside in the filters (e.g., 5×5×3 = 75 weights per filter).
  • Example: 10 such filters → 750 weights + 10 biases = 760 parameters.
  • Compared to MLPs: 3072 inputs × 1024 neurons = ~3 million weights.
  • CNNs reduce parameters by reusing the same filters across the entire image.

Pooling Layers

  • No parameters (no weights or biases).
  • Perform operations like max pooling or average pooling to reduce spatial dimensions.
  • Benefits:
    • Reduce computational cost
    • Introduce translation invariance
    • Summarize local features

CNN Architectures

  • Convolutional Layer: Preserves spatial layout via local connections and shared filters. Introduces receptive fields and captures low- to high-level patterns.
  • Pooling Layer: Reduces dimensionality and retains key features.
  • Fully Connected Layer: Final layer that performs classification or regression.

Notable Architectures

  • AlexNet: First CNN to win ImageNet; proved CNNs are viable for vision tasks.
  • VGGNet: Used deeper networks with small (3×3) filters.
  • GoogLeNet: Introduced inception modules; removed fully connected layers.
  • ResNet: Added residual connections; allowed very deep networks.
  • MobileNet, ShuffleNet: Lightweight models optimized for speed and efficiency.
  • Neural Architecture Search: Automates CNN architecture design.

Transfer Learning

  • Use a CNN trained on a large dataset (e.g., ImageNet) to fine-tune for a smaller custom dataset.
  • Techniques:
    • Freeze lower layers (if new data is similar)
    • Fine-tune entire model (if new data is different)
    • Use pre-trained CNN as a feature extractor

Available through model zoos in PyTorch, TensorFlow, etc.

Key Concepts

Comparing Convolutional vs. Fully Connected Layers

Feature Convolutional Layers Fully-Connected Layers
Connectivity Local (receptive field) Global (all-to-all)
Parameters Shared across spatial locations Unique weights for every input-neuron pair
Feature Learning Spatial hierarchies (edges, patterns) Global relationships
Input Format Multi-dimensional arrays (images) Flattened 1D vectors

Translation Equivariance: CNNs preserve spatial structure. If an object moves, the activation shifts accordingly.

Role of Pooling Layers

  • Dimensionality Reduction: Downsample feature maps.
  • Translation Invariance: Makes features robust to small shifts.
  • Feature Summarization: Capture dominant features in local regions.

Common types:

  • Max Pooling: Retains maximum value
  • Average Pooling: Computes average in region

Receptive Field

  • The area of the input that influences a particular neuron’s activation.
  • Deeper layers see progressively larger input regions.

Effective Receptive Field for Three 3×3 Conv Layers (stride = 1):

  • 1st layer: 3×3
  • 2nd layer: 5×5 (3 + 2)
  • 3rd layer: 7×7 (5 + 2)

Parameter Count for 3 Layers (assuming constant channels $C$):

  • Each: $3×3×C×C + C = 9C^2 + C$
  • Total: $3×(9C^2 + C) = 27C^2 + 3C$

CNN Troubleshooting & Strategies

Low Accuracy (Train + Test)

  • Check dataset: Cleanliness, label accuracy, class balance
  • Model architecture: May be too simple or too complex
  • Hyperparameters: Adjust learning rate, batch size, epochs
  • Preprocessing: Normalize inputs; verify augmentations
  • Debugging: Check for implementation bugs

High Training, Low Test Accuracy

  • Overfitting mitigation:
    • Data Augmentation
    • Dropout
    • L1/L2 Regularization
    • Batch Normalization
    • Early Stopping
  • Improving Generalization:
    • Collect more data
    • Try different architectures
    • Use Transfer Learning
    • Optimize hyperparameters for validation performance

Leave a comment