📢 Notice 📢


Notes summarizing COMP3010 Lecture 3, the labs, and online resources.

Introduction to Classification

Classification is a core task in machine learning where the goal is to assign input data to predefined categories or classes. Unlike regression, which predicts continuous values, classification outputs discrete labels.

Common examples:

  • Email filtering (spam vs. non-spam)
  • Disease diagnosis (positive vs. negative)
  • Image recognition (e.g., cat vs. dog)

Logistic Regression vs. Linear Regression

| Feature | Linear Regression | Logistic Regression |
| --- | --- | --- |
| Output type | Continuous values | Probabilities for classification |
| Model | Fits a straight line | Applies the sigmoid function |
| Loss function | Mean Squared Error (MSE) | Log Loss (Cross-Entropy) |
| Optimization | Least squares / gradient descent | Gradient descent |
| Usage | Predicting numerical outputs | Binary or multi-class classification |

Despite its name, logistic regression is a classification method that combines a linear model with a sigmoid function to output values between 0 and 1. A threshold (usually 0.5) is applied to map probabilities to class labels.

Training Logistic Regression

Logistic regression passes the linear score $wx + b$ through the sigmoid function:

$\hat{y} = \frac{1}{1 + e^{-(wx + b)}}$
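As a minimal NumPy sketch (the weight `w`, bias `b`, and inputs below are made-up illustration values, not fitted parameters), the sigmoid plus the 0.5 threshold looks like this:

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w, b = 2.0, -1.0                      # hypothetical parameters
x = np.array([-2.0, 0.0, 0.5, 3.0])  # hypothetical inputs

probs = sigmoid(w * x + b)            # predicted probabilities in (0, 1)
labels = (probs >= 0.5).astype(int)   # threshold at 0.5 -> class labels
```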

Why Not MSE?

  • With the sigmoid, MSE produces a non-convex loss surface, so gradient descent can stall in poor local minima.
  • Instead, Binary Cross-Entropy (Log Loss) is used:

$L = - \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$

This loss function is convex and better suited for optimization using gradient descent.
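A small NumPy sketch of this loss (the clipping constant `eps` is an implementation detail added here to avoid log(0), not part of the formula):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from exactly 0 or 1 so log() stays finite
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])      # hypothetical predicted probabilities
print(binary_cross_entropy(y_true, y_pred))  # ~0.40
```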

Learning Rate and Convergence

  • A proper learning rate is crucial (a toy sketch follows this list):
    • Too high → overshooting and oscillation around the minimum.
    • Too low → very slow convergence, or getting stuck.
  • Adaptive methods (e.g., Adam, RMSProp) adjust the learning rate dynamically.
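A toy illustration of the trade-off, minimizing the convex function $f(x) = x^2$ with plain gradient descent (the learning rates here are arbitrary example values):

```python
def gradient_descent(lr, steps=50, x0=5.0):
    # Minimize f(x) = x^2; the gradient is 2x
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(gradient_descent(lr=0.01))  # too low: still far from the minimum at 0
print(gradient_descent(lr=0.4))   # reasonable: converges quickly toward 0
print(gradient_descent(lr=1.1))   # too high: each step overshoots, so the iterate grows
```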

Support Vector Machines (SVM)

SVM is a linear classification technique that finds the optimal hyperplane separating classes by maximizing the margin between support vectors.

SVM Intuition (a training sketch follows this list):

  • Support Vectors: Closest data points to the decision boundary.
  • Margin: Distance between support vectors and the decision boundary.
  • Focuses on separation rather than probabilities.
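A minimal training sketch, assuming scikit-learn and a synthetic linearly separable dataset:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters as a stand-in for linearly separable classes
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)  # the boundary is defined by these points alone
```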

The Kernel Trick

For non-linearly separable data, SVM uses the kernel trick to map data into a higher-dimensional space where a linear boundary becomes feasible.

  • Implicit Mapping: Transforms input space without explicitly computing new coordinates.
  • Kernel Functions:
    • Polynomial
    • Gaussian (RBF)
    • Sigmoid
  • Benefits:
    • Allows complex decision boundaries
    • Computationally efficient: kernel values are evaluated directly, without ever constructing the high-dimensional features
    • Adaptable to different data patterns via kernel choice (see the RBF sketch after this list)
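A sketch of the Gaussian (RBF) kernel in action, again assuming scikit-learn; concentric circles cannot be separated by any straight line in the original 2D space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

rbf_clf = SVC(kernel="rbf", gamma="scale")  # Gaussian (RBF) kernel
rbf_clf.fit(X, y)
print(rbf_clf.score(X, y))  # the implicit mapping makes the classes separable
```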

SVM vs Deep Learning

| Aspect | SVM with Kernel | Deep Learning |
| --- | --- | --- |
| Feature engineering | Handcrafted (kernel selection) | Learned from data |
| Flexibility | Moderate, depends on kernel | Highly flexible, end-to-end |
| Data requirement | Lower | Requires large labeled datasets |
| Computation | Efficient with a good kernel | Computationally expensive |

Effect of Outliers

Logistic Regression:

  • Sensitive to outliers: Outliers can distort the decision boundary.
  • Why? Logistic regression maximizes the likelihood over all points, so a few extreme points can pull the boundary toward them.

Mitigation (see the sketch after this list):

  • Regularization: L1/L2 penalizes large weights.
  • Preprocessing: Detect and remove outliers.
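A sketch of the regularization option, assuming scikit-learn (note that scikit-learn's `C` is the *inverse* regularization strength, so a smaller `C` means a stronger penalty):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Smaller C -> stronger penalty on large weights
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print(l1_model.coef_)  # L1 tends to drive some weights exactly to zero
```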

SVM:

  • More robust: focuses on maximizing the margin rather than fitting every data point, so isolated outliers have less pull.
  • Soft Margin SVM: tolerates some misclassification via the C parameter (see the sketch after this list):
    • Lower C → more flexibility, larger margin
    • Higher C → stricter separation, less tolerance
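A sketch of the effect of C, assuming scikit-learn and deliberately overlapping clusters:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so some misclassification is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Lower C tolerates more margin violations -> more support vectors
    print(C, len(clf.support_vectors_))
```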

Multi-Class Classification

Both logistic regression and SVM are inherently binary classifiers. For multi-class problems, they can be extended using:

One-vs-All (OvA) / One-vs-Rest (OvR)

  • Train one classifier per class (positive vs. rest).
  • Prediction: the class whose classifier outputs the highest probability (see the sketch after this list).
  • Pros: Simple and efficient.
  • Cons: Sensitive to imbalanced datasets.
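A minimal OvR sketch, assuming scikit-learn and the three-class Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))  # 3: one "class vs. rest" model per class
```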

One-vs-One (OvO)

  • Train one classifier per pair of classes.
  • Total classifiers: $K(K-1)/2$
  • Prediction: majority voting across the pairwise classifiers (see the sketch after this list).
  • Pros: Smaller training sets per classifier.
  • Cons: High computational cost for many classes.
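The matching OvO sketch under the same assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # K = 3 classes

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovo.estimators_))  # K(K-1)/2 = 3 pairwise classifiers
```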

Softmax Regression (Multinomial Logistic Regression)

  • Extends logistic regression directly to multiple classes.
  • Outputs a probability distribution over all classes (see the sketch after this list).
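A minimal NumPy sketch of the softmax itself (the logits are made-up scores for three classes):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
print(softmax(logits))              # ~[0.66, 0.24, 0.10]
```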

SVM Multi-Class Extensions

  • SVM also supports both the OvA and OvO strategies for multi-class problems.

Summary of Key Concepts

| Concept | Definition |
| --- | --- |
| Classification | Assigning discrete labels to input data |
| Logistic Regression | Linear model with sigmoid for binary classification |
| Cross-Entropy Loss | Convex loss for classification tasks |
| Gradient Descent | Optimization to minimize the loss |
| Learning Rate | Controls the update step size in optimization |
| Decision Boundary | Line or hyperplane separating classes |
| Support Vectors | Points closest to the decision boundary in SVM |
| Kernel Trick | Implicitly maps data to higher dimensions |
| Regularization | Penalizes complexity to prevent overfitting |
| Soft Margin SVM | Allows misclassification with a penalty |
| OvA / OvO | Multi-class extension strategies for binary classifiers |
| Softmax Regression | Multinomial logistic regression for multi-class tasks |
