Linear Classification
Notes summarizing COMP3010 Lecture 3, the labs, and online resources.
Introduction to Classification
Classification is a core task in machine learning where the goal is to assign input data to predefined categories or classes. Unlike regression, which predicts continuous values, classification outputs discrete labels.
Common examples:
- Email filtering (spam vs. non-spam)
- Disease diagnosis (positive vs. negative)
- Image recognition (e.g., cat vs. dog)
Logistic Regression vs. Linear Regression
| Feature | Linear Regression | Logistic Regression |
|---|---|---|
| Output Type | Continuous values | Probabilities for classification |
| Model | Fits a straight line | Applies sigmoid function |
| Loss Function | Mean Squared Error (MSE) | Log Loss (Cross-Entropy) |
| Optimization | Least squares / Gradient Descent | Gradient Descent |
| Usage | Predicting numerical outputs | Binary or Multi-class Classification |
Despite its name, logistic regression is a classification method that combines a linear model with a sigmoid function to output values between 0 and 1. A threshold (usually 0.5) is applied to map probabilities to class labels.
Training Logistic Regression
The sigmoid function used in logistic regression is:
$\hat{y} = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}$
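As a minimal NumPy sketch (the weight and bias values here are illustrative assumptions, not from the lecture): the sigmoid squashes the linear score into $(0, 1)$, and a 0.5 threshold converts the probability into a class label.

```python
import numpy as np

def sigmoid(z):
    # Map any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters for a single input feature
w, b = 2.0, -1.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

probs = sigmoid(w * x + b)           # estimated P(y = 1 | x)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5
print(probs, labels)
```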
Why Not MSE?
- Composing MSE with the sigmoid yields a non-convex loss surface, so gradient descent can stall in poor local minima or flat regions.
- Instead, Binary Cross-Entropy (Log Loss) is used:
$L = - \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$
This loss function is convex and better suited for optimization using gradient descent.
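A from-scratch sketch of this training loop, assuming synthetic data and an illustrative learning rate: a convenient fact is that the gradient of the mean cross-entropy through the sigmoid simplifies to $X^\top(\hat{y} - y)/n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data: label 1 when x1 + x2 > 0 (illustrative)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(500):
    y_hat = sigmoid(X @ w + b)
    # Gradient of the mean binary cross-entropy w.r.t. w and b
    w -= lr * (X.T @ (y_hat - y)) / len(y)
    b -= lr * np.mean(y_hat - y)

accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```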
Learning Rate and Convergence
- A proper learning rate is crucial (a toy sketch follows this list):
- Too high → overshooting and oscillation, or outright divergence.
- Too low → very slow convergence, or getting stuck in flat regions.
- Adaptive methods (e.g., Adam, RMSProp) help adjust learning rate dynamically.
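The toy sketch below, using the assumed loss $f(w) = w^2$ with gradient $2w$ and three illustrative rates, shows the three regimes:

```python
def descend(lr, steps=10, w=1.0):
    # Gradient descent on f(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(0.1))   # shrinks steadily toward 0 (factor 0.8 per step)
print(descend(0.45))  # converges quickly (factor 0.1 per step)
print(descend(1.1))   # factor -1.2 per step: overshoots and diverges
```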
Support Vector Machines (SVM)
SVM is a linear classification technique that finds the optimal hyperplane separating classes by maximizing the margin defined by the support vectors (a minimal example follows the list below).
SVM Intuition:
- Support Vectors: Closest data points to the decision boundary.
- Margin: Distance between support vectors and the decision boundary.
- Focuses on separation rather than probabilities.
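A minimal scikit-learn example of this intuition (the dataset and its parameters are illustrative assumptions): after fitting, the support vectors that define the boundary are exposed directly.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters of points (illustrative data)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear").fit(X, y)

# Only these points determine the decision boundary
print("support vectors per class:", clf.n_support_)
print("a support vector:", clf.support_vectors_[0])
```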
The Kernel Trick
For non-linearly separable data, SVM uses the kernel trick to map data into a higher-dimensional space where a linear boundary becomes feasible (a sketch follows the list below).
- Implicit Mapping: Transforms input space without explicitly computing new coordinates.
- Kernel Functions:
- Polynomial
- Gaussian (RBF)
- Sigmoid
- Benefits:
- Allows complex decision boundaries
- Computationally efficient: kernel values are evaluated directly, without ever computing the high-dimensional coordinates
- Adaptable to different data patterns via kernel choice
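A small sketch of the kernel trick in practice, assuming scikit-learn and the two-moons toy dataset: a linear kernel cannot separate the interleaved classes, while the Gaussian (RBF) kernel can (the `gamma` value is an illustrative choice).

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Interleaved half-moons: not separable by any straight line in 2-D
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)  # Gaussian kernel

print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:", rbf.score(X, y))
```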
SVM vs Deep Learning
| Aspect | SVM with Kernel | Deep Learning |
|---|---|---|
| Feature Engineering | Handcrafted (kernel selection) | Learned from data |
| Flexibility | Moderate, depends on kernel | Highly flexible, end-to-end |
| Data Requirement | Lower | Requires large labeled datasets |
| Computation | Efficient with good kernel | Computationally expensive |
Effect of Outliers
Logistic Regression:
- Sensitive to outliers: Outliers can distort the decision boundary.
- Why? Log loss grows without bound for confidently misclassified points, so a single extreme point can pull the fitted boundary toward it during likelihood maximization.
Mitigation:
- Regularization: L1/L2 penalties shrink large weights (see the sketch below).
- Preprocessing: Detect and remove outliers.
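A small scikit-learn sketch of the regularization option (the dataset and C values are illustrative assumptions; note that in scikit-learn, C is the inverse of the regularization strength):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Smaller C => stronger penalty on large weights
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print("L2 weights:", l2.coef_.round(2))
print("L1 weights (some driven to exactly zero):", l1.coef_.round(2))
```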
SVM:
- More robust: Focuses on maximizing margin, not fitting all data.
- Soft Margin SVM: Introduces tolerance for misclassifications, controlled by the C parameter (see the sketch after this list):
- Lower C → more flexibility, larger margin
- Higher C → stricter separation, less tolerance
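A quick sketch of the C trade-off (the overlapping synthetic blobs and specific C values are assumptions for illustration): softer margins typically keep more points as support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so some misclassification is unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors")
```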
Multi-Class Classification
Both logistic regression and SVM are inherently binary classifiers. For multi-class problems, they can be extended using:
One-vs-All (OvA) / One-vs-Rest (OvR)
- Train one classifier per class (that class vs. the rest); see the sketch after this list.
- Prediction: the class whose classifier outputs the highest score or probability.
- Pros: Simple and efficient.
- Cons: Sensitive to imbalanced datasets.
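A minimal OvR sketch with scikit-learn (Iris is an illustrative 3-class dataset): the wrapper trains one binary logistic-regression model per class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # K = 3 classes

# One binary classifier per class: positive class vs. the rest
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print("classifiers trained:", len(ovr.estimators_))  # 3
print("accuracy:", ovr.score(X, y))
```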
One-vs-One (OvO)
- Train one classifier per pair of classes (see the sketch after this list).
- Total classifiers: $K(K-1)/2$
- Prediction: Majority voting.
- Pros: Smaller training sets per classifier.
- Cons: High computational cost for many classes.
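And the OvO counterpart (same illustrative dataset): with $K = 3$ classes the wrapper trains $3 \cdot 2 / 2 = 3$ pairwise classifiers.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # K = 3 classes

# One classifier per pair of classes: K(K-1)/2 in total
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)
print("classifiers trained:", len(ovo.estimators_))  # 3
print("accuracy:", ovo.score(X, y))
```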
Softmax Regression (Multinomial Logistic Regression)
- Extends logistic regression directly to handle multiple classes.
- Outputs a probability distribution over all classes via the softmax function (see the sketch below).
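A minimal NumPy sketch of the softmax itself (the logits are illustrative class scores $w_k^\top x + b_k$): scores are exponentiated and normalized so the outputs form a distribution.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # illustrative per-class scores
probs = softmax(logits)
print(probs, probs.sum())  # probabilities over 3 classes, summing to 1
```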
SVM Multi-Class Extensions
- SVM also supports the OvA and OvO strategies; for example, scikit-learn's SVC handles multi-class problems with OvO internally, while LinearSVC uses OvR.
Summary of Key Concepts
| Concept | Definition |
|---|---|
| Classification | Assigning discrete labels to input data |
| Logistic Regression | Linear model with sigmoid for binary classification |
| Cross-Entropy Loss | Convex loss for classification tasks |
| Gradient Descent | Optimization to minimize the loss |
| Learning Rate | Controls update step size in optimization |
| Decision Boundary | Line or hyperplane separating classes |
| Support Vectors | Points closest to the decision boundary in SVM |
| Kernel Trick | Implicitly maps data to higher dimensions |
| Regularization | Penalizes complexity to prevent overfitting |
| Soft Margin SVM | Allows misclassification with penalty |
| OvA / OvO | Multi-class extension strategies for binary classifiers |
| Softmax Regression | Multinomial logistic regression for multi-class tasks |