
Feature extraction turns raw pixels into numerical descriptors that a machine can compare, match, and classify.

If feature detection answers “where are the interesting bits?”,
then feature extraction answers “what do they look like?”.

1. From Detection to Description

Feature Detection vs. Feature Extraction

  • Feature detection
    • Goal: Find informative points/regions.
    • Output: Coordinates of keypoints / blobs.
    • Examples:
      • Corners: Harris, Shi–Tomasi
      • Blobs: DoG/LoG, MSER
      • Keypoints: SIFT, SURF, ORB
  • Feature extraction (description)
    • Goal: Describe those points/regions numerically so they can be compared.
    • Output: Feature vectors (descriptors).
    • Examples:
      • SIFT descriptor (128D vector)
      • SURF descriptor (64D)
      • ORB descriptor (binary string)
      • Shape Context (histogram of relative positions)
      • HOG, LBP, texture/shape descriptors

Analogy:

  • Detection = spotting faces in a crowd.
  • Extraction = encoding each face (eyes, nose, jawline) so we can recognise the same person later.

Features can be:

  • Local – tied to specific points/patches (corners, blobs, keypoints).
  • Global – describe the whole object or shape (moments, Fourier descriptors, eigenfaces).

2. Binary Shape Analysis & Connected Components

Binary shape analysis works on images where pixels are either foreground (1) or background (0) to extract geometric and topological properties of shapes.

2.1 Pipeline for Binary Shape Analysis

  1. Thresholding
    • Convert grayscale → binary.
    • Methods:
      • Global threshold (e.g., Otsu’s method).
      • Adaptive / local thresholding.
  2. Morphological operations
    • Clean noise and refine shapes:
      • Dilation – grows foreground regions.
      • Erosion – shrinks them.
      • Opening – erosion then dilation (remove small noise).
      • Closing – dilation then erosion (fill small gaps).
  3. Shape feature extraction
    • Common region properties:
      • Area
      • Perimeter
      • Centroid
      • Bounding box
      • Eccentricity (how elongated)
      • Solidity (area / convex hull area)
      • Orientation
    • These help distinguish, e.g., circles vs. rectangles vs. elongated blobs.
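The three pipeline steps can be sketched in NumPy. This is an illustrative sketch, not a production recipe: `otsu_threshold`, `opening`, and `region_props` are hypothetical helper names, the structuring element is fixed at 3×3, and an 8-bit grayscale input is assumed.

```python
import numpy as np

def otsu_threshold(gray):
    """Step 1: global Otsu threshold (maximise between-class variance)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # cumulative class probability
    mu = np.cumsum(p * np.arange(256))      # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))

def _shifted_views(b):
    """All nine 3x3-neighbourhood views of a zero-padded binary image."""
    p = np.pad(b, 1, constant_values=0)
    h, w = b.shape
    return [p[i:i + h, j:j + w] for i in range(3) for j in range(3)]

def erode(b):
    return np.min(_shifted_views(b), axis=0)    # shrink foreground

def dilate(b):
    return np.max(_shifted_views(b), axis=0)    # grow foreground

def opening(b):
    """Step 2: erosion then dilation -> removes small noise specks."""
    return dilate(erode(b))

def region_props(b):
    """Step 3: a few basic region properties of the foreground."""
    ys, xs = np.nonzero(b)
    return {"area": len(xs),
            "centroid": (xs.mean(), ys.mean()),
            "bbox": (xs.min(), ys.min(), xs.max(), ys.max())}
```

For a real pipeline, these properties would be computed per labelled blob rather than over the whole foreground.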

2.2 Connected Component Labeling (CCL)

Goal: Assign a unique label to each “blob” of connected foreground pixels.

  • Connectivity:
    • 4-connectivity: neighbours = up, down, left, right.
    • 8-connectivity: also includes diagonals.

Two-pass algorithm:

  1. First pass
    • Scan image row by row.
    • For each foreground pixel:
      • No labelled neighbours → assign new label.
      • One labelled neighbour → copy that label.
      • Multiple different labelled neighbours → copy the smallest label and record the equivalence between them.
  2. Second pass
    • Resolve label equivalences (using union–find / lookup table).
    • Replace all equivalent labels with a single canonical label.

Output:
A labelled image where each connected region has a distinct integer ID.
From there we can compute per-blob stats like area, bounding box, centroid, etc.
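The two-pass algorithm is short enough to sketch in plain Python. This is a minimal illustration using 4-connectivity (only the up and left neighbours need checking in a raster scan) and a small union–find forest for the equivalences; the function name and representation are illustrative.

```python
def label_components(img):
    """Two-pass connected-component labelling with 4-connectivity.
    img: 2-D list of 0/1 values. Returns a label matrix (0 = background)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}                              # union-find forest over labels

    def find(x):                             # root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):                         # record label equivalence
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # Pass 1: provisional labels + equivalence recording
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            neigh = [n for n in (up, left) if n]
            if not neigh:                    # no labelled neighbour -> new label
                labels[y][x] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:                            # copy smallest neighbour label
                labels[y][x] = min(neigh)
                if len(neigh) == 2:
                    union(up, left)
    # Pass 2: replace every label by its canonical root
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

A U-shaped blob is the classic case this handles: its two arms get different provisional labels in pass 1, and pass 2 merges them into one component.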

In OpenCV, cv2.connectedComponentsWithStats() directly returns:

  • Number of components
  • Label image
  • Stats (area, bounding box, etc.)
  • Centroids

3. Shape Descriptors

Once blobs are labelled, we want to describe their shape in a way that’s robust and (ideally) invariant to translation, rotation, and scale.

3.1 Shape Context

Idea: For each point on a shape’s contour, describe how all other points are distributed around it.

  • For a reference point:
    • Draw a log-polar grid around it (bins in radius and angle).
    • Count how many contour points fall into each bin.
  • The result is a histogram = the shape context descriptor for that point.

Properties:

  • Captures global spatial distribution of the shape relative to each point.
  • Naturally handles:
    • Moderate deformations.
    • Partial occlusions (a few empty bins only degrade the match gradually).
  • Very effective for:
    • Shape matching.
    • Object recognition based on contour.

3.2 Contour-based vs Region-based Shape Descriptors

From classical shape analysis (e.g., Sonka Chapter 8):

Contour-based (external) descriptors:

  • Use only the boundary.
  • Examples:
    • Chain codes – sequence of directions along the boundary.
    • Polygon / line approximation – fit lines/curves to boundary segments.
    • Fourier descriptors – represent boundary as a complex sequence, transform with Fourier; low frequencies ≈ global shape, high frequencies ≈ fine details.
    • Curvature-based – analyze curvature changes along the contour.
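Fourier descriptors in particular are only a few lines with NumPy's FFT. A minimal sketch, assuming an ordered closed contour: dropping the DC coefficient removes translation, dividing by the first harmonic's magnitude removes scale, and keeping only magnitudes discards rotation and start-point phase (`fourier_descriptor` and `n_coeffs` are illustrative).

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Truncated, normalised Fourier descriptor of a closed boundary.
    contour: (N, 2) boundary points in traversal order."""
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]       # boundary as a complex sequence
    c = np.fft.fft(z)
    c = c[1:n_coeffs + 1]                # drop c0 (translation); keep low freqs
    return np.abs(c) / np.abs(c[0])      # scale by |c1|; magnitudes only
```

Keeping only the low-frequency coefficients is exactly the "low frequencies ≈ global shape" point above: the truncation smooths away fine boundary detail.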

Region-based (internal) descriptors:

  • Use all pixels inside the region.
  • Examples:
    • Simple region properties – area, perimeter, compactness, elongatedness, rectangularity.
    • Moments – central moments and Hu’s invariant moments (translation, rotation, scale invariant).
    • Convex hull – smallest convex polygon containing the shape.
    • Skeleton / medial axis – thin representation capturing topology (branches, endpoints).
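Central moments are a good example of how invariance is built in: subtracting the centroid before summing makes them translation-invariant by construction. A small sketch (the function name is illustrative):

```python
import numpy as np

def central_moment(b, p, q):
    """Central moment mu_pq of a binary region:
    sum over foreground pixels of (x - xc)^p * (y - yc)^q."""
    ys, xs = np.nonzero(b)
    xc, yc = xs.mean(), ys.mean()        # region centroid
    return np.sum((xs - xc) ** p * (ys - yc) ** q)
```

Normalising these by powers of the area additionally gives scale invariance, and Hu's seven moment combinations add rotation invariance on top.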

Design challenges:

  • Invariance – want descriptors robust to translation, rotation, scaling, sometimes projective changes.
  • Scale – description shouldn’t change drastically with resolution.
  • Occlusion – partial shapes should still be recognised as the same class.

4. Texture & Gradient-based Descriptors

Beyond shape, many objects are characterised by texture or local gradient patterns. Two popular hand-crafted descriptors:

4.1 Histogram of Oriented Gradients (HOG)

Concept:

  • Divide image (or detection window) into cells (e.g., $8 \times 8$ pixels).
  • For each pixel:
    • Compute gradient magnitude and orientation.
  • For each cell:
    • Build a histogram of gradient orientations, weighted by magnitude.
  • Group neighbouring cells into blocks and normalize the histograms within each block (helps with lighting/contrast changes).

Descriptor:
Concatenate all normalized cell histograms → one long feature vector.
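The per-cell histogram stage can be sketched in NumPy. This is a stripped-down illustration of the idea, not the full Dalal–Triggs pipeline: block normalisation is omitted, gradients are unsigned with hard bin assignment (real HOG interpolates between bins), and `hog_cells` is a hypothetical name.

```python
import numpy as np

def hog_cells(gray, cell=8, n_bins=9):
    """Concatenated per-cell orientation histograms (core of HOG)."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]          # central differences
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation [0, 180)
    bins = (ang / (180 / n_bins)).astype(int) % n_bins
    cy, cx = g.shape[0] // cell, g.shape[1] // cell
    hist = np.zeros((cy, cx, n_bins))
    for i in range(cy):
        for j in range(cx):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j] = np.bincount(bins[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=n_bins)
    return hist.ravel()    # concatenated feature vector
```

A vertical edge, for instance, puts all its weight into the horizontal-gradient bin of the cells it crosses, which is how HOG encodes silhouettes.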

Strengths:

  • Very good at capturing shape and silhouette.
  • Robust to:
    • Moderate illumination changes (thanks to block normalization).
    • Background clutter.
  • Classic for pedestrian detection, vehicle detection, etc.

Limitations:

  • Not inherently rotation-invariant:
    • Rotating the object changes the orientation distribution.
  • Still relatively high-dimensional and hand-crafted.

4.2 Local Binary Patterns (LBP)

Concept:

  • For each pixel:
    • Compare neighbouring pixels to the center pixel.
    • If neighbour ≥ center → write 1, else 0.
  • This produces an 8-bit binary pattern (for a 3×3 neighbourhood), which can be read as a number (0–255).

Descriptor:

  • For a region/cell, build a histogram of LBP codes.
  • This histogram captures the local texture pattern (e.g., flat, edges, spots).
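The basic 3×3 variant fits in a few vectorised lines of NumPy. A minimal sketch, assuming the classic formulation (each of the 8 neighbours contributes one bit, in a fixed clockwise order; `lbp_histogram` is an illustrative name and the bit ordering is a free choice):

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 3x3 LBP codes and their 256-bin histogram over the image."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]                    # centre pixels (interior only)
    # 8 neighbours, clockwise from top-left; each sets one bit of the code
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        n = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (n >= c).astype(int) << bit   # neighbour >= centre -> 1
    return np.bincount(code.ravel(), minlength=256)
```

Note the noise sensitivity mentioned below is visible here: in a flat region, tiny intensity jitter flips individual `n >= c` comparisons and scatters the codes.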

Strengths:

  • Extremely simple and fast.
  • Good for texture classification:
    • Surface type recognition.
    • Face / skin texture analysis.
    • Segmentation by texture.

Limitations:

  • Sensitive to noise (especially in near-uniform regions).
  • Not inherently scale-invariant (neighbourhood size matters).

5. Keypoint Descriptors: SIFT & SURF

Both SIFT and SURF combine keypoint detection and descriptor extraction, aiming for robust matching across changes in scale, rotation, and illumination.

5.1 SIFT (Scale-Invariant Feature Transform)

Motivation: Traditional detectors (e.g., Harris) are not robust to scaling and significant viewpoint changes.

Detector:

  • Build a scale-space: progressively blur the image with Gaussians and subtract neighbouring scales (DoG = Difference of Gaussians).
  • Find local extrema in this DoG scale-space → candidate keypoints.
  • Refine keypoints (reject low-contrast and edge-like responses).
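The scale-space construction can be sketched with NumPy alone. This is only the DoG stage, not full SIFT: the sigma schedule is an assumed example (real SIFT uses octaves with a fixed ratio per level), and the extrema search and keypoint refinement are left out.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur via 1-D convolution along each axis."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    blur_1d = lambda m: np.convolve(m, k, mode="same")
    out = np.apply_along_axis(blur_1d, 0, img.astype(float))  # columns
    return np.apply_along_axis(blur_1d, 1, out)               # then rows

def dog_stack(img, sigmas=(1.0, 1.6, 2.6, 4.0)):
    """Difference-of-Gaussians: subtract each blur level from the next.
    Candidate keypoints are local extrema across space AND scale."""
    blurred = [gaussian_blur(img, s) for s in sigmas]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
```

A small bright blob produces its strongest DoG response at its centre, at the scale level matching its size, which is exactly what makes the detector scale-selective.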

Descriptor:

  • For each keypoint:
    • Align to a dominant orientation (rotation invariance).
    • Take a neighbourhood around the keypoint.
    • Divide it into a $4 \times 4$ grid of subregions.
    • For each subregion, compute an 8-bin histogram of gradient orientations.
  • Concatenate histograms → $4 \times 4 \times 8 = 128$-dimensional vector.

Properties:

  • Invariant (or robust) to:
    • Scale changes.
    • Rotation.
    • Moderate affine/viewpoint changes.
    • Illumination changes and noise.
  • Very strong matching performance across different views of the same scene.

Drawbacks:

  • Computationally expensive.
  • Historically patented (limited commercial use in early days).

5.2 SURF (Speeded-Up Robust Features)

SURF is inspired by SIFT but designed to be faster.

Key ideas:

  • Uses integral images to compute box filters very quickly.
  • Keypoint detection based on Hessian matrix determinant approximated with box filters.
  • Descriptor uses Haar wavelet responses (sums of gradients) in subregions; fewer dimensions than SIFT.

Advantages:

  • Much faster than SIFT in practice.
  • Still robust to scale and rotation.
  • Often good enough for many applications.

Trade-offs:

  • Slightly less discriminative than SIFT in some challenging scenarios.
  • Also had patent/licensing considerations historically.

6. Other Global Feature Extraction Approaches

6.1 PCA and Eigenfaces

  • Treat images (e.g., faces) as high-dimensional vectors.
  • Use Principal Component Analysis (PCA) to:
    • Find directions of greatest variance (eigenvectors).
    • Project images into a low-dimensional subspace.
  • In face recognition, these eigenvectors are called eigenfaces.

Pros:

  • Dimension reduction.
  • Captures major appearance variations.

Cons:

  • Requires aligned and normalized images.
  • Linear model; not robust to large pose/lighting changes.
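The whole eigenfaces idea reduces to a centred SVD. A minimal sketch on flattened image vectors (`eigenfaces` and `project` are illustrative names; real systems also whiten and crop/align faces first):

```python
import numpy as np

def eigenfaces(X, k):
    """PCA on flattened images. X: (n_samples, n_pixels).
    Returns the mean image and the top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # centre the data
    # Rows of Vt are the principal directions ("eigenfaces"),
    # ordered by singular value (variance explained).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Low-dimensional codes: coordinates in the eigenface subspace."""
    return (X - mean) @ components.T
```

Recognition then compares these low-dimensional codes (e.g., by nearest neighbour) instead of raw pixels.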

6.2 Bag of Visual Words (BoVW)

  • Detect local features (e.g., SIFT, SURF) across a dataset.
  • Cluster all descriptors into “visual words” (e.g., k-means).
  • Represent each image by a histogram of visual words (word counts).

Uses:

  • Converts variable-length sets of local descriptors into a fixed-length global feature.
  • Widely used for image classification before deep learning.
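The vocabulary-plus-histogram step can be sketched end to end with a plain k-means in NumPy. Illustrative only: `kmeans` and `bovw_histogram` are hypothetical names, initialisation is a random sample, and real pipelines use far larger vocabularies over SIFT-like descriptors.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: cluster all training descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)  # squared distances
        assign = d.argmin(1)
        for j in range(k):                               # recompute centroids
            if (assign == j).any():
                centers[j] = X[assign == j].mean(0)
    return centers

def bovw_histogram(descriptors, centers):
    """Quantise one image's local descriptors against the vocabulary and
    count word occurrences -> a fixed-length global feature."""
    d = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
    words = d.argmin(1)
    return np.bincount(words, minlength=len(centers))
```

In practice the histogram is usually normalised (and often TF-IDF weighted) before being fed to a classifier.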

6.3 Frequency & Multi-scale Transforms

  • Fourier transform – analyses global frequency content.
  • Wavelet transforms – multi-scale, multi-orientation analysis.
  • Useful for:
    • Texture analysis.
    • Face and fingerprint recognition.
    • Defect detection on surfaces.

7. Practical Notes & Key Takeaways

  • Choose features by task:
    • Shape / silhouette → HOG, contour/region shape descriptors.
    • Texture / surface → LBP, GLCM, wavelets.
    • Image matching → SIFT, SURF, ORB.
    • Global appearance → PCA/eigenfaces, BoVW.
  • Think about invariances:
    • Need rotation invariance? Scale invariance? Affine invariance?
    • Choose or design descriptors accordingly.
  • Balance robustness vs. cost:
    • More powerful descriptors (SIFT) are often slower.
    • Lightweight descriptors (LBP, simple moments) may be enough in controlled settings.
  • Pipeline view:
    1. Detect features (points, blobs, regions).
    2. Extract descriptors (local or global).
    3. Use them for matching, classification, clustering, or tracking.

Feature extraction is the bridge between raw pixels and high-level understanding. Good descriptors preserve the information that matters for the task while throwing away everything else.
