Feature detection in computer vision, progressing from points and patches to edges and lines.

Introduction

Features are distinctive patterns or structures in images that can be used for tasks like object recognition, scene understanding, and tracking.

Points and Patches

Use Case: Matching across images (e.g., panorama stitching, change detection).

Good features to track:

  • Unique and distinctive in their local neighborhood.
  • Found via the auto-correlation function: a patch is a good feature if shifting it by a small displacement in any direction changes its appearance noticeably.

Mathematical basis:

  • Auto-correlation matrix (A): $A = \sum_{W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

    where $I_x$ and $I_y$ are the image gradients in the $x$ and $y$ directions, and the sum runs over a (possibly weighted) window $W$ around the pixel.

  • Eigenvalue interpretation:

    • Both large → corner (good feature).
    • One large, one small → edge.
    • Both small → flat region.
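
The eigenvalue interpretation can be checked on small synthetic patches. The sketch below is a minimal illustration, assuming an unweighted window that covers the whole patch and central-difference gradients from `np.gradient`; the helper name `structure_tensor_eigenvalues` and the 9×9 test patches are made up for this example.

```python
import numpy as np

def structure_tensor_eigenvalues(patch):
    """Return (lambda_max, lambda_min) of A summed over the whole patch."""
    patch = patch.astype(float)
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    iy, ix = np.gradient(patch)
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    lam_min, lam_max = np.linalg.eigvalsh(a)  # eigvalsh sorts ascending
    return lam_max, lam_min

# Hypothetical test patches: flat, vertical step edge, bright quadrant (corner).
flat = np.zeros((9, 9))
edge = np.zeros((9, 9))
edge[:, 5:] = 1.0
corner = np.zeros((9, 9))
corner[5:, 5:] = 1.0

for name, patch in [("flat", flat), ("edge", edge), ("corner", corner)]:
    lam_max, lam_min = structure_tensor_eigenvalues(patch)
    print(f"{name}: lambda_max={lam_max:.3f}, lambda_min={lam_min:.3f}")
```

Only the corner patch should report two clearly non-zero eigenvalues; the edge patch has one, and the flat patch none.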

Detectors:

  • Harris Corner Detector: $R = \det(A) - k \cdot (\text{trace}(A))^2$, where $k \in [0.04, 0.06]$

  • Shi–Tomasi Detector: $\min(\lambda_1, \lambda_2) > \text{threshold}$

Harris Corner Detector

Formula: $R = \det(A) - k \cdot (\text{trace}(A))^2$ where $A$ is the structure tensor.

Idea: Measures how much intensity changes in all directions.

Corner Criterion: Large positive $R$ indicates a corner; negative $R$ indicates an edge, and $|R|$ near zero indicates a flat region.
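
As a worked example (the eigenvalues and $k = 0.05$ are illustrative values, not measurements from a real image), use $\det(A) = \lambda_1 \lambda_2$ and $\text{trace}(A) = \lambda_1 + \lambda_2$:

  • Corner, $\lambda_1 = \lambda_2 = 10$: $R = 100 - 0.05 \cdot 20^2 = 80$
  • Edge, $\lambda_1 = 10$, $\lambda_2 = 0$: $R = 0 - 0.05 \cdot 10^2 = -5$
  • Flat, $\lambda_1 = \lambda_2 = 0.1$: $R = 0.01 - 0.05 \cdot 0.2^2 = 0.008$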

Pros:

  • Good at detecting corners even under rotation.

Cons:

  • Sensitive to the parameter $k$.
  • May respond to strong edges or noisy texture if $k$ or the response threshold is set too low.

Shi–Tomasi Detector

Formula: Uses eigenvalues $\lambda_1, \lambda_2$ of matrix $A$.

Corner Criterion: $\min(\lambda_1, \lambda_2) > \text{threshold}$

Idea: A point is a good corner if both eigenvalues are large.

Pros:

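  • Often more selective than Harris, and tends to yield features that track more stably.
  • Reduces false positives along edges, because the smaller eigenvalue itself must exceed the threshold.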

Cons:

  • Slightly more computationally expensive due to eigenvalue calculation.
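
Both detectors score the same matrix $A$, so they can share most of their code. The following is a rough NumPy sketch rather than a reference implementation: it assumes an unweighted 3×3 box window, `np.gradient` derivatives, and $k = 0.05$, and the helper names `box_sum` and `corner_responses` are invented for this example. For a symmetric 2×2 matrix the smaller eigenvalue has a closed form, so no general eigen-solver is needed.

```python
import numpy as np

def box_sum(img, radius=1):
    """Sum img over a (2*radius+1)^2 window, zero-padded at the border."""
    padded = np.pad(img, radius, mode="constant")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def corner_responses(img, k=0.05, radius=1):
    """Return per-pixel (Harris R, Shi-Tomasi min eigenvalue)."""
    iy, ix = np.gradient(img.astype(float))
    a = box_sum(ix * ix, radius)   # A[0, 0]
    b = box_sum(ix * iy, radius)   # A[0, 1] = A[1, 0]
    c = box_sum(iy * iy, radius)   # A[1, 1]
    harris = (a * c - b * b) - k * (a + c) ** 2
    # Smaller eigenvalue of [[a, b], [b, c]] in closed form.
    shi_tomasi = (a + c) / 2 - np.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return harris, shi_tomasi

# Synthetic image: a bright quadrant whose corner sits at row 10, column 10.
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
harris, shi_tomasi = corner_responses(img)
print(np.unravel_index(np.argmax(harris), harris.shape))
print(np.unravel_index(np.argmax(shi_tomasi), shi_tomasi.shape))
```

On this image both responses peak at the corner, while a pixel on the straight part of the edge gets a negative Harris score and a Shi–Tomasi score of zero.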

Edge Detection

Definition: Edges represent locations of rapid intensity change in one direction.
Useful for:

  • Segmenting objects
  • Finding boundaries
  • Identifying shapes

Key Methods

  • Harris Detector: The corner measure also identifies edge points, where the gradient is strong in one direction and weak in the other (one large eigenvalue of $A$, one small).
  • Gradient-Based Methods (Sobel, Prewitt, Difference of Gaussian, Canny):
    1. Compute gradients in X and Y directions.
    2. Calculate gradient magnitude:
      $\sqrt{G_x^2 + G_y^2}$
    3. Apply thresholding to identify edges.
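  • Preprocessing: Noise removal and smoothing, since derivative filters amplify noise.
  • Postprocessing: Non-maximum suppression thins edges to one-pixel width; hysteresis thresholding then removes weak, unconnected edges.
  • Sobel: Uses fixed 3×3 convolution kernels for the X and Y directions, with (1, 2, 1) smoothing perpendicular to the derivative direction.
  • Prewitt: Similar to Sobel but with uniform (1, 1, 1) smoothing weights.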
  • Canny Edge Detector:
    1. Gaussian smoothing.
    2. Gradient computation.
    3. Non-maximum suppression.
    4. Double thresholding.
    5. Edge tracking by hysteresis.
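
The three gradient-based steps (gradients, magnitude, threshold) can be sketched with the standard Sobel kernels. This is a minimal NumPy illustration under stated assumptions: borders are handled by replicating edge pixels, the threshold value is arbitrary, and `filter3x3` and `sobel_edges` are hypothetical helper names.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Cross-correlate img with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_edges(img, threshold):
    """Return (gradient magnitude, boolean edge map)."""
    gx = filter3x3(img, SOBEL_X)            # step 1: gradient in X
    gy = filter3x3(img, SOBEL_Y)            #         gradient in Y
    magnitude = np.sqrt(gx ** 2 + gy ** 2)  # step 2: gradient magnitude
    return magnitude, magnitude > threshold # step 3: thresholding

# Synthetic image: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
magnitude, edges = sobel_edges(img, threshold=2.0)
print(edges.astype(int))
```

Without non-maximum suppression the detected edge is two pixels wide (the columns on either side of the step), which is the thickness that the postprocessing step is meant to remove.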

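Canny usually produces thinner and better-connected edges than plain thresholding of Sobel or Prewitt gradients, at the cost of more parameters (the smoothing $\sigma$ and two thresholds).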
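
The last two Canny steps, double thresholding and hysteresis, are what separate it from a single global threshold. The sketch below shows only those two steps on a precomputed magnitude array; it assumes 8-connectivity, and the function name `hysteresis`, the thresholds, and the toy input are illustrative choices.

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Keep strong pixels plus weak pixels 8-connected to a strong one."""
    strong = magnitude >= high
    weak = magnitude >= low              # also contains the strong pixels
    keep = np.zeros_like(strong)
    stack = list(zip(*np.nonzero(strong)))
    for y, x in stack:
        keep[y, x] = True
    h, w = magnitude.shape
    while stack:
        y, x = stack.pop()
        # Visit the 3x3 neighborhood, clipped to the image bounds.
        for ny in range(max(y - 1, 0), min(y + 2, h)):
            for nx in range(max(x - 1, 0), min(x + 2, w)):
                if weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    stack.append((ny, nx))
    return keep

# Toy magnitude map: one strong pixel with a weak tail, plus an isolated weak pixel.
mag = np.zeros((5, 5))
mag[1, 1] = 9.0
mag[1, 2] = mag[1, 3] = 4.0
mag[3, 3] = 4.0
kept = hysteresis(mag, low=3.0, high=8.0)
print(kept.astype(int))
```

The weak tail attached to the strong pixel survives, while the isolated weak pixel is discarded even though it has the same magnitude.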

Line Detection

Lines are special cases of edges, representing straight boundaries.

Applications

  • Shape detection (buildings, rectangles, checkerboards for calibration).
  • Prior knowledge of line presence aids in recognition.

Hough Transform

A classical method to detect lines:

  1. Perform edge detection to identify candidate points.
  2. Represent lines in polar coordinates $(\rho, \theta)$:
    • $\rho = x \cos\theta + y \sin\theta$
  3. Use an accumulator array to count how many edge points align for each $(\rho, \theta)$ pair.
  4. Peaks in the accumulator correspond to strong candidate lines.
  5. Map each peak $(\rho, \theta)$ back to image space to draw the detected line.

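The Hough Transform tolerates gaps and noise in binary edge maps because every edge point votes independently; its cost grows with the number of edge points and the resolution of the accumulator.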
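
The voting procedure can be sketched in a few lines of NumPy. This is a simplified illustration, assuming integer ρ bins of width 1 pixel and θ sampled in 1° steps over [0°, 180°); the function name `hough_lines` and the synthetic edge map are made up for the example.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote in a (rho, theta) accumulator for every edge pixel."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))             # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    rhos = np.arange(-diag, diag + 1)               # one bin per integer rho
    acc = np.zeros((len(rhos), n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # rho = x*cos(theta) + y*sin(theta), evaluated for all theta at once.
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, cols] += 1
    return acc, rhos, thetas

# Synthetic edge map containing one horizontal line at y = 20.
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True
acc, rhos, thetas = hough_lines(edges)
r_idx, t_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(rhos[r_idx], np.rad2deg(thetas[t_idx]), acc[r_idx, t_idx])
```

The strongest accumulator cell identifies the line's distance from the origin and the angle of its normal, and its vote count equals the number of edge pixels on the line.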

Region / Blob Detection

Definition:
Blobs are regions with homogeneous properties (e.g., color, brightness, texture).

Uses:

  • Identifying object parts.
  • Regions of Interest (ROI) for recognition and tracking.
  • Shape and size analysis.

Classical Methods

Laplacian of Gaussian (LoG)

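  • Detects blobs as local extrema of the scale-normalized LoG response across position and scale (zero-crossings of the LoG mark edges, not blob centers).
  • Optimal scale:
    $\sigma = r / \sqrt{2} \approx 0.707 \, r$, where $r$ is the blob radius.

    This scale maximizes the magnitude of the scale-normalized filter response at the blob center.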

  • Difference of Gaussians (DoG):
    • A computationally efficient approximation of LoG.
    • Obtained by subtracting two Gaussian-blurred images with different $\sigma$ values.
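
The scale rule above can be checked numerically, taking the blob size to be the radius $r$ of a binary disk. The sketch below evaluates the scale-normalized LoG response ($\sigma^2 \nabla^2 G$) at the disk center for a range of $\sigma$ values and picks the strongest one. The radius, the candidate scales, and the helper name `normalized_log_at_center` are assumptions for illustration.

```python
import numpy as np

def normalized_log_at_center(radius, sigma, half=64):
    """Scale-normalized LoG response at the center of a bright binary disk."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2
    disk = (r2 <= radius ** 2).astype(float)
    gauss = np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    log_kernel = (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss  # Laplacian of Gaussian
    # The response at the center is the kernel summed over the disk pixels.
    return sigma ** 2 * np.sum(log_kernel * disk)

radius = 10
sigmas = np.arange(3.0, 12.01, 0.25)
responses = np.array([normalized_log_at_center(radius, s) for s in sigmas])
best_sigma = sigmas[np.argmax(np.abs(responses))]
print(best_sigma, radius / np.sqrt(2))
```

For a bright disk the response is negative, so the strongest response is the most negative one; the selected $\sigma$ should land close to $r / \sqrt{2}$, up to the grid spacing of the candidate scales.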

Maximally Stable Extremal Regions (MSER)

  • Vary the intensity threshold from dark to bright.
  • Select regions that remain stable over a wide range of thresholds.
  • Properties:
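    • Covariant with affine geometric transformations: detected regions deform along with the image.
    • Invariant to monotonic intensity changes, which preserve the ordering of pixel values.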
  • Applications:
    • Text detection.
    • Road sign and number plate recognition.
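
The threshold sweep can be illustrated for a single region. The sketch below is a deliberate simplification: it follows only the dark connected component that contains a given seed pixel, whereas full MSER implementations track all components at once. It assumes 8-bit intensities, 4-connectivity, and the common stability measure $q(t) = (|Q_{t+\Delta}| - |Q_{t-\Delta}|) / |Q_t|$; the function names and the synthetic image are hypothetical.

```python
import numpy as np

def component_area(mask, seed):
    """Area of the 4-connected region of mask containing seed (0 if seed is off)."""
    if not mask[seed]:
        return 0
    h, w = mask.shape
    seen = np.zeros_like(mask)
    seen[seed] = True
    stack = [seed]
    area = 0
    while stack:
        y, x = stack.pop()
        area += 1
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                stack.append((ny, nx))
    return area

def most_stable_threshold(img, seed, delta=10):
    """Sweep thresholds dark to bright; return (threshold, q, areas) with smallest q."""
    areas = [component_area(img <= t, seed) for t in range(256)]
    best_t, best_q = None, np.inf
    for t in range(delta, 256 - delta):
        if areas[t] == 0:
            continue
        q = (areas[t + delta] - areas[t - delta]) / areas[t]
        if q < best_q:
            best_t, best_q = t, q
    return best_t, best_q, areas

# Synthetic image: a dark 6x6 square (value 50) on a bright background (value 200).
img = np.full((20, 20), 200, dtype=np.uint8)
img[7:13, 7:13] = 50
best_t, best_q, areas = most_stable_threshold(img, seed=(10, 10))
print(best_t, best_q, areas[best_t])
```

The region's area stays constant over a wide band of thresholds between the square's intensity and the background's, which is the kind of plateau that MSER selects.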

Post-processing

To improve detection quality:

  • Reject regions that are too large or too small.
  • Discard unstable regions.
  • Remove redundant overlapping regions.
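
These three filters can be combined in one pass. The sketch below is one possible arrangement, not a prescribed pipeline: it assumes each region is summarized by an axis-aligned bounding box `(x0, y0, x1, y1)` and an instability score (lower is better), and it treats two regions as redundant when their intersection-over-union exceeds a chosen limit. All names and thresholds are illustrative.

```python
def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def filter_regions(regions, min_area, max_area, max_instability, max_iou=0.5):
    """regions: list of (box, instability). Returns the surviving regions."""
    # 1. Reject regions that are too small or too large; 2. discard unstable ones.
    candidates = [r for r in regions
                  if min_area <= box_area(r[0]) <= max_area
                  and r[1] <= max_instability]
    # 3. Greedily keep the most stable region among heavily overlapping ones.
    kept = []
    for box, score in sorted(candidates, key=lambda r: r[1]):
        if all(iou(box, k[0]) <= max_iou for k in kept):
            kept.append((box, score))
    return kept

regions = [
    ((0, 0, 10, 10), 0.1),     # good region
    ((1, 1, 11, 11), 0.2),     # near-duplicate of the first
    ((0, 0, 2, 2), 0.05),      # too small
    ((20, 20, 30, 30), 0.9),   # unstable
    ((40, 40, 50, 50), 0.3),   # good region
]
kept = filter_regions(regions, min_area=25, max_area=1000, max_instability=0.5)
print([box for box, _ in kept])
```

Sorting by instability before the overlap check means that, of two near-duplicate regions, the more stable one is the one that survives.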
