Feature Detection
Feature detection in computer vision, progressing from points and patches to edges and lines.
Introduction
Features are distinctive patterns or structures in images that can be used for tasks like object recognition, scene understanding, and tracking.
Points and Patches
Use Case: Matching across images (e.g., panorama stitching, change detection).
Good features to track:
- Unique and distinctive in their local neighborhood.
- Found by examining the auto-correlation function over small displacements: a good feature looks different from its shifted copies in every direction.
Mathematical basis:
- Auto-correlation matrix (A): $A = \sum_{W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$, where $I_x$ and $I_y$ are the image gradients in the $x$ and $y$ directions and the sum runs over a local (optionally weighted) window $W$ around the pixel.
- Eigenvalue interpretation (eigenvalues $\lambda_1, \lambda_2$ of $A$):
  - Both large → corner (good feature).
  - One large, one small → edge.
  - Both small → flat region.
Detectors:
- Harris Corner Detector: $R = \det(A) - k \cdot (\text{trace}(A))^2$, where $k \in [0.04, 0.06]$
- Shi–Tomasi Detector: $\min(\lambda_1, \lambda_2) > \text{threshold}$
Harris Corner Detector
Formula: $R = \det(A) - k \cdot (\text{trace}(A))^2$ where $A$ is the structure tensor.
Idea: Measures how much intensity changes in all directions.
Corner Criterion: Large positive $R$ indicates a corner.
Pros:
- Good at detecting corners even under rotation.
Cons:
- Sensitive to the parameter $k$.
- May detect edges as corners.
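The Harris response can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the uniform 3×3 window, the central-difference gradients from `np.gradient`, the value `k = 0.05`, and the helper names `box_sum` and `harris_response` are all assumptions made for the example.

```python
import numpy as np

def box_sum(img, radius=1):
    """Sum each pixel's (2*radius+1)^2 neighbourhood (uniform window W, edge-padded)."""
    size = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def harris_response(img, k=0.05, radius=1):
    """Return R = det(A) - k * trace(A)^2 at every pixel."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)          # np.gradient returns d/drow first, then d/dcol
    Sxx = box_sum(Ix * Ix, radius)     # entries of the structure tensor A
    Sxy = box_sum(Ix * Iy, radius)
    Syy = box_sum(Iy * Iy, radius)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Synthetic test image (assumption): a bright square on a dark background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
corner_R = R[5, 5]    # top-left corner of the square
edge_R = R[5, 10]     # middle of the top edge
flat_R = R[1, 1]      # flat background
```

On this image the response is positive at the corner, negative along the edge, and zero in the flat region, which matches the eigenvalue interpretation: with only one large eigenvalue, $\det(A)$ is near zero and the $-k \cdot (\text{trace}(A))^2$ term dominates.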
Shi–Tomasi Detector
Formula: Uses eigenvalues $\lambda_1, \lambda_2$ of matrix $A$.
Corner Criterion: $\min(\lambda_1, \lambda_2) > \text{threshold}$
Idea: A point is a good corner if both eigenvalues are large.
Pros:
- More robust and selective than Harris.
- Produces fewer false positives, because it thresholds the smaller eigenvalue directly and so rejects edge-like points.
Cons:
- Slightly more computationally expensive due to eigenvalue calculation.
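For a symmetric 2×2 matrix the eigenvalues have a closed form, so the Shi–Tomasi score can be computed without a general eigen-solver. The sketch below assumes the three entries of $A$ (named `Sxx`, `Sxy`, `Syy` here) have already been computed per pixel; the sample values and the threshold of 0.5 are made up for illustration.

```python
import numpy as np

def min_eigenvalue(Sxx, Sxy, Syy):
    """Smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]], elementwise."""
    half_trace = 0.5 * (Sxx + Syy)
    root = np.sqrt((0.5 * (Sxx - Syy)) ** 2 + Sxy ** 2)
    return half_trace - root

# Hypothetical structure tensor entries at three pixels: corner, edge, flat.
Sxx = np.array([1.0, 0.0, 0.0])
Sxy = np.array([0.25, 0.0, 0.0])
Syy = np.array([1.0, 1.5, 0.0])

lam_min = min_eigenvalue(Sxx, Sxy, Syy)
is_corner = lam_min > 0.5   # threshold chosen arbitrarily for this example
```

Only the first pixel passes: the edge-like pixel has one large eigenvalue but a minimum of zero, so it is rejected without any tuning of $k$.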
Edge Detection
Definition: Edges represent locations of rapid intensity change in one direction.
Useful for:
- Segmenting objects
- Finding boundaries
- Identifying shapes
Key Methods
- Harris Detector: The corner analysis extends to edges: a point where the gradient is strong in one direction and weak in the other (one large eigenvalue, one small) lies on an edge.
- Gradient-Based Methods (Sobel, Prewitt, Difference of Gaussians, Canny):
  - Compute gradients in X and Y directions.
  - Calculate gradient magnitude: $\sqrt{G_x^2 + G_y^2}$
  - Apply thresholding to identify edges.
  - Preprocessing: Noise removal and smoothing.
  - Postprocessing: Non-maximum suppression to thin edges and remove weak/unconnected edges.
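The gradient, magnitude, and threshold steps can be sketched with the standard 3×3 Sobel kernels. The helper names, the edge padding, and the threshold of half the maximum magnitude are assumptions for this example; smoothing and non-maximum suppression are left out to keep it short.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Cross-correlate img with a 3x3 kernel (edge-padded, same-size output)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gradient_magnitude(img):
    gx = filter3x3(img, SOBEL_X)   # horizontal intensity change
    gy = filter3x3(img, SOBEL_Y)   # vertical intensity change
    return np.sqrt(gx ** 2 + gy ** 2)

# Synthetic image (assumption): a vertical step edge between columns 3 and 4.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag = gradient_magnitude(img)
edges = mag > 0.5 * mag.max()
```

The detected edge here is two pixels wide (columns 3 and 4), which is the kind of thick response that non-maximum suppression is meant to thin.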
Popular Edge Detectors
- Sobel: Uses fixed convolution kernels for X and Y directions to emphasize edges.
- Prewitt: Similar to Sobel but with slightly different kernel values.
- Canny Edge Detector (steps in order):
  - Gaussian smoothing.
  - Gradient computation.
  - Non-maximum suppression.
  - Double thresholding.
  - Edge tracking by hysteresis.
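Canny often produces thinner and better-connected edges than plain Sobel or Prewitt thresholding, because non-maximum suppression and hysteresis remove thick and isolated responses.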
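The last two Canny steps, double thresholding and edge tracking by hysteresis, can be sketched as a breadth-first search that starts from strong pixels and grows into connected weak ones. This is a simplified illustration that assumes a gradient-magnitude array is already available; the function name, 8-connectivity, and the threshold values are choices made for the example.

```python
from collections import deque

import numpy as np

def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) plus weak pixels (>= low) connected to them."""
    strong = mag >= high
    candidate = mag >= low            # includes the strong pixels
    keep = strong.copy()
    h, w = mag.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and candidate[rr, cc] and not keep[rr, cc]):
                    keep[rr, cc] = True
                    queue.append((rr, cc))
    return keep

# Hypothetical gradient magnitudes: one strong pixel, a weak chain attached
# to it, and one isolated weak pixel.
mag = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.9, 0.4, 0.4, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
edges = hysteresis(mag, low=0.3, high=0.8)
```

The weak pixels next to the strong one survive, while the isolated weak pixel at column 4 is discarded.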
Line Detection
Lines are special cases of edges, representing straight boundaries.
Applications
- Shape detection (buildings, rectangles, checkerboards for calibration).
- Prior knowledge of line presence aids in recognition.
Hough Transform
A classical method to detect lines:
- Perform edge detection to identify candidate points.
- Represent lines in polar coordinates $(\rho, \theta)$: $\rho = x\cos\theta + y\sin\theta$
- Use an accumulator array to count how many edge points align for each $(\rho, \theta)$ pair.
- Peaks in the accumulator correspond to strong candidate lines.
- Work backwards to draw detected lines.
The Hough Transform is efficient for finding straight lines in binary edge maps.
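The voting procedure can be sketched as follows. The resolution (1 degree in $\theta$, 1 pixel in $\rho$), the function name `hough_lines`, and the synthetic edge map are assumptions for the example; a practical detector would also look for several peaks rather than only the global maximum.

```python
import numpy as np

def hough_lines(edge_map, n_theta=180):
    """Accumulate votes for rho = x*cos(theta) + y*sin(theta)."""
    h, w = edge_map.shape
    thetas = np.deg2rad(np.arange(n_theta))            # 0..179 degrees
    diag = int(np.ceil(np.hypot(h, w)))                # |rho| never exceeds this
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int) # rows cover rho in [-diag, diag]
    ys, xs = np.nonzero(edge_map)
    theta_idx = np.arange(n_theta)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, theta_idx] += 1               # one vote per theta
    return acc, thetas, diag

# Synthetic binary edge map (assumption): a horizontal line at y = 20.
edge_map = np.zeros((50, 50), dtype=bool)
edge_map[20, 5:45] = True

acc, thetas, diag = hough_lines(edge_map)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
best_rho = rho_idx - diag
best_theta_deg = np.rad2deg(thetas[theta_idx])
```

For the horizontal line, the peak sits at $\rho = 20$ and $\theta = 90°$, and all 40 edge points vote for that cell.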
Region / Blob Detection
Definition:
Blobs are regions with homogeneous properties (e.g., color, brightness, texture).
Uses:
- Identifying object parts.
- Regions of Interest (ROI) for recognition and tracking.
- Shape and size analysis.
Classical Methods
Laplacian of Gaussian (LoG)
- Detects blobs as local extrema of the scale-normalized LoG filter response across position and scale (zero-crossings of the LoG mark edges, not blobs).
- Optimal scale: $\sigma \approx 0.707 \times \text{(blob radius)}$, i.e. $\sigma = r / \sqrt{2}$. This scale maximizes the magnitude of the filter's response.
- Difference of Gaussians (DoG):
  - A computationally efficient approximation of LoG.
  - Obtained by subtracting two Gaussian-blurred images with different $\sigma$ values.
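The relationship between blob size and the best LoG scale can be checked numerically. The sketch below samples the scale-normalized LoG, $\sigma^2 \nabla^2 G$, at the center of an ideal bright disk and searches a grid of $\sigma$ values for the strongest response. The disk radius of 10 pixels, the $\sigma$ grid, and the function name are assumptions for the example.

```python
import numpy as np

def normalized_log_at_center(radius, sigma):
    """Scale-normalized LoG response at the center of a unit-intensity disk."""
    half = int(np.ceil(radius))
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (xs ** 2 + ys ** 2).astype(float)
    inside = r2 <= radius ** 2                       # pixels belonging to the disk
    u = r2 / (2 * sigma ** 2)
    log = (u - 1) * np.exp(-u) / (np.pi * sigma ** 4)  # Laplacian of a 2D Gaussian
    return sigma ** 2 * log[inside].sum()            # sigma^2 normalizes across scales

radius = 10
sigmas = np.arange(3.0, 12.01, 0.25)
responses = np.array([normalized_log_at_center(radius, s) for s in sigmas])
best_sigma = sigmas[np.argmax(np.abs(responses))]
```

The response is negative for a bright blob, so the search uses its magnitude. On this grid `best_sigma` should land close to `radius / np.sqrt(2)`, about 7.07.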
Maximally Stable Extremal Regions (MSER)
- Vary the intensity threshold from dark to bright.
- Select regions that remain stable over a wide range of thresholds.
- Properties:
  - Invariant to affine transformations.
  - Robust to monotonic intensity changes.
- Applications:
  - Text detection.
  - Road sign and number plate recognition.
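The stability idea can be illustrated with a heavily simplified sketch: follow the dark region around a single seed pixel as the threshold rises, and measure how much its area changes. This is not a full MSER implementation, which tracks every region at once in a component tree. The seed-based approach, the function names, 4-connectivity, and the `delta` value are all assumptions for the example.

```python
from collections import deque

import numpy as np

def region_area(img, seed, t):
    """Area of the 4-connected region of pixels <= t containing seed (0 if seed > t)."""
    if img[seed] > t:
        return 0
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    seen[seed] = True
    queue = deque([seed])
    area = 0
    while queue:
        r, c = queue.popleft()
        area += 1
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w and not seen[rr, cc] and img[rr, cc] <= t:
                seen[rr, cc] = True
                queue.append((rr, cc))
    return area

def variation(img, seed, t, delta):
    """Relative area change around threshold t; lower means more stable."""
    a = region_area(img, seed, t)
    if a == 0:
        return float("inf")
    grow = region_area(img, seed, t + delta)
    shrink = region_area(img, seed, t - delta)
    return (grow - shrink) / a

# Synthetic image (assumption): a dark 4x4 blob on a bright background.
img = np.full((10, 10), 200)
img[3:7, 3:7] = 20
seed = (4, 4)
areas = [region_area(img, seed, t) for t in (10, 20, 100, 199, 200)]
stable = variation(img, seed, t=100, delta=10)
unstable = variation(img, seed, t=195, delta=10)
```

The blob keeps the same area for every threshold between its own intensity and the background's, so its variation there is zero. Near the background intensity the region suddenly floods the whole image and the variation jumps.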
Post-processing
To improve detection quality:
- Reject regions that are too large or too small.
- Discard unstable regions.
- Remove redundant overlapping regions.
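These three filters can be sketched for regions summarized by bounding boxes. The representation of a region as `((x0, y0, x1, y1), variation)`, the use of intersection-over-union to measure overlap, and all threshold values are assumptions made for this example.

```python
def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_regions(regions, min_area, max_area, max_variation, max_iou):
    """regions: list of (box, variation); lower variation means more stable."""
    kept = []
    # Visit the most stable regions first so they win overlap conflicts.
    for box, variation in sorted(regions, key=lambda r: r[1]):
        if not (min_area <= area(box) <= max_area):
            continue                                   # too small or too large
        if variation > max_variation:
            continue                                   # unstable
        if any(iou(box, k) > max_iou for k, _ in kept):
            continue                                   # redundant overlap
        kept.append((box, variation))
    return kept

# Hypothetical detections.
regions = [
    ((0, 0, 10, 10), 0.10),    # good
    ((1, 1, 11, 11), 0.20),    # overlaps the first box heavily
    ((0, 0, 2, 2), 0.05),      # too small
    ((20, 20, 30, 30), 0.90),  # unstable
    ((40, 40, 50, 50), 0.30),  # good
]
kept = filter_regions(regions, min_area=25, max_area=400,
                      max_variation=0.5, max_iou=0.5)
```

Only the first and last boxes survive. Sorting by stability before the overlap check is one possible design choice; sorting by area or detector score would work the same way.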