Skip to content

wailsb/wail-prec-sbgd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

WAIL-PREC-SBGD: Pattern Recognition using Score-Based System & Gradient Descent

Current C++ Implementation (classificator module)

The project now includes a concrete demo pipeline in the classificator folder that does two core tasks:

  1. Build a neighbor graph from active points (spatial relationships).
  2. Build chains of neighbors (connected components) and classify each chain with a score-based model trained with gradient descent.

Data flow

  1. Point activation by saturation

    • isActivePoint(...) converts RGB to HSV-style saturation and marks a point active when saturation is above a threshold.
  2. Neighbor generation

    • neighboorArray(...) treats each point as a graph node.
    • Two points become neighbors when distance is within either point radius.
    • Neighbor links are stored symmetrically.
  3. Chain generation

    • generateNeighborChains(...) runs BFS on the neighbor graph.
    • Each connected component becomes one NeighborChain.
  4. Feature extraction per chain

    • avgSaturation: average saturation across points in the chain.
    • compactness: inverse-normalized average pairwise distance.
    • density: local graph connectivity.
    • sizeNorm: normalized chain size.
  5. Classification with score + gradient descent

    • classifyNeighborChains(...) uses a linear score with sigmoid output for confidence.
    • Weights are refined over epochs using gradient descent against a pseudo-target heuristic.
    • Output includes:
      • confidence for each chain,
      • isFaceLike boolean per chain,
      • ClassificationResult summary (totalChains, faceLikeChains, bestConfidence).
  6. Memory cleanup

    • freeNeighborChains(...) releases chain allocations.

Public API added

Declared in classificator/classificator.h:

  • int generateNeighborChains(MemoryZone *zone, NeighborChain **chainsOut);
  • ClassificationResult classifyNeighborChains(NeighborChain *chains, int chainCount, float learningRate, int epochs);
  • void freeNeighborChains(NeighborChain *chains, int chainCount);

Run the demo test

From project root:

g++ .\classificator\classificator.cpp .\classificator\test_classificator.cpp -std=c++17 -o .\classificator\test_classificator.exe
.\classificator\test_classificator.exe

Notes

  • This is still a demo / educational implementation (small scale, simple heuristics).
  • The classifier currently uses pseudo-label heuristics instead of an externally labeled dataset.
  • The architecture is now ready to swap pseudo-targets with real training labels later.

Mathematical Formalization

1. Input Representation

Given an input image $I \in \mathbb{R}^{N \times M \times 3}$ where:

  • $N, M$ are image dimensions
  • Each pixel $p_{i,j} = (R, G, B)$ with $R, G, B \in [0, 255]$

Convert to HSV color space: $I_{HSV} \in \mathbb{R}^{N \times M \times 3}$ where each pixel has $(H, S, V)$

2. Saturation Shift Detection

Define saturation gradient: $$\nabla S(i,j) = \sqrt{\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2}$$

Where: $$\frac{\partial S}{\partial x}(i,j) = S(i+1,j) - S(i,j)$$ $$\frac{\partial S}{\partial y}(i,j) = S(i,j+1) - S(i,j)$$

Activation zones detected when: $$|\nabla S(i,j)| > \theta_{shift}$$

Where $\theta_{shift}$ is threshold (e.g., 50-100 for significant shifts)

Define activation zone set: $$Z = {(x,y) \mid |\nabla S(x,y)| > \theta_{shift}}$$

3. Shape Extraction and Normalization

Extract boundary points forming shape contour: $$B = {b_1, b_2, ..., b_k}$$ where $b_i = (x_i, y_i) \in Z$

Compute shape centroid: $$c = (\bar{x}, \bar{y}) = \left(\frac{1}{k}\sum_{i=1}^k x_i, \frac{1}{k}\sum_{i=1}^k y_i\right)$$

Compute orientation angle $\phi$ using principal component analysis: $$\phi = \arctan\left(\frac{\lambda_1 - a}{b}\right)$$

Where $a, b, c$ are covariance matrix elements and $\lambda_1$ is largest eigenvalue.

Apply rotation transformation $R(\phi)$ to normalize: $$B' = R(-\phi) \cdot (B - c)$$

4. Score System

For each cell $(i,j)$ in normalized region, compute score vector $\mathbf{s}_{i,j}$:

4.1 Identity Score: $$s_{identity}(i,j) = \exp\left(-\frac{|I'(i,j) - T(i,j)|^2}{2\sigma^2}\right)$$

Where $T$ is target template, $\sigma$ is sensitivity parameter.

4.2 Neighborhood Score: $$s_{neighbor}(i,j) = \frac{1}{8}\sum_{(p,q) \in N(i,j)} \exp\left(-\frac{|I'(p,q) - T(p,q)|^2}{2\sigma^2}\right)$$

Where $N(i,j)$ is 8-neighborhood of $(i,j)$.

4.3 Overall Score: $$s_{overall}(i,j) = w_1 \cdot s_{identity}(i,j) + w_2 \cdot s_{neighbor}(i,j)$$

With weights $w_1 + w_2 = 1$ (typically $w_1 = 0.6, w_2 = 0.4$).

5. Recognition Decision

Compute global match score: $$S_{match} = \frac{1}{|R|}\sum_{(i,j) \in R} s_{overall}(i,j)$$

Where $R$ is the region of interest.

Classification: $$\text{Match} = \begin{cases} \text{True} & \text{if } S_{match} \geq 0.7 \ \text{False} & \text{otherwise} \end{cases}$$

6. Gradient Descent Optimization

To improve matching, optimize weights $\mathbf{w} = (w_1, w_2)$ using gradient descent:

Loss function: $$L(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^n \left(y_i - S_{match}^{(i)}(\mathbf{w})\right)^2$$

Where $y_i \in {0, 1}$ are ground truth labels.

Update rule: $$\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \nabla L(\mathbf{w}_t)$$

Where $\eta$ is learning rate and: $$\frac{\partial L}{\partial w_k} = \frac{2}{n}\sum_{i=1}^n \left(y_i - S_{match}^{(i)}\right) \cdot \frac{\partial S_{match}^{(i)}}{\partial w_k}$$

Algorithm Summary

Input: Image $I$, Target template $T$
Output: Match score $S_{match}$ and classification

  1. Convert $I$ to HSV color space
  2. Detect saturation shift zones $Z$
  3. Extract shape boundary $B$ and normalize to $B'$
  4. Compute score matrix $\mathbf{S}$ for all pixels
  5. Calculate global match score $S_{match}$
  6. Apply gradient descent to update parameters
  7. Return classification result

Learning Resources: Understanding the Mathematics

1. Color Space Conversion (HSV)

What you need to know:

  • RGB represents colors as combinations of Red, Green, Blue (0-255 each)
  • HSV represents colors as Hue (color type 0-360°), Saturation (color intensity 0-1), Value (brightness 0-1)
  • HSV is better for detecting color changes because saturation and brightness are separate

Learn from:

Key concept: Saturation tells us how "colorful" vs "gray" a pixel is - perfect for finding edges between different regions.


2. Gradient & Partial Derivatives

What you need to know:

  • $\frac{\partial S}{\partial x}$ means "how much does saturation change when we move right by 1 pixel"
  • It's just: S(next_pixel) - S(current_pixel)
  • The gradient $\nabla S$ combines x and y changes to show the direction and magnitude of biggest change

Learn from:

Key concept: Think of gradient as "which direction would water flow" - it points toward the steepest change.

Simple example:

Saturation values:  [0.2, 0.2, 0.8, 0.9]
Gradient:           [0,   0,   0.6, 0.1]  <- Big jump at index 2!

3. Set Notation & Set Builder

What you need to know:

  • $Z = {(x,y) \mid |\nabla S(x,y)| &gt; \theta}$ reads as: "Z is the set of all points (x,y) such that gradient magnitude > threshold"
  • $\mid$ means "such that" or "where"
  • It's just a fancy way to write: for each (x,y): if gradient > threshold, add to Z

Learn from:

Key concept: This is just a mathematical "filter" - collect all points where condition is true.


4. Centroid & Mean

What you need to know:

  • Centroid = "center of mass" or average position of all points
  • $\bar{x} = \frac{1}{k}\sum_{i=1}^k x_i$ means: "add all x coordinates and divide by count"
  • It's literally just the average: (x1 + x2 + ... + xk) / k

Learn from:

Key concept: Used to find the "middle" of your detected shape so you can center it.


5. Principal Component Analysis (PCA) & Rotation

What you need to know:

  • PCA finds the main "direction" of your shape
  • The orientation angle tells you how tilted the shape is
  • You use this angle to rotate the shape to a standard orientation (like rotating a tilted face to be upright)

Learn from:

Simplified approach for coding: You can skip full PCA and use simpler method:

  1. Find two most distant points in your shape
  2. The line between them gives you orientation angle
  3. Use basic 2D rotation matrix to rotate points

2D Rotation Formula (easier to start with):

x_new = x*cos(θ) - y*sin(θ)
y_new = x*sin(θ) + y*cos(θ)

6. Gaussian (Bell Curve) for Similarity

What you need to know:

  • $\exp\left(-\frac{d^2}{2\sigma^2}\right)$ is the Gaussian/normal distribution
  • It outputs values close to 1 when distance $d$ is small (similar pixels)
  • It outputs values close to 0 when distance is large (different pixels)
  • $\sigma$ controls how "strict" the comparison is (smaller = more strict)

Learn from:

Key concept: This creates a smooth similarity score. Pixels that are almost identical get score ~0.99, slightly different get ~0.7, very different get ~0.01.

Simple example:

If d=0 (identical):     exp(-0/50) = 1.0    (perfect match!)
If d=5 (similar):       exp(-25/50) = 0.61  (pretty good)
If d=20 (different):    exp(-400/50) = 0.0003 (very different)

7. Euclidean Distance (Norm)

What you need to know:

  • $|A - B|$ means distance between points A and B
  • For pixels: $\sqrt{(R_1-R_2)^2 + (G_1-G_2)^2 + (B_1-B_2)^2}$
  • It's just the Pythagorean theorem in 3D (for RGB) or 2D (for positions)

Learn from:


8. Gradient Descent Optimization

What you need to know:

  • Used to automatically find best weights $w_1$ and $w_2$
  • Loss function $L$ measures "how wrong are we?"
  • We adjust weights in direction that reduces error (go downhill on error curve)
  • Learning rate $\eta$ controls step size (0.01 = take small careful steps)

Learn from:

Key concept: Like walking downhill in fog. You check which direction is downward (gradient), take a small step that way, repeat until you reach the bottom.

Simple analogy:

Imagine you're trying to tune a radio (weights = dial position)
Loss = how much static you hear
Gradient = "turning left makes it worse, turning right makes it better"
You keep turning right (adjusting weights) until static (loss) is minimized

9. Summation Notation

What you need to know:

  • $\sum_{i=1}^n a_i$ means "add up all values from $a_1$ to $a_n$"
  • It's just a for-loop in math notation: total = 0; for i in 1..n: total += a[i]

Learn from:


Implementation Strategy

Start Simple, Build Up:

  1. Week 1: RGB to HSV conversion

    • Test with single pixel first
    • Then test with a small 3x3 image
  2. Week 2: Gradient computation

    • Start with 1D case (just x-direction)
    • Extend to 2D
  3. Week 3: Finding activation zones

    • Create threshold and collect points
    • Visualize them (print coordinates)
  4. Week 4: Simple shape centering

    • Skip fancy PCA, just compute centroid
    • Translate points to center
  5. Week 5: Score system (without optimization)

    • Hard-code weights first (0.6, 0.4)
    • Test on single pixel pairs
  6. Week 6: Gradient descent

    • Start with 1D optimization problem
    • Extend to multi-parameter case

Recommended Tools for Learning

  • Desmos Calculator: https://www.desmos.com/calculator - Graph functions visually
  • Python Notebook: Use Jupyter to experiment with small examples
  • NumPy Tutorial: Learn array operations (similar to image operations)
  • Math Visualization: Use matplotlib to plot gradients, score functions

Quick Reference: Implementation Pseudocode

// 1. Saturation Shift Detection
function detectShifts(image):
    hsv_image = convertRGBtoHSV(image)
    for each pixel (i,j):
        grad_x = hsv[i+1][j].S - hsv[i][j].S
        grad_y = hsv[i][j+1].S - hsv[i][j].S
        magnitude = sqrt(grad_x² + grad_y²)
        if magnitude > threshold:
            add (i,j) to activation_zones
    return activation_zones

// 2. Shape Normalization
function normalizeShape(points):
    centroid = average of all points
    translate points by -centroid
    // Optional: compute orientation and rotate
    return normalized_points

// 3. Score Computation
function computeScore(pixel, template_pixel, sigma):
    distance = ||pixel - template_pixel||
    score = exp(-distance² / (2 * sigma²))
    return score

// 4. Recognition
function recognizePattern(image, template):
    scores = []
    for each pixel:
        identity_score = computeScore(pixel, template_pixel)
        neighbor_score = average of neighbor scores
        overall = 0.6 * identity_score + 0.4 * neighbor_score
        scores.append(overall)
    
    match_score = average(scores)
    return match_score >= 0.7

// 5. Gradient Descent (Advanced)
function optimizeWeights(training_data):
    w1 = 0.6, w2 = 0.4
    learning_rate = 0.01
    
    for iteration in 1..100:
        loss = 0
        gradient_w1 = 0, gradient_w2 = 0
        
        for each sample:
            predicted = computeMatchScore(sample, w1, w2)
            error = predicted - actual_label
            loss += error²
            
            gradient_w1 += error * (∂score/∂w1)
            gradient_w2 += error * (∂score/∂w2)
        
        w1 -= learning_rate * gradient_w1
        w2 -= learning_rate * gradient_w2
    
    return w1, w2

Questions to Ask Yourself While Coding

  1. What are the inputs and outputs of this function?
  2. What does this formula do in plain English?
  3. Can I test this on a tiny example (2x2 image)?
  4. What should happen at image boundaries?
  5. What if the input is all zeros? All the same value?
  6. How do I visualize the result to verify correctness?

Good luck! Take it step by step, and don't hesitate to implement simpler versions first before adding complexity.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors