Sub-Pixel Hot-Spot Detection in IR Imaging
The standard objection, “you need at least 2-3 pixels to detect a feature (Nyquist, characterization, robustness against hot/dead pixels)”, is sound for visible-light imaging. In IR thermography, the physics is different enough that a sub-pixel hot spot can lift a pixel’s signal orders of magnitude above the sensor noise floor.
This note works through why, with a worked example and the physical limits where the claim breaks.
The mixing model
A pixel does not measure “a temperature.” It integrates in-band spectral radiance over its instantaneous field of view (IFOV). If a fraction \(f\) of that IFOV is at temperature \(T_{\text{hot}}\) and the remainder \((1-f)\) is at \(T_{\text{bg}}\), the radiance reaching the detector for that pixel is the area-weighted average:
\[
L_{\text{pixel}} = f\,B(T_{\text{hot}}) + (1-f)\,B(T_{\text{bg}})
\]
For an emissivity-corrected blackbody, the in-band radiance \(B(T)\) rises very steeply with temperature:
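LWIR (8-14 µm): \(B(T) \propto T^{4}\) to a good approximation between room temperature and a few hundred °C (the same exponent as the Stefan-Boltzmann law, which strictly describes total rather than in-band radiance).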
MWIR (3-5 µm): \(B(T) \propto T^{8}\!\dots T^{12}\), since the band sits up the Wien side of the Planck curve. Hot-spot contrast is even sharper.
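The exponents quoted above can be sanity-checked from Planck's law. For spectral radiance at a single wavelength, the local power-law exponent is \(n = \mathrm{d}\ln B / \mathrm{d}\ln T = x e^{x}/(e^{x}-1)\) with \(x = c_2/(\lambda T)\). The sketch below is only an illustration: it evaluates one representative wavelength per band (10 µm and 4 µm, both chosen here as assumptions) rather than integrating over the full band, and the function name `planck_exponent` is hypothetical.

```python
import math

# Second radiation constant c2 = h*c/k, expressed in micrometre-kelvin.
C2_UM_K = 14387.77

def planck_exponent(wavelength_um: float, temp_k: float) -> float:
    """Local exponent n = d ln B / d ln T of Planck spectral radiance.

    Evaluated at a single wavelength, so it only approximates the
    behaviour of a band-integrated signal.
    """
    x = C2_UM_K / (wavelength_um * temp_k)
    return x * math.exp(x) / math.expm1(x)

# Representative wavelengths (assumed): 10 um for LWIR, 4 um for MWIR.
for band, wavelength in [("LWIR 10 um", 10.0), ("MWIR 4 um", 4.0)]:
    for temp in (298.0, 473.0):
        n = planck_exponent(wavelength, temp)
        print(f"{band}, T = {temp:.0f} K: n = {n:.1f}")
```

The exponent falls as the scene gets hotter, which is why a single power law is a convenience for hand calculation rather than an exact description.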
A 100 × 100 µm hot spot occupies just 1 % of a 1 × 1 mm pixel IFOV. The pixel’s signal is the area-weighted average of the hot region and the cool background.
A worked numerical example
| Quantity | Value |
|---|---|
| Pixel IFOV \(A_{\text{pixel}}\) | 1 mm × 1 mm = 1 mm² |
| Hot spot \(A_{\text{spot}}\) | 100 µm × 100 µm = 0.01 mm² |
| Area fraction \(f\) | 1 % |
| Background \(T_{\text{bg}}\) | 25 °C = 298 K |
| Hot spot \(T_{\text{hot}}\) | 200 °C = 473 K |
| Band | LWIR (\(B \propto T^{4}\)) |
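Ratio of emissive powers:
\[
\frac{B(T_{\text{hot}})}{B(T_{\text{bg}})} = \left(\frac{473}{298}\right)^{4} \approx 6.35
\]
Pixel-averaged radiance:
\[
\frac{L_{\text{pixel}}}{B(T_{\text{bg}})} = f \cdot 6.35 + (1-f) = 0.01 \times 6.35 + 0.99 \approx 1.053
\]
Inverting through \(T^{4}\) to recover an apparent pixel temperature:
\[
T_{\text{app}} = T_{\text{bg}} \left(\frac{L_{\text{pixel}}}{B(T_{\text{bg}})}\right)^{1/4} \approx 298~\text{K} \times 1.0131 \approx 301.9~\text{K}, \qquad \Delta T \approx 3.9~\text{K}
\]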
Compare to typical sensor noise floors (NETD, noise-equivalent temperature difference):
| Detector | NETD |
|---|---|
| Cooled MWIR / LWIR (HgCdTe, InSb) | ~ 20 mK |
| Uncooled microbolometer LWIR | ~ 30-50 mK |
Note
\(\mathrm{SNR} \approx \dfrac{3.9~\text{K}}{0.030~\text{K}} \approx \mathbf{130}.\) A 1 % sub-pixel hot spot sits two orders of magnitude above the noise floor.
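The arithmetic of the LWIR example can be reproduced in a few lines. This is a minimal sketch under the same simplification as the text (a pure \(T^{4}\) law and blackbody surfaces); the function name `apparent_temperature` and the 30 mK NETD are illustrative choices, not part of any camera API.

```python
def apparent_temperature(f, t_hot, t_bg, n=4.0):
    """Apparent pixel temperature under the B(T) ~ T^n mixing model."""
    ratio = (t_hot / t_bg) ** n      # B(T_hot) / B(T_bg)
    pixel = f * ratio + (1.0 - f)    # L_pixel / B(T_bg), area-weighted
    return t_bg * pixel ** (1.0 / n) # invert the power law

T_BG = 298.0    # background, K
T_HOT = 473.0   # hot spot, K
F = 0.01        # area fraction of the IFOV covered by the hot spot
NETD = 0.030    # assumed noise floor, K (uncooled microbolometer)

delta_t = apparent_temperature(F, T_HOT, T_BG) - T_BG
snr = delta_t / NETD
print(f"apparent dT = {delta_t:.2f} K, SNR = {snr:.0f}")
```

Changing `F` shows how quickly the margin shrinks: the apparent \(\Delta T\) is close to linear in the area fraction for small \(f\).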
In MWIR, with its higher effective exponent, the same scenario pushes the apparent \(\Delta T\) to roughly 8-10 K — easier still.
Why visible-light imaging cannot pull this trick
Reflectance imaging mixes the same way,
\[
\rho_{\text{pixel}} = f\,\rho_{\text{spot}} + (1-f)\,\rho_{\text{bg}},
\]
but the gain is linear in the albedo difference \(\rho_{\text{spot}}-\rho_{\text{bg}}\), not in any power of it. A 1 % sub-pixel speck with 50 % reflectance contrast lifts the pixel by only \(0.01 \times 0.5 = 0.5\,\%\), right at the camera noise floor for an 8-bit sensor (~0.3-1 %).
LWIR radiance scales as \(T^{4}\). The 175 K rise from background to defect more than sextuples the emitted power, so even a 1 % area share contributes a measurable pixel-level signal.
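To put the two cases side by side, the sketch below computes the pixel-level lift for the same 1 % area fraction in both regimes. It assumes the \(T^{4}\) model for LWIR, and the two quantities are normalised differently (reflectance lift as a fraction of full scale, radiance lift as a fraction of the background radiance), so treat it as an order-of-magnitude comparison only. Both function names are hypothetical.

```python
def reflectance_lift(f, delta_rho):
    """Change in pixel reflectance caused by a sub-pixel speck (linear mixing)."""
    return f * delta_rho

def radiance_lift(f, t_hot, t_bg, n=4.0):
    """Fractional change in pixel radiance caused by a hot spot, B ~ T^n."""
    return f * ((t_hot / t_bg) ** n - 1.0)

visible = reflectance_lift(0.01, 0.5)       # 50 % albedo contrast
lwir = radiance_lift(0.01, 473.0, 298.0)    # 298 K background, 473 K spot

print(f"visible: {visible:.2%} of full scale")
print(f"LWIR:    {lwir:.2%} of background radiance")
```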
The PSF assist
Even if the defect is physically sub-pixel, the optical point-spread function (Airy disk plus lens aberrations) typically spreads its photons over 2-4 pixels, with a FWHM on the order of the pixel pitch. So the sensor response is not a single hot pixel but a small Gaussian-like blob. That is easy to distinguish from a stuck-pixel artefact, which is a delta function, is stable from frame to frame, and also appears in the dark-reference frame, so non-uniformity correction (NUC) removes it.
Optical blur turns even a sub-pixel point source into a small Gaussian-shaped response across several pixels. Stuck or hot pixels, in contrast, are delta-shaped and stable across frames — they are removed by NUC.
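The blob-versus-delta distinction can be illustrated by integrating a PSF over a pixel grid. The sketch below assumes a symmetric 2-D Gaussian PSF (a real Airy pattern has rings and aberrations), a FWHM of 1.5 pixels, and a source centred on a pixel; all of these, and the function name `gaussian_pixel_response`, are assumptions made for illustration.

```python
import math

def gaussian_pixel_response(x0, y0, fwhm_px, size=5):
    """Fraction of a point source's energy collected by each pixel.

    Pixel (row i, column j) covers [j, j+1) x [i, i+1) in pixel units.
    The Gaussian is separable, so each axis is integrated with erf.
    """
    sigma = fwhm_px / (2.0 * math.sqrt(2.0 * math.log(2.0)))

    def cdf(u, mu):
        return 0.5 * (1.0 + math.erf((u - mu) / (sigma * math.sqrt(2.0))))

    cols = [cdf(j + 1, x0) - cdf(j, x0) for j in range(size)]
    rows = [cdf(i + 1, y0) - cdf(i, y0) for i in range(size)]
    return [[r * c for c in cols] for r in rows]

blob = gaussian_pixel_response(2.5, 2.5, fwhm_px=1.5)
centre = blob[2][2]       # pixel containing the source
neighbour = blob[2][3]    # pixel directly beside it
total = sum(sum(row) for row in blob)

print(f"centre pixel:    {centre:.2f} of the energy")
print(f"side neighbour:  {neighbour:.2f} of the energy")
print(f"captured in 5x5: {total:.3f}")
```

A stuck pixel would show a neighbour-to-centre ratio of zero, whereas the blurred point source puts a measurable share of its energy into the adjacent pixels.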
What can break the claim
Two assumptions hide in the math; raise them when a vendor over-promises 1-pixel detection.
Emissivity uniformity. The mixing model assumed \(\varepsilon_{\text{spot}} \approx \varepsilon_{\text{bg}}\). A polished metal flake on oxidized steel can have \(\varepsilon \sim 0.1\) versus \(\varepsilon \sim 0.8\) for the steel, so the contrast can shrink, vanish, or even flip sign at some viewing angles. The radiance leaving a hot, low-\(\varepsilon\) surface is mostly reflected radiance from the room behind the camera rather than the surface's own emission.
Atmospheric and window transmission. CO₂, H₂O, and ozone bands attenuate selectively across LWIR and especially MWIR. A long stand-off distance or a viewport (germanium, ZnSe) cuts effective \(\Delta T\). Always quote the band, path length, and window.
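The emissivity caveat can be made concrete by extending the mixing model with a reflected term. The sketch below assumes opaque gray surfaces, the \(T^{4}\) law, surroundings at the background temperature, and a camera that reports blackbody-equivalent temperature; the function names and the specific emissivities are illustrative assumptions. Atmospheric and window losses are not modelled.

```python
def surface_radiance(eps, t_surface, t_env, n=4.0):
    """Emitted plus reflected radiance of an opaque gray surface (units of K^n)."""
    return eps * t_surface ** n + (1.0 - eps) * t_env ** n

def apparent_delta_t(f, eps_spot, eps_bg, t_hot, t_bg, t_env, n=4.0):
    """Blackbody-equivalent temperature rise of a pixel with a sub-pixel spot."""
    spot = surface_radiance(eps_spot, t_hot, t_env, n)
    background = surface_radiance(eps_bg, t_bg, t_env, n)
    pixel = f * spot + (1.0 - f) * background
    return pixel ** (1.0 / n) - background ** (1.0 / n)

# Same geometry as the worked example: 1 % area, 473 K spot, 298 K scene.
matched = apparent_delta_t(0.01, 0.8, 0.8, 473.0, 298.0, 298.0)
flake = apparent_delta_t(0.01, 0.1, 0.8, 473.0, 298.0, 298.0)

print(f"matched emissivity (0.8 / 0.8): dT = {matched:.2f} K")
print(f"polished flake (0.1 on 0.8):    dT = {flake:.2f} K")
```

Under these assumptions the low-emissivity flake gives up most of the contrast; a colder reflected environment or a smaller temperature rise would erode it further.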
Bottom line: detection vs. characterization
Both views — “you need 2-3 pixels” and “1 pixel is enough” — are correct. They are answering different questions, and the disagreement evaporates once you separate the two.
| “2-3 pixels”: characterization view | “1 pixel”: detection view |
|---|---|
| Question it answers: “What does the defect look like?” | Question it answers: “Is there a defect here, yes or no?” |
| Nyquist sampling. To pin down the spatial frequency of a feature without aliasing, you need at least 2 samples per cycle, so the feature must span \(\ge 2\) pixels. | Optics spread the light. A real lens’s PSF blurs even a sub-pixel hot spot over 2-4 pixels (FWHM \(\ge\) 1-2 px). So the defect’s footprint can be 1 pixel or smaller even when the sensor’s response covers several. |
| Morphology needs room. To measure size, shape, or orientation, you need enough pixels to fit a meaningful pattern, typically a \(3 \times 3\) neighbourhood as a minimum. | IR contrast is a steep power law, not linear. Thermal radiance scales as \(T^{4}\) (LWIR) or steeper (MWIR), so a tiny hot region with a large \(\Delta T\) lifts the pixel’s signal far above NETD. |
| Single-pixel anomalies are ambiguous. Without context, one bright pixel could be a real defect, a hot/stuck pixel, or shot noise. You can’t tell from just the pixel itself. | Sensor artefacts are removed first. Non-uniformity correction (NUC) and reference-frame subtraction strip out hot/dead pixels, so a remaining 1-pixel anomaly is far more likely a real alarm than a sensor quirk. |
Important
Verdict. If the speaker is making a detection claim — “we alarm on hot spots smaller than one pixel” — and it is backed by NUC plus PSF reasoning on a near-blackbody surface, 1 pixel is honest.
If they are claiming measurement at the 1-pixel scale — defect size, shape, orientation, classification — the “2-3 pixels theoretically” objection is right, and Nyquist and morphology are the reasons.
A point-source siren you can hear is not the same as a face you can recognise.