Salience Map

A salience map is a visual representation of which regions of an image or video frame are predicted to attract the most attention, based on the low-level visual properties of the scene. Unlike attention heatmaps (which can incorporate semantic understanding), salience maps model bottom-up, pre-attentive visual processing: the automatic, involuntary capture of attention by contrast, color, edges, and motion before conscious processing begins.

Salience emerges from visual features that stand out from their surroundings:

Luminance contrast: A bright object against a dark background
Color contrast: A colored object against a desaturated background
Edge density: Areas with many edges and sharp contours
Motion: Moving elements in an otherwise static scene
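Face presence: Human faces draw attention strongly, largely independent of their low-level contrast. Classic bottom-up models do not capture this on their own, so many models add a dedicated face channel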
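
To make the idea of "standing out from the surroundings" concrete, here is a minimal sketch of the luminance-contrast feature alone. It compares a small local average (the center) with a larger one (the surround) and treats the absolute difference as salience. The function names, window radii, and test frame are illustrative assumptions; this is not the Itti and Koch model or any product's implementation, both of which combine several features across multiple scales.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter over a (2*radius+1) square window, with edge padding."""
    if radius == 0:
        return img.astype(float)
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Integral image (zero first row/column) gives each window sum in O(1).
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
             - integral[k:k + h, :w] + integral[:h, :w])
    return total / (k * k)

def luminance_salience(gray, center_radius=1, surround_radius=8):
    """Center-surround luminance contrast, scaled to [0, 1].

    `gray` is a 2-D array of luminance values in [0, 1].
    The radii are hypothetical defaults chosen for illustration.
    """
    center = box_blur(gray, center_radius)
    surround = box_blur(gray, surround_radius)
    contrast = np.abs(center - surround)
    peak = contrast.max()
    # A uniform frame has no contrast; avoid dividing by (near) zero.
    return contrast / peak if peak > 1e-9 else contrast

# Synthetic frame: a bright 6x6 patch on a dark background.
frame = np.zeros((64, 64))
frame[20:26, 40:46] = 1.0
sal = luminance_salience(frame)
peak_row, peak_col = np.unravel_index(sal.argmax(), sal.shape)
print(peak_row, peak_col)  # lands inside the bright patch
```

In this sketch the flat background scores zero and the bright patch scores highest, which is the behavior the luminance-contrast entry describes. Color contrast and motion could be handled the same way by running the center-surround comparison on a color-opponent channel or on frame differences.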

Salience maps are generated by computational models, such as the classic Itti and Koch model or more recent deep learning-based models, and are widely used in advertising research to check that key product or message elements sit in high-salience zones.

For video creators, salience maps reveal what viewers will involuntarily notice in each frame, before conscious attention can be directed. Text overlays, product placements, and speaker positioning should align with high-salience regions. Cluttered backgrounds, competing motion elements, and low-contrast subjects are common salience problems that VidCognition's analysis can detect.
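
One simple way to check whether an overlay or product sits in a high-salience region is to compare the mean salience inside its bounding box with the rest of the frame. The sketch below assumes a salience map is already available as a 2-D array. The function name, the `(top, left, bottom, right)` box format, and the synthetic map are hypothetical, and this is not a description of how VidCognition's analysis works.

```python
import numpy as np

def region_salience_score(salience, box):
    """Fraction of frame pixels that are less salient than the region's mean.

    `box` is (top, left, bottom, right) in pixel coordinates, with bottom
    and right exclusive. Values near 1.0 mean the region sits in one of
    the most salient parts of the frame; values near 0.0 mean it does not.
    """
    top, left, bottom, right = box
    region = salience[top:bottom, left:right]
    if region.size == 0:
        raise ValueError("box selects no pixels")
    return float((salience < region.mean()).mean())

# Synthetic salience map: one salient 20x20 block in an otherwise flat frame.
salience_map = np.zeros((100, 100))
salience_map[10:30, 10:30] = 1.0

well_placed = region_salience_score(salience_map, (10, 10, 30, 30))
poorly_placed = region_salience_score(salience_map, (60, 60, 80, 80))
print(well_placed, poorly_placed)  # 0.96 0.0
```

A score like this is only a rough indicator. It ignores competing salient regions elsewhere in the frame, which is exactly the kind of clutter problem described above, so a fuller check would also look at how many other regions score comparably high.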

Related Terms