Salience Map

A visual representation of which areas of an image will draw the most attention based on bottom-up visual processing.

A salience map is a visual representation of which regions of an image or video frame are predicted to attract the most attention, based on the visual properties of the scene. Unlike attention heatmaps (which can incorporate semantic understanding), salience maps derive from bottom-up, pre-attentive visual processing: the automatic, involuntary capture of attention by contrast, color, edges, and motion before conscious processing begins.

Salience emerges from visual features that stand out from their surroundings:

  • Luminance contrast: A bright object against a dark background
  • Color contrast: A colored object against a desaturated background
  • Edge density: Areas with many edges and sharp contours
  • Motion: Moving elements in an otherwise static scene
  • Face presence: Human faces attract attention strongly, largely independent of their low-level visual properties; some models add a dedicated face channel to capture this
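
To make the first item in the list above concrete, here is a minimal sketch of luminance contrast as a center-surround difference: each pixel's small neighbourhood is compared with a larger one, and pixels that differ most from their surroundings score highest. The function names, the box-filter approach, and the radii are illustrative assumptions, not the method of any particular tool; production models use multi-scale filters over several feature channels.

```python
def box_mean(img, radius):
    """Mean of each pixel's (2*radius+1)-wide square neighbourhood, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sum(img[j][i] for j in range(y0, y1) for i in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def luminance_salience(img, center_radius=1, surround_radius=4):
    """Center-surround luminance contrast, scaled so the strongest response is 1.0.

    `img` is a 2D list of luminance values; the radii are arbitrary illustrative choices.
    """
    center = box_mean(img, center_radius)
    surround = box_mean(img, surround_radius)
    h, w = len(img), len(img[0])
    raw = [[abs(center[y][x] - surround[y][x]) for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in raw)
    if peak == 0:
        return raw  # perfectly uniform frame: nothing stands out
    return [[v / peak for v in row] for row in raw]


# Toy frame: a bright 2x2 patch on a dark 12x12 background.
frame = [[0.0] * 12 for _ in range(12)]
for y in (5, 6):
    for x in (5, 6):
        frame[y][x] = 1.0

salience = luminance_salience(frame)
```

In this toy frame the strongest response falls on the bright patch, while the far corners, whose surroundings are uniformly dark, score zero.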

Salience maps are generated by computational models (such as Itti & Koch's classic model, or deep learning-based models) and are widely used in advertising research to check that key product or message elements sit in high-salience zones.
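
Classic feature-based models compute a separate map per feature channel and then merge them into a single salience map. The sketch below illustrates only that merging step, under simplifying assumptions: plain min-max normalization and a weighted average stand in for the more elaborate normalization such models actually use, and the channel names, `normalize`, and `combine_channels` are hypothetical.

```python
def normalize(feature_map):
    """Min-max scale a 2D list to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in feature_map)
    hi = max(max(row) for row in feature_map)
    if hi == lo:
        return [[0.0] * len(feature_map[0]) for _ in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def combine_channels(channels, weights=None):
    """Weighted average of normalized per-feature maps (equal weights by default)."""
    names = list(channels)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    normed = {name: normalize(channels[name]) for name in names}
    h, w = len(channels[names[0]]), len(channels[names[0]][0])
    return [
        [sum(weights[n] * normed[n][y][x] for n in names) / total_weight for x in range(w)]
        for y in range(h)
    ]


# Two toy 2x2 channel maps on different raw scales.
channels = {
    "luminance": [[0, 0], [0, 4]],
    "color": [[2, 0], [0, 2]],
}
combined = combine_channels(channels)
```

Normalizing each channel first keeps a feature with a large raw range from dominating the result; here the bottom-right pixel, strong in both channels, ends up the most salient.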

For video creators, salience maps reveal what viewers' brains will involuntarily notice in each frame, before conscious attention can be directed. Text overlays, product placements, and speaker positioning should align with high-salience regions. Cluttered backgrounds, competing motion elements, and low-contrast subjects are common salience problems that VidCognition's analysis can detect.
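
One simple way to check whether an overlay or product placement sits in a high-salience region is to compare the mean salience inside its bounding box with the mean over the whole frame. The sketch below is a hypothetical illustration of that idea and does not describe how VidCognition performs its analysis; `overlay_lift`, the box convention (x0, y0, x1, y1, end-exclusive), and the toy map are all assumptions.

```python
def mean_salience(salience, box=None):
    """Mean salience over a box (x0, y0, x1, y1), end-exclusive; whole frame if box is None."""
    h, w = len(salience), len(salience[0])
    x0, y0, x1, y1 = box if box is not None else (0, 0, w, h)
    values = [salience[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def overlay_lift(salience, box):
    """Ratio of in-box salience to frame-wide salience; above 1 means a relatively salient spot."""
    frame_mean = mean_salience(salience)
    if frame_mean == 0:
        return 0.0  # no salience anywhere, so no region is favoured
    return mean_salience(salience, box) / frame_mean


# Toy 4x4 salience map with a hotspot in the top-left corner.
toy_map = [
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]

top_left = overlay_lift(toy_map, (0, 0, 2, 2))      # overlay placed on the hotspot
bottom_right = overlay_lift(toy_map, (2, 2, 4, 4))  # overlay placed in a quiet corner
```

A lift above 1 for the top-left box and below 1 for the bottom-right box indicates that an overlay in the first position is more likely to be noticed involuntarily than one in the second.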