Labeling an input semantically and then separating individual instances with a clustering step is an effective strategy for tasks that need both semantic understanding (recognizing which object classes are present) and instance-level differentiation (telling apart each occurrence of those objects). This pattern is common in instance segmentation, object tracking, and scene understanding in computer vision.

1. Semantic Labeling:

Semantic labeling (or semantic segmentation) assigns a class label to each pixel or region of an input, classifying parts of the image into predefined categories such as "background," "car," or "tree." Semantic segmentation alone, however, does not differentiate between separate instances of the same class: two cars in the same image both receive the label "car."
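
To make the per-pixel labeling step concrete, here is a minimal sketch that assumes a model has already produced one score map per class. The `semantic_labels` function, the `CLASSES` list, and the score values are hypothetical and chosen only for illustration; each pixel takes the class with the highest score.

```python
def semantic_labels(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the (hypothetical) score of class c at pixel (y, x);
    the result holds, for each pixel, the index of the highest-scoring class.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


CLASSES = ["background", "car"]  # illustrative class list

# Made-up scores for a 1 x 4 image: one row per class.
scores = [
    [[0.9, 0.2, 0.8, 0.1]],  # background
    [[0.1, 0.8, 0.2, 0.9]],  # car
]

labels = semantic_labels(scores)
print(labels)  # [[0, 1, 0, 1]]
```

Both "car" pixels end up with the same label (1), so nothing in this output says whether they belong to one car or two.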

Techniques for Semantic Labeling:

2. Discriminating Different Instances:

Going beyond semantic labeling to separate instances of the same object class requires an instance-level method, such as an instance segmentation or object detection model. One effective option is to apply a clustering approach so that each resulting cluster corresponds to one instance.

Approaches for Instance Differentiation:

  1. Embedding-Based Clustering:
  2. Distance Metrics:
  3. Watershed Algorithm:
  4. CenterNet and Similar Models:
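
As a rough illustration of the first two approaches, the sketch below clusters per-pixel embedding vectors with a Euclidean distance threshold. It is a simple greedy scheme standing in for the clustering step (methods such as mean shift or DBSCAN are common choices in practice), and the `cluster_embeddings` function, the embedding values, and the threshold are all assumptions made for the example.

```python
import math


def cluster_embeddings(embeddings, threshold):
    """Greedy threshold clustering of embedding vectors.

    Each embedding joins the nearest existing cluster whose centroid lies
    within `threshold` (Euclidean distance); otherwise it starts a new one.
    Returns one cluster index per embedding.
    """
    centroids = []    # running mean embedding of each cluster
    counts = []       # number of members in each cluster
    assignments = []
    for emb in embeddings:
        best, best_dist = None, threshold
        for idx, centroid in enumerate(centroids):
            dist = math.dist(emb, centroid)
            if dist < best_dist:
                best, best_dist = idx, dist
        if best is None:
            # No centroid is close enough: open a new cluster.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the chosen cluster.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (e - c) / n
                               for c, e in zip(centroids[best], emb)]
        assignments.append(best)
    return assignments


# Hypothetical 2-D embeddings for five pixels that were all labeled "car".
car_pixel_embeddings = [(0.1, 0.0), (0.0, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9)]
instance_ids = cluster_embeddings(car_pixel_embeddings, threshold=1.0)
print(instance_ids)  # [0, 0, 1, 0, 1]
```

The result depends on the order of the inputs and on the threshold, which is why more robust clustering methods are usually preferred outside of a toy setting.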

3. Combining Semantic Segmentation with Clustering:

A typical workflow could involve:
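
As one hypothetical version of such a workflow, the sketch below starts from a semantic label map and groups the pixels of a single class into instances. Connected-component labeling with 4-connectivity stands in for the clustering step here, which assumes that separate instances of the class do not touch; the `label_instances` function, the `CAR` class id, and the example map are illustrative only.

```python
from collections import deque


def label_instances(label_map, target_class):
    """Split the pixels of one semantic class into instances.

    Returns (instance_map, count), where instance_map holds 0 for pixels
    outside `target_class` and 1..count for each 4-connected region of it.
    """
    height, width = len(label_map), len(label_map[0])
    instances = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if label_map[y][x] != target_class or instances[y][x]:
                continue
            # Unvisited pixel of the target class: start a new instance
            # and flood-fill its connected region breadth-first.
            count += 1
            instances[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and label_map[ny][nx] == target_class
                            and not instances[ny][nx]):
                        instances[ny][nx] = count
                        queue.append((ny, nx))
    return instances, count


CAR = 1  # illustrative class id; 0 is background

semantic_map = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]

instance_map, num_instances = label_instances(semantic_map, CAR)
for row in instance_map:
    print(row)
# [1, 1, 0, 0, 2]
# [1, 0, 0, 0, 2]
# [0, 0, 0, 2, 2]
```

When instances of the same class overlap or touch, spatial connectivity alone merges them, and a learned embedding or another of the approaches listed earlier would be needed instead.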

Advantages: