The choice between bounding boxes and pixel-level segmentation masks is rarely a pure accuracy decision; it is an economic one. Bounding box annotation costs roughly one-fifth as much per image as polygon segmentation, yet for many detection tasks, such as counting vehicles in a parking lot or flagging safety equipment, it delivers equivalent model performance. Before specifying your annotation type, define the exact inference task and work backwards to the minimum label fidelity that satisfies it.
Occlusion is the most under-specified variable in computer vision annotation briefs. Annotators who receive no guidance on partially visible objects will make wildly inconsistent choices: some label only the visible portion, some skip the object entirely, and some draw the full estimated extent. In the trained model, that variance shows up as systematic bias on occluded objects. Every annotation brief should include an explicit occlusion policy with visual examples at each visibility threshold (25%, 50%, 75% visible).
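One way to make an occlusion policy unambiguous is to write it as a lookup that tooling and reviewers can share. The sketch below is a minimal illustration, not a recommended policy: the thresholds come from the 25%, 50%, and 75% levels mentioned above, while the action names, the `OcclusionRule` type, and the `occlusion_action` function are hypothetical and should be replaced with whatever your brief actually specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcclusionRule:
    min_visible: float  # inclusive lower bound on the visible fraction (0.0 to 1.0)
    action: str         # hypothetical instruction label given to annotators


# Hypothetical policy, ordered from most visible to least visible.
# The actions are assumptions for illustration; only the thresholds
# mirror the visibility levels named in the text.
POLICY = (
    OcclusionRule(0.75, "label_full_extent"),
    OcclusionRule(0.50, "label_full_extent_flag_occluded"),
    OcclusionRule(0.25, "label_visible_portion_flag_occluded"),
    OcclusionRule(0.00, "skip"),
)


def occlusion_action(visible_fraction: float) -> str:
    """Return the policy action for an object with the given visible fraction."""
    if not 0.0 <= visible_fraction <= 1.0:
        raise ValueError("visible_fraction must be between 0.0 and 1.0")
    # The first rule whose lower bound is satisfied wins.
    for rule in POLICY:
        if visible_fraction >= rule.min_visible:
            return rule.action
    raise AssertionError("unreachable: the last rule has min_visible 0.0")
```

Writing the policy this way forces a decision at every boundary (an object that is exactly 50% visible falls into the second rule), which is the kind of edge case that otherwise gets settled differently by each annotator.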
For semantic and instance segmentation tasks, edge fidelity drives model performance more than overall mask coverage. A rough mask that captures 95% of the pixels but loses 30% of the boundary detail will produce a model that fails exactly where users notice: at object edges and intersections. Requiring annotators to use 2× zoom for boundary tracing and running automated edge-fidelity scoring against a gold-standard set catch the majority of systematic edge errors before they enter training.
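The text does not say how edge fidelity is scored, so the following is one plausible sketch rather than the method used here: a boundary F-measure that compares the edge pixels of an annotated mask against a gold-standard mask within a pixel tolerance, shown next to plain IoU for contrast. All function names, the 4-neighbour boundary definition, and the `tol` parameter are assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]   # rows of 0/1 values; both masks must share a shape
Pixel = Tuple[int, int]


def rect_mask(h: int, w: int, r0: int, r1: int, c0: int, c1: int) -> Mask:
    """Toy helper: an h x w mask with a filled rectangle (inclusive bounds)."""
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(w)]
            for r in range(h)]


def boundary_pixels(mask: Mask) -> Set[Pixel]:
    """Foreground pixels with a background or out-of-image 4-neighbour."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    edge: Set[Pixel] = set()
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]:
                    edge.add((r, c))
                    break
    return edge


def _matched_fraction(src: Set[Pixel], ref: Set[Pixel], tol: int) -> float:
    """Share of src pixels that have a ref pixel within tol rows and columns."""
    if not src:
        return 1.0 if not ref else 0.0
    hits = sum(
        1 for (r, c) in src
        if any(abs(r - rr) <= tol and abs(c - cc) <= tol for rr, cc in ref)
    )
    return hits / len(src)


def boundary_f1(annotated: Mask, gold: Mask, tol: int = 1) -> float:
    """Harmonic mean of boundary precision and recall at a pixel tolerance."""
    a, g = boundary_pixels(annotated), boundary_pixels(gold)
    precision = _matched_fraction(a, g, tol)
    recall = _matched_fraction(g, a, tol)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iou(annotated: Mask, gold: Mask) -> float:
    """Intersection over union of the two foreground regions."""
    inter = union = 0
    for row_a, row_g in zip(annotated, gold):
        for a, g in zip(row_a, row_g):
            inter += 1 if (a and g) else 0
            union += 1 if (a or g) else 0
    return inter / union if union else 1.0


# Toy comparison: the annotated square is missing its rightmost column.
gold = rect_mask(10, 10, 2, 7, 2, 7)
annotated = rect_mask(10, 10, 2, 7, 2, 6)
strict_score = boundary_f1(annotated, gold, tol=0)
lenient_score = boundary_f1(annotated, gold, tol=1)
coverage = iou(annotated, gold)
```

In this toy case the one-pixel offset is forgiven at `tol=1` and penalized at `tol=0`, so the tolerance effectively encodes how much boundary slack the brief allows. The brute-force matching is quadratic in the number of boundary pixels; a production scorer would more likely use a distance transform from an image-processing library.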