AmericasNLP 2026 Annotation Guideline

Example

El Fandi performing a verónica

Reference caption (guideline): Un torero realiza un lance a la verónica con el capote ante el toro en la plaza.


5 — Excellent

“Un torero ejecuta una verónica con el capote ante el toro en la plaza.”

Fluent Spanish, accurate description, and uses the correct cultural term (verónica) for the specific cape pass shown. Nothing to fix.


4 — Good

“Un hombre con traje rojo sostiene una tela frente a un toro grande.”

Clear, correct Spanish and the description matches the image, but it avoids the cultural and technical vocabulary — “hombre” instead of torero, “traje rojo” instead of traje de luces, “tela” instead of capote, and no mention of the specific pass (verónica). Accurate but not culturally precise.


3 — Mixed

“Un hombre con ropa de colores sostiene una tela en un lugar grande.”

The Spanish is good, but the description is far too vague — it avoids all cultural terms, does not mention the bull at all, and gives almost no useful information about what is happening. Technically not wrong, but not clearly useful either.


2 — Poor

“Mujer azul con guitarra en parque luchar pequeño en calle mucho.”

The language is broken — missing articles, no conjugated verbs, no real sentence structure — and the content is largely wrong (no woman, no guitar, no park). Recognizable as an attempt, but unusable as a caption.


1 — Unusable

“Torador bulleando capelar el rojado arenoso con grandote luchamiento plazudo.”

The words look Spanish, but most are invented or malformed, and the sentence cannot be understood. It is not really written in the target language.


Rating Guide

This page shows how the 1–5 rating scale works in practice, using a Spanish caption example.


How to rate

For each system caption, give one overall score from 1 to 5, considering two dimensions:

  1. Language quality — Is it written in the target language? Is it grammatical, fluent, and natural? If the language is broken, the caption cannot score high even if the content is right.
  2. Image fidelity & cultural appropriateness — Does the caption describe what you actually see in the image? Does it use the right cultural terms? Would someone from the community find it respectful and accurate?
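The rule that broken language cannot score high can be turned into a simple sanity check on submitted ratings. The sketch below is only an illustration: the `Rating` record, the `language_broken` flag, and the choice of 4 as the threshold for a "high" score are assumptions, not part of the guideline, which asks only for one overall score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rating:
    """One annotator judgment for one system caption (hypothetical record)."""
    caption: str
    overall: int            # the single overall score, 1 to 5
    language_broken: bool   # annotator's yes/no note; an assumed extra field


def check_rating(rating: Rating, high_threshold: int = 4) -> List[str]:
    """Return warnings for a rating; an empty list means nothing was flagged.

    high_threshold is an assumption: scores at or above it count as "high".
    """
    warnings = []
    if not 1 <= rating.overall <= 5:
        warnings.append("overall score must be an integer from 1 to 5")
    if rating.language_broken and rating.overall >= high_threshold:
        warnings.append("broken language should not receive a high score")
    return warnings


# Usage with two captions from the example on this page.
fluent = Rating(
    "Un torero ejecuta una verónica con el capote ante el toro en la plaza.",
    overall=5,
    language_broken=False,
)
broken = Rating(
    "Mujer azul con guitarra en parque luchar pequeño en calle mucho.",
    overall=4,  # deliberately too high, to trigger the warning
    language_broken=True,
)
print(check_rating(fluent))  # []
print(check_rating(broken))  # ['broken language should not receive a high score']
```

A check like this flags candidates for a second look; it does not replace the annotator's holistic judgment.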

Rating scale

  • 5 — Excellent: Fluent, natural language and an accurate, culturally grounded description of the image. Uses the correct cultural and technical terms. Nothing meaningful to fix.
  • 4 — Good: Clear, well-written language with a correct description, but with small flaws, such as minor language errors, missing details, or imprecise cultural vocabulary.
  • 3 — Mixed: The language is mostly understandable, but the caption gets things wrong about the image, omits important content, or is too vague to be clearly useful.
  • 2 — Poor: The language has serious problems (broken grammar, frequent errors, hard to follow) and the description is largely inaccurate. Still recognizable as an attempt at a caption.
  • 1 — Unusable: Not in the target language, not understandable as language, or completely unrelated to the image.
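For anyone collecting ratings in a script or spreadsheet export, the scale above can be stored as a small lookup table. This is a minimal sketch under stated assumptions: the names `SCALE` and `label_for` are hypothetical, and the guideline does not prescribe any file format or tooling.

```python
# Score labels copied from the rating scale above.
SCALE = {
    5: "Excellent",
    4: "Good",
    3: "Mixed",
    2: "Poor",
    1: "Unusable",
}


def label_for(score: int) -> str:
    """Return a readable label such as '4 (Good)'; reject anything off the scale."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score not in SCALE:
        raise ValueError(f"score must be an integer from 1 to 5, got {score!r}")
    return f"{score} ({SCALE[score]})"


# Scores assigned to the example captions on this page.
examples = [
    ("Un torero ejecuta una verónica con el capote ante el toro en la plaza.", 5),
    ("Un hombre con traje rojo sostiene una tela frente a un toro grande.", 4),
    ("Un hombre con ropa de colores sostiene una tela en un lugar grande.", 3),
]
for caption, score in examples:
    print(f"{label_for(score)}: {caption}")
```

Rejecting out-of-range or non-integer values at entry time keeps half-point scores and typos out of the collected data.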