Caption Quality Standards
We recommend using our professional captioning service to caption your videos. However, if you're captioning your own videos, follow the guidelines below to ensure an inclusive, accessible experience for everyone.
Should Captions Be Verbatim?
In general, try to use the same wording as the speaker, but minor edits to what was said can sometimes make the captions more usable. If you do alter the text, be careful that you do not change the meaning of what was said.
Guidelines to consider:
- You may remove filler words, such as “you know,” “well…” or “um” and other non-essential information. However, sometimes a stutter or filler word can be meaningful (if it’s used to show nervousness, for example).
- When a speaker uses grammatically incorrect language or a dialect, it should be reflected in the captions.
- Words in another language should be captioned as spoken when possible. Otherwise, a “[speaking French]” or similar caption will suffice. Never translate words in another language to English.
- If speech is cut off or low quality and is impossible to understand, use the word inaudible with brackets: [inaudible].
- Hesitations should be replicated in the captions: long pauses should be shown using ellipses (...), and shorter pauses should be shown using commas, dashes, or two dots.
Line Length and Breaks
Adding line breaks in the correct places dramatically increases the quality of the experience for the viewer.
Guidelines to consider:
- There should be a maximum of 2 lines on the screen at a time.
- Don’t include more than 42 characters on a line (including spaces).
- Where possible, both lines should be about the same length.
- Lines should be broken at natural pauses in speech.
- Punctuation is often a natural breakpoint.
- Avoid breaking between a word and its modifier, within prepositional phrases, between first and last names or associated titles, or immediately after a conjunction. See the Line Division section of the Described and Captioned Media Program’s (DCMP) Captioning Key for some helpful examples.
- Don’t end one sentence and begin another on the same line unless the sentences are extremely short (three words or fewer, as a general rule).
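If you prepare captions with a script, the numeric limits above (two lines, 42 characters per line, roughly equal lengths) can be checked automatically. The following Python sketch is one possible approach, not part of any captioning tool: the function name `split_caption` and the size of the punctuation bonus are assumptions made for illustration. It does not handle the grammatical rules (modifiers, prepositional phrases, names and titles), which still need a human review.

```python
MAX_CHARS = 42  # per-line limit from the guidelines above, spaces included


def split_caption(text, max_chars=MAX_CHARS):
    """Split caption text into at most two lines of similar length.

    Hypothetical helper: returns a list of one or two lines, or raises
    ValueError when the text cannot fit and needs a separate caption.
    """
    text = " ".join(text.split())  # collapse repeated whitespace
    if len(text) <= max_chars:
        return [text]

    words = text.split(" ")
    best_score, best_lines = None, None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > max_chars or len(second) > max_chars:
            continue
        # Lower score is better: balanced lines win, and a break right
        # after punctuation gets a small bonus (10 is an arbitrary choice).
        score = abs(len(first) - len(second))
        if first[-1] in ",.;:?!":
            score -= 10
        if best_score is None or score < best_score:
            best_score, best_lines = score, [first, second]

    if best_lines is None:
        raise ValueError("Text does not fit on two lines; use another caption.")
    return best_lines


for line in split_caption(
    "When possible, show the caption at the same time as the audio."
):
    print(line)
```

Text that cannot fit on two lines raises an error instead of being truncated, so the excess can be moved to the next caption.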
Time on Screen
The speed of the speech can impact how long a caption is on the screen at a time. When possible, show the caption at the same time the audio is being spoken. However, you can “borrow” a few frames before or after (generally no more than 1.5 seconds) to ensure the captions are on the screen long enough to be read.
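As a rough illustration of "borrowing" time, the Python sketch below extends a caption that would otherwise disappear too quickly. Several things here are assumptions and not part of this guideline: the function name `pad_caption_timing`, the `min_duration` parameter (no minimum reading time is specified above, so you must choose one), reading the 1.5-second limit as a total across both sides, and splitting the borrowed time evenly before and after. It also ignores neighboring captions, which a real workflow would need to check for overlap.

```python
def pad_caption_timing(start, end, min_duration, max_borrow=1.5):
    """Return (start, end) in seconds, extended toward min_duration.

    Hypothetical helper: borrows at most max_borrow seconds in total,
    and never moves the start before time zero.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")

    shortfall = min_duration - (end - start)
    if shortfall <= 0:
        return start, end  # already on screen long enough

    borrow = min(shortfall, max_borrow)
    before = min(borrow / 2, start)  # cannot start before 0 seconds
    after = borrow - before
    return start - before, end + after


# A half-second caption with an assumed 2-second minimum:
print(pad_caption_timing(10.0, 10.5, min_duration=2.0))
```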
Screaming and shouting should be shown using ALL CAPS. Capitalize proper nouns such as names, places, and organizations.
Italicized text should be used:
- When a speaker is quoting someone else.
- For words and phrases that aren’t in the primary language of the video.
- When a person is thinking or daydreaming.
- For offscreen speech or sounds.
Spell out numbers from one to ten; use numerals for numbers over ten.
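The number rule is simple enough to express in a few lines of Python. This is a minimal sketch: the function name `format_number` is hypothetical, and because the rule above only covers one through ten and numbers over ten, the sketch leaves zero and negative numbers as numerals, which is an assumption.

```python
NUMBER_WORDS = [
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten",
]


def format_number(n):
    """Spell out whole numbers from 1 to 10; use numerals otherwise."""
    if 1 <= n <= 10:
        return NUMBER_WORDS[n - 1]
    return str(n)  # over ten (and, by assumption, zero or negative)


print(format_number(3), format_number(10), format_number(11))
```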
It is important for viewers to understand who is speaking in the captions:
- Start a new line or use labels when a new speaker begins talking.
- If a speaker is visible on screen, introduces themselves, or their name is shown on screen, no identification is needed.
- If it is unclear who is speaking, consider identifying the speaker on a line before their speech:
MR. SMITH: Let's eat lunch.
OFFSCREEN NARRATOR: This is the next video in our series.
- Once a speaker has been identified, we recommend using two arrows to indicate when a speaker changes:
>> Hi Doctor, nice to meet you.
>> General, pleasure.
- If the speaker has been identified but is not on-screen, be sure to still identify them.
Non-speech elements include music without lyrics, sounds, or the absence of sound. These elements can be an important part of conveying a video’s meaning.
- Generally, music or sound effects should be included in their own caption box so they appear separately on a screen, rather than within text.
- Non-speech elements should be presented within square brackets “[ ]”.
- If a non-speech element, such as music or no audio, continues through the entire video, simply add one caption box with the non-speech element at the beginning of the video followed by an ellipsis. This type of caption should remain on the screen for 15 seconds.
[Upbeat music playing...]
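If you generate caption files programmatically, the bracket and ellipsis conventions above can be applied consistently with a small helper. The Python sketch below is illustrative only: the function names `non_speech_text` and `whole_video_cue` are hypothetical, and placing the cue at 0 seconds is our reading of "at the beginning of the video".

```python
WHOLE_VIDEO_CUE_SECONDS = 15.0  # on-screen time recommended above


def non_speech_text(description, continues_throughout=False):
    """Wrap a non-speech description in square brackets.

    Adds a trailing ellipsis when the element lasts the entire video.
    """
    desc = description.strip().strip("[]").strip()
    if not desc:
        raise ValueError("description must not be empty")
    if continues_throughout:
        desc = desc.rstrip(".") + "..."
    return f"[{desc}]"


def whole_video_cue(description):
    """Return (start, end, text) for an element spanning the whole video."""
    text = non_speech_text(description, continues_throughout=True)
    return 0.0, WHOLE_VIDEO_CUE_SECONDS, text


print(whole_video_cue("Upbeat music playing"))
print(non_speech_text("Calm piano music"))
```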
Captioning sound effects can be very important for helping someone who is deaf or hard of hearing understand a video. However, be careful to only include sound effects that are necessary or helpful in understanding or enjoying the video.
When deciding whether to include a sound effect, you may want to ask yourself if you think most people would want to read about that sound effect or if it adds any value to the video.
We recommend reviewing the Sound Effects section of the DCMP Captioning Key for specific examples.
Music is often used to communicate a mood or theme.
- When possible, the title and composer of a musical work should be included.
[Enya playing "Orinoco Flow"]
- For unknown music, the style or presentation of the music should be described objectively.
[Calm piano music]
- If the message of song lyrics is important, they should be captioned word for word. Surround lyrics with music icons (♪).
- Nonessential background music that doesn’t contribute to the mood can be ignored.
There is additional guidance from the Music section of the DCMP Captioning Key that may be helpful.
Other Non-Speech Elements
There may be other non-speech communication in a video that is important to include in captions. For example:
- If there is speech that contains an important emotional cue that is not shown, you can indicate it in the captions. For example:
My dog died.
- If it appears that people are talking, but they actually aren’t, it would be important to let the viewer know that no speech is happening:
If you would like to learn more about any of the above caption standards or other specific situations, we recommend the resources below for additional reading: