Captioning Quality Standards
We generally recommend using a professional service to caption your videos. However, if you are captioning your own videos, below are some important guidelines to keep in mind to ensure an inclusive, accessible experience for everyone:
Screaming and shouting should be shown using all caps. Capitalize proper nouns.
Use italics wherever quotation marks are being used, for emphasis, and for words and phrases that aren’t in the primary language being used. Also use italics where a person is thinking or daydreaming, for any offscreen speech or sounds (except for presentations in which an offscreen narrator is the only speaker), and where a word is being defined for the first time.
Intonation in Speech
Specific types of intonation should be denoted using words in brackets before the captions. This includes descriptions of emotion when the speaker is offscreen, indications of whispering, and even “[silence]” or “[no audio]” when people can be seen talking but there’s no sound.
Indicate a regional accent at the beginning of the first caption it applies to. Words in another language should be captioned as spoken when possible. Otherwise, a “[speaking French]” or similar caption will suffice. Never translate words in another language to English. If possible, use accent markings where they’re needed.
Line Length/Number of Lines on Screen
There should be no more than 2 lines on the screen at a time, and no more than 42 characters on a line (including spaces). Both lines should be about the same length; one line should never be less than half the length of the other.
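If you check your captions with a script, the limits above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name check_caption and the constant names are hypothetical, and “about the same length” is reduced to the one measurable rule, that neither line is less than half the length of the other.

```python
MAX_LINES = 2            # at most 2 lines on screen at a time
MAX_CHARS_PER_LINE = 42  # including spaces


def check_caption(lines):
    """Return a list of problems found; an empty list means the caption passes."""
    problems = []
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines on screen (max {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {number} has {len(line)} characters (max {MAX_CHARS_PER_LINE})"
            )
    # The balance rule only applies when exactly two lines are shown.
    if len(lines) == 2:
        shorter, longer = sorted(len(line) for line in lines)
        if shorter * 2 < longer:
            problems.append("one line is less than half the length of the other")
    return problems


print(check_caption(["We generally recommend using",
                     "a professional captioning service."]))
```

A caption that passes returns an empty list, so the function can be run over every caption in a file to list only the ones that need attention.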
Line Break Points
Lines should be broken at logical points in speech. Most of these breaks should occur at punctuation. Don’t end one sentence and begin another on the same line unless the sentences are extremely short. When lines aren’t broken at punctuation, it is important to avoid breaking between a word and its modifier, within a prepositional phrase, between first and last names or associated titles, or immediately after a conjunction.
Don’t bother captioning music that lasts less than 5 seconds. Nonessential background music should be indicated using a music symbol (♪). Where applicable, the style or presentation of the music should be described objectively. Whenever it’s possible to identify a musical work, the title and composer should be included. These are included in brackets at the beginning of the music.
Lyrics should be captioned word for word. They should be surrounded by music icons. The last line should be followed by two such icons in a row.
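The lyric rule can be sketched in Python as shown here. The function name format_lyrics is hypothetical, and the exact spacing around the icons is an assumption; the guideline only says that lyrics are surrounded by music icons and that the last line is followed by two in a row.

```python
NOTE = "♪"


def format_lyrics(lines):
    """Wrap each lyric line in music icons; the last line ends with two icons."""
    formatted = []
    for index, lyric in enumerate(lines):
        is_last = index == len(lines) - 1
        closing = NOTE * 2 if is_last else NOTE
        formatted.append(f"{NOTE} {lyric} {closing}")
    return formatted


for caption in format_lyrics(["Row, row, row your boat",
                              "Gently down the stream"]):
    print(caption)
```

The lyric text itself is passed through unchanged, in keeping with the word-for-word rule.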
Numbers from one to ten should be spelled out, but anything above that should be written as numerals. Also spell out any number that begins a sentence, along with other numbers that are related to it. Numerals should be used in conjunction with percentage signs (where applicable), unless the percentage falls at the beginning of a sentence, in which case words should be used.
If a number has 5 or more digits, appropriate commas should be included. Numbers with 4 digits may or may not include a comma, as long as this is consistent.
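As a rough illustration, the basic number rules could be written in Python like this. The function name caption_number and the comma_in_four_digits flag are hypothetical. The sketch covers whole numbers only and does not handle numbers that begin a sentence or percentages, which need the surrounding text to decide. Zero and negative numbers fall outside the rule as stated, so rendering them as numerals is an assumption.

```python
SMALL_NUMBER_WORDS = [
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten",
]


def caption_number(n, comma_in_four_digits=False):
    """Format a whole number for a caption (mid-sentence use only)."""
    # One to ten are spelled out.
    if 1 <= n <= 10:
        return SMALL_NUMBER_WORDS[n - 1]
    digits = len(str(abs(n)))
    # Five or more digits always get commas; four digits only if you
    # have chosen that style, and then consistently.
    if digits >= 5 or (digits == 4 and comma_in_four_digits):
        return f"{n:,}"
    return str(n)


print(caption_number(7), caption_number(11), caption_number(25000))
```

Setting comma_in_four_digits once for a whole project is one way to keep four-digit numbers consistent.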
The appropriate ordinal suffix (“th,” “st,” “nd,” or “rd”) should be used, unless the number in question is spoken as a date that includes the month and/or year.
Numerals should always be used to caption times of day. The zeros don’t necessarily need to be included if only the hour is spoken. Don’t use apostrophes when captioning decades.
Only relevant sound effects and other background noise should be captioned. Unimportant background noise, such as the noise of a crowded place, shouldn’t be captioned; only the main speaker’s words should be captioned in that case. Sound effects should be captioned, but they don’t need to be described in detail: “explosion” or “boom” would be a better description of fireworks than “three fireworks explode into the sky loudly.” Viewers can see the fireworks, so no further explanation is needed.
There are many ways to identify speakers. One is by name, which works well, but a name shouldn’t be used if viewers aren’t supposed to know the character’s name yet. In certain cases, more generic labels like “offscreen female” or “male (angrily):” may be used. Another good way to differentiate between characters is to have the captions appear as they begin speaking, so viewers can see who is speaking. The important thing is to be sure it’s apparent who is speaking in any given segment of video. If visual cues make this obvious, no label is needed. On the other hand, captioning for a person speaking offscreen should be preceded by a label. Start a new line or use labels when a new speaker begins talking.
Time On Screen (Caption Duration)
The speed of the speaker’s audio affects how long a caption stays on the screen. In general, we recommend showing the caption at the same time the audio is being spoken, but you can “borrow” a few frames before or after to ensure the captions are on the screen long enough to be read (generally no more than 1.5 seconds before or after). It’s important not to preempt effects or speech: you don’t want to spoil something that depends on timing, like a joke’s punchline or a surprising explosion. Captions should match up with speech as closely as possible, accounting for both slower and faster speech with occasional pauses or preemptive captioning. A general recommendation is that captions should stay between 160 and 180 words per minute.
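To see what the words-per-minute recommendation means for a single caption, here is a small Python sketch. The function names are hypothetical, words are counted by splitting on whitespace, and applying the rate to each caption individually (rather than averaging over a whole video) is an assumption.

```python
MAX_WPM = 180  # upper end of the recommended range


def words_per_minute(text, duration_seconds):
    """Reading rate of one caption shown for duration_seconds."""
    return len(text.split()) / duration_seconds * 60


def min_duration_seconds(text, max_wpm=MAX_WPM):
    """Shortest time on screen that keeps the caption at or below max_wpm."""
    return len(text.split()) * 60 / max_wpm


caption = "Lines should be broken at logical points"
print(words_per_minute(caption, 2.0))
print(min_duration_seconds(caption))
```

If a caption’s rate comes out above the limit, you can either borrow time before or after the audio, within the 1.5-second allowance, or edit the text down.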
Speech Fillers, Hesitations, Stutters
An excess of speech fillers, such as “um” or “like,” should be edited out so they don’t take away from the meaning of the rest of the speech. However, if fillers are used sparingly, they should be left in: a few can add flavor, but too many will overwhelm the text. Hesitations should be replicated to convey the same meaning. Long pauses should be shown through ellipses, and shorter pauses through commas, dashes, or two dots. Stutters should be replicated too. They can reveal something about a character, such as nervousness, but if sentences need to be edited or shortened due to time, stutters are among the first things to cut since they are less important.
Verbatim or Clean Speech?
Use verbatim wording as much as possible, since it keeps the information more consistent. In situations where a scene is too busy for verbatim captions to make sense, it’s okay to edit or simplify the sentences as needed. However, if there is time for verbatim speech, don’t take that away from viewers! There is no need to simplify the words used or edit out “filler” words unless absolutely necessary; even words and phrases like “you know,” “well…” or “um” can add flavor to the text!
When writing captions, avoid grammatical errors of your own to the best of your ability. Sometimes, though, a character or person in a video says something that is technically grammatically incorrect. This should be replicated in the captioning: it is better to have an “error” in the captioning and convey the right information than to correct it and give viewers a less accurate experience. A good example is regional speech. If a character says “y’all,” that should be replicated, because changing it to “you all,” while more formally correct, doesn’t convey the same feeling as what the character actually said.
If you would like to learn more about any of the above caption standards or other specific situations, we recommend the resources below for additional reading:
- DCMP Guidelines and Best Practices for Captioning Educational Video
- KA Captions Standards Guide
- The Ultimate Guide to Closed Captioning
- BBC Subtitle Guidelines
- Guide to Transcripts and Captions
- Closed Captioning Style Guide