What Are Captions?

Captions are a text representation of the spoken words and other important audio information in a video, synced with that video.

The National Association of the Deaf defines captioning as “the process of converting the audio content of a television broadcast, webcast, film, video, CD-ROM, DVD, live event, or other productions into text and displaying the text on a screen, monitor, or other visual display system.

“Captions not only display words as the textual equivalent of spoken dialogue or narration, but they also include speaker identification, sound effects, and music description.”

There are some important things to understand about different types of captions. Text appearing on a video does not necessarily mean the video has a fully accessible caption file. Captions should be a separate text file associated with the video so that they can be adjusted to viewer preferences.

Types of Captions

Closed captions can be turned on and off according to the viewer's preferences. Open captions are burned into a video. While open captions are better than nothing, closed captions provide users with the most accessible experience.

Open vs. Closed Captions

Open Captions

There are some challenges that come with open captions:

They are burned into the video and cannot be edited.
The captions cannot be turned off.
They cannot be moved.
A viewer cannot change the font size or text color of the captions.
The captions are not searchable.

Open captions should only be used if a closed captioning option is not available.

Closed Captions

Closed captions are added to a video as a separate text file and provide several benefits for those who access your content:

The font size and color of the text can be adjusted.
The captions can be turned on or off by the viewer.
The caption file can be edited.
The captions can be searched.
The captions can be moved on screen.
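
To make the idea of a "separate text file" concrete, here is a minimal sketch of building one in Python. It assumes the WebVTT format, which is one common caption file format; this page does not name a specific format. The `Cue` class, the function names, and the example cues are all hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue (hypothetical structure for this sketch)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # dialogue, speaker identification, or a sound description


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[Cue]) -> str:
    """Build the text of a WebVTT caption file from a list of cues."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def search_cues(cues: list[Cue], term: str) -> list[Cue]:
    """Return the cues whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [cue for cue in cues if needle in cue.text.lower()]


# Made-up example cues, including a speaker label and sound descriptions.
EXAMPLE_CUES = [
    Cue(0.0, 2.5, "[upbeat music]"),
    Cue(2.5, 6.0, "NARRATOR: Welcome to the course."),
    Cue(6.0, 7.2, "[door slams]"),
]

if __name__ == "__main__":
    print(build_webvtt(EXAMPLE_CUES))
    print(search_cues(EXAMPLE_CUES, "welcome"))
```

Because the captions live in plain text rather than in the video pixels, a typo can be corrected by editing the cue text, and a search is just a text search. Neither is possible with open captions.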

Machine (Auto-Generated) Captions vs. Professional Captions

There are also different levels of quality with video captions. Professional captions are reviewed by a human editor who makes sure they meet federal quality standards and have a high accuracy rate. Machine (auto-generated) captions use automatic speech recognition (ASR) software to create closed captions on a video. This is frequently a built-in feature on platforms such as YouTube. While they can be a helpful starting point, machine captions do not meet the USU Video and Audio Accessibility Standards. 3Play Media has found that “solely using ASR to generate auto-captions for recorded videos is detrimental to the accuracy of the captions.”

You have probably seen machine caption errors when watching a video. A few examples are described below:

Two men singing "Rudolph the Red-Nosed Reindeer" with subtitles on screen saying "read off the rent those reindeer".

A baseball game with a pitcher about to pitch to a batter with the caption "drafted out of jail 2008. He'll be followed by Crawford."

Old cartoon of Mickey Mouse with the caption "parent and their families half and and anyhow going and and and and and and and".

SpongeBob with a shocked face and the caption "what we're going to do business with ants".

Machine captions can sometimes be fairly accurate, but many factors have to be just right for that to happen, including high-quality audio, no background noise, few or no grammatical errors, and few mispronunciations. If any of these is off, the accuracy of the captions can drop as low as 50 percent and lead to a poor user experience. 3Play Media states, “The key to 99% or higher caption accuracy is human interaction,” such as that available from professional captioning.
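
To show what an accuracy percentage can mean in practice, here is a minimal sketch that scores a caption against a correct reference at the word level. This page does not say how the quoted accuracy figures are measured, so the method here is an assumption: it counts the fewest word substitutions, insertions, and deletions needed to turn the caption into the reference, then divides by the number of reference words. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Fewest word substitutions, insertions, and deletions between the lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from the caption
                current[j - 1] + 1,      # extra word in the caption
                previous[j - 1] + cost,  # matching or substituted word
            ))
        previous = current
    return previous[-1]


def caption_accuracy(reference_text: str, caption_text: str) -> float:
    """Word-level accuracy from 0.0 to 1.0 (assumed metric, see lead-in)."""
    reference = tokenize(reference_text)
    if not reference:
        raise ValueError("reference text contains no words")
    errors = word_edit_distance(reference, tokenize(caption_text))
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    score = caption_accuracy(
        "Rudolph the Red-Nosed Reindeer",
        "read off the rent those reindeer",
    )
    print(f"{score:.0%}")
```

This sketch ignores capitalization and punctuation, so it understates the problem: a caption can score well here and still be hard to read.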

Auto-generated captions can also hurt your brand. First, capitalization and punctuation errors can make it difficult for a viewer to understand what your content is actually about. Auto-captions are well known for missing words or adding the wrong word in place of another, and you don’t ever want to be seen as saying something you don’t mean. This could cause embarrassment for both you and your brand.

In addition, auto-captions can make viewers who need captions feel like they are not valued by the brand. Without accurate captions, some viewers have no choice but to stop watching the video. Providing high-quality captions helps all viewers feel included and valued.

Commonly Mistaken as Captions

Besides captions, there are other text representations of audio content that are sometimes confused with captions. These can be helpful in different use cases.

Subtitles

Subtitles include only the spoken words of a video or audio recording and are geared toward viewers who do not understand or are not familiar with its language. They do not contain the non-speech descriptions, such as music or sound effects, that are necessary for accessibility.

Transcripts

Transcripts are text files of the spoken words in video or audio. They may or may not include non-speech descriptions. Transcripts are particularly useful for podcasts or other audio-only content.
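
As a rough illustration of how a transcript relates to captions, the sketch below turns a list of caption texts into a transcript by dropping the timing and, optionally, the non-speech descriptions. It assumes non-speech descriptions are written in square brackets, which is a common convention but not something this page specifies. The function name and sample text are hypothetical.

```python
import re

# Assumption: non-speech descriptions appear in square brackets, e.g. "[applause]".
NON_SPEECH = re.compile(r"\[[^\]]*\]")


def cues_to_transcript(cue_texts: list[str], include_non_speech: bool = True) -> str:
    """Join caption texts into a transcript, one caption per line."""
    parts = []
    for text in cue_texts:
        if not include_non_speech:
            text = NON_SPEECH.sub("", text)
        text = " ".join(text.split())  # collapse leftover whitespace
        if text:                       # skip captions that became empty
            parts.append(text)
    return "\n".join(parts)


if __name__ == "__main__":
    sample = ["[upbeat music]", "Welcome to the show.", "[applause] Thank you."]
    print(cues_to_transcript(sample))
    print()
    print(cues_to_transcript(sample, include_non_speech=False))
```

With `include_non_speech=False`, the output contains only spoken words, which is closer to what subtitles provide and is not sufficient on its own for accessibility.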