What Are Captions?
Captions are the text representation of words and other important audio information that are synced with a video.
The National Association of the Deaf defines captioning as “the process of converting the audio content of a television broadcast, webcast, film, video, CD-ROM, DVD, live event, or other productions into text and displaying the text on a screen, monitor, or other visual display system. Captions not only display words as the textual equivalent of spoken dialogue or narration, but they also include speaker identification, sound effects, and music description.”
There are some important things to understand about different types of captions. Text that appears on a video is not necessarily a fully accessible captions file. Captions should be a separate text file associated with the video that can be adjusted to viewer preferences.
Types of Captions
Closed captions can be turned on and off according to the viewer's preferences, while open captions are burned into a video. Although open captions are better than nothing, closed captions provide the most accessible experience to users.
Open vs. Closed Captions
There are some challenges that come with open captions:
They are burned into the video and cannot be edited
The captions cannot be turned off
They cannot be moved
A viewer cannot change the font size or text color of the captions
The captions are not searchable
Open captions should only be used if a closed captioning option is not available.
Closed captions are added to a video as a separate text file and provide several benefits for those who access your content:
The font size and color of the text can be adjusted
The captions can be turned on or off by the viewer
The caption file can be edited
The captions can be searched
The captions can be moved on screen
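To make the idea of a "separate text file" concrete, here is a minimal sketch that builds a small caption file in WebVTT, one common closed-caption format. The format choice, the function names `format_timestamp` and `build_webvtt`, and the sample cues are assumptions made for illustration; they do not come from the guidance above, and your video platform may expect a different format.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_webvtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues: a music description and a line with speaker identification.
example_cues = [
    (0.0, 2.5, "[upbeat music]"),
    (2.5, 6.0, "Narrator: Welcome to the course."),
]

vtt_text = build_webvtt(example_cues)
print(vtt_text)
```

Because the captions live in plain text next to the video rather than in the picture itself, they can be edited, searched, and restyled by the player, which is what makes the benefits listed above possible.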
Machine (Auto-generated) Captions vs. Professional Captions
There are also different levels of quality with video captions. Professional captions are reviewed by a human editor who makes sure they meet federal quality standards and have a high accuracy rate. Machine (auto-generated) captions, on the other hand, use automatic speech recognition (ASR) software to create closed captions on a video. This is frequently a built-in feature on platforms such as YouTube. While they can be a helpful starting point, machine captions do not meet the USU Video and Audio Accessibility Standard. 3Play Media has found that “solely using ASR to generate auto-captions for recorded videos is detrimental to the accuracy of the captions”.
You have probably seen machine caption errors when watching a video; a couple of examples can be seen below:
While machine captioning can sometimes produce fairly accurate captions, many factors have to be just right for that to happen, including high-quality audio, no background noise, few or no grammatical errors, and few mispronunciations. If any of these factors is off, the accuracy of the captions can drop as low as 50 percent and lead to a horrible user experience. 3Play Media states, “The key to 99% or higher caption accuracy is human interaction” such as that available from professional captioning.
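One way to put a number on caption accuracy is to compare a caption against a human-verified transcript word by word. The sketch below uses a word-level edit distance (the idea behind word error rate). It is an illustrative assumption, not the method used by 3Play Media or by any federal standard, and the names `word_edit_distance` and `caption_accuracy` and the sample sentences are hypothetical. It ignores punctuation, capitalization, and timing, which also matter for real caption quality.

```python
def word_edit_distance(reference, hypothesis):
    """Count the word substitutions, insertions, and deletions needed
    to turn the hypothesis word list into the reference word list."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the caption
                current[j - 1] + 1,      # extra word in the caption
                previous[j - 1] + cost,  # match or wrong word
            ))
        previous = current
    return previous[-1]


def caption_accuracy(reference_text: str, caption_text: str) -> float:
    """Return word-level accuracy between 0.0 and 1.0."""
    reference = reference_text.lower().split()
    hypothesis = caption_text.lower().split()
    if not reference:
        raise ValueError("reference transcript must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1 - errors / len(reference))


# Hypothetical example: one wrong word out of five.
verified = "the captions can be searched"
auto_generated = "the captions can be search"
accuracy = caption_accuracy(verified, auto_generated)
print(f"Word accuracy: {accuracy:.0%}")
```

Even a single wrong word in a short sentence lowers the score noticeably, which helps explain why poor audio or frequent mispronunciations can pull machine caption accuracy down so far.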
Auto-generated captions can also hurt your brand. First, capitalization and punctuation errors can make it difficult for a viewer to understand what your content is actually about. Auto-captions are well known for missing words or adding the wrong word in place of another, and you don’t ever want to be seen as saying something you don’t mean. This could cause embarrassment for both you and your brand.
In addition, auto-captions can make viewers who need captions feel like they are not valued by the brand. Without captions, some viewers have no choice but to stop watching the video. Providing high-quality captions helps all viewers feel included and valued. Not only does it bring respect to you and your brand, but it also protects your brand from legal repercussions associated with federal guidelines that require equity, equality, and inclusion.
Commonly Mistaken As Captions
Besides captions, there are some other text representations of audio content that are sometimes confused with captions. These can be helpful in different use cases.
Subtitles include only the spoken words and are geared toward viewers who do not understand or are not familiar with the language of the video. They do not contain the non-speech descriptions, such as music or sound effects, that are necessary for accessibility.
Transcripts are a text file of the spoken words in a video. They may or may not have non-speech descriptions. Transcripts are particularly useful for podcasts or other content that only has audio.