As a habitual closed caption user for at least a decade (we rarely watch anything that isn’t captioned), I regularly read ahead of the audio dialogue. I’m able to read the captioned words faster than the actors can say them. This is something I think most caption users do, whether they can also hear the dialogue or not. Caption users can, under the right conditions, stay a beat ahead of everyone else, laughing at a joke, for example, before the punchline is spoken, or nodding in agreement before the speaker has finished making a point.

Captioning technology and conventions allow us to stay a step ahead. A two-row caption expressing a call-and-response format (one person says something and then another person responds) can be processed very quickly when both lines of dialogue are displayed on the screen at once, in “pop-on” style. Reading both lines before the actors say them is usually a piece of cake. Longer stretches of discourse pose no great challenge either for seasoned caption users, who can (again, under the right conditions) read faster than the speaker can talk. Being able to quickly read one-, two-, or three-row captions allows users to focus more of their attention on the movie or show itself.

In less-than-ideal conditions, poorly placed, ill-timed, and otherwise error-laden captions can create uptake problems for readers. In addition, live captions, which are displayed a couple of seconds after the accompanying words are spoken, simply do not allow users to engage in the kind of sophisticated, “one beat ahead” viewing style that occurs under the right conditions. In fact, live captions can make for a more demanding user experience, especially when they lag behind fast-paced, fast-talking shows.

But when captions are prerecorded to appear just before, at, or close to the moment when the accompanying audio begins, users can sometimes exploit the conventions of multi-row pop-on style captions to read ahead, even if reading ahead provides only the tiniest glimpse into the future. The flip side to knowing the future before everyone else is that the movie may not want you to know. The caption viewer may be out of sync, just slightly, with the action, or worse, stripped of the full experience of surprise and suspense.

Here’s a simple but powerful example from the 2008 movie Taken, starring Liam Neeson as a father trying to rescue his teenage daughter (Maggie Grace) from foreign kidnappers. (Spoiler alert) In this clip, the caption user recognizes, a heartbeat before the non-caption user (or so I would argue), that the bad guy will be shot before he can finish saying “negotiate,” because his captioned sentence is unfinished (“We can nego-”). Of course, the caption user, like all viewers, also relies on context to make predictions about where the film is going: Neeson has already systematically killed everyone else on the boat, so it’s no big gamble to predict that this final bad guy (the one, at last, who is holding his daughter) will meet a similar fate, rather than the daughter being stabbed. That’s how these kinds of movies go, and we knew that walking in. But the caption tells us precisely when he will die (before he finishes that word in the caption) and how (by gunshot, since Neeson walks into the room with a gun pointed at the bad guy). (Graphic violence alert)

Perhaps it’s not an advantage to us, after all, when captions reveal secrets before the movie is ready to share them. But my larger point, which encompasses any discussion of specific advantages or disadvantages, is that no one is really talking about the rhetoric of captioning: the ways in which captions (and text/image interplay more generally) create experiences for users that are different from uncaptioned experiences. Captions are not simply the text equivalent of spoken dialogue; they create different opportunities for users, mediate meaning making differently, and, as I have begun to explore, add subtle and complex layers of meaning to video texts. A closer look at the rhetoric and style of closed captioning will prepare us to offer more pointed critiques of the limits of current thinking about captions (see, e.g., Joe Clark’s excellent critique of “invariant-bottom-centred” captions). It should also, I hope, help us improve caption technology and stylistic conventions in anticipation of that time very soon (should captioning of TV-like content become legally mandated on the Internet) when closed captioned video will be flooding the Web.

We don’t tend to talk about closed captions as providing, in some cases, a different (even advantageous) viewing experience over traditional, non-captioned ways of watching movies and TV shows. And yet I think that’s precisely what we need to talk about in order to bring closed captions closer to the mainstream. That’s what Web accessibility advocates do every time they discuss Web accessibility as benefiting everyone, not just users with disabilities. (Two quick examples: consider how Mobile Web Best Practices overlap with the Web Content Accessibility Guidelines [e.g., see WAI], and how the practice of optimizing websites for search engines overlaps with the practice of making websites accessible [e.g., see McGee].)

