Tools for Automating The Captioning of Video


Scheduled at 2:15pm in WB II on Wednesday, November 15 (2017).



  • Joseph Polizzotto, Alternate Media Supervisor, UC Berkeley
  • Joshua Hori, Accessible Technology Analyst, UC Davis

Session Details

  • Length of Session: 2-hr
  • Format: Lecture
  • Expertise Level: Intermediate
  • Type of session: General Conference


Creating synchronized video captions can be a time-consuming and tedious process. In this session, we will demonstrate how to use a range of free to low-cost tools that can speed up this process. We will discuss the use of speech recognition tools to produce a "raw" transcript and the use of a forced aligner to synchronize a transcript with a video.


Many transcribers and video editors hit a pain point in their production workflow when creating a closed caption file (e.g., SRT, VTT, SBV). While many paid tools and services can help with automatic synchronization of audio and text, why not consider a fast and accurate open-source tool? And why not use speech recognition to quickly generate a transcript for an audio file while you're at it?

In this session, we will first demonstrate a range of speech recognition tools (IBM Watson, Google Speech, and Dragon) that can generate a "raw" transcript of a video, and we will compare the accuracy of each tool. Second, we will demonstrate how to use Aeneas, a free tool that can quickly and accurately synchronize a transcript with a video. We will also discuss best practices and provide handouts.
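As a rough illustration of the last step in such a workflow, the sketch below turns timed transcript fragments into the text of an SRT caption file. The `(begin, end, text)` tuple shape and the function names `format_timestamp` and `fragments_to_srt` are assumptions made for this example; they are not the output format or API of Aeneas or of any other tool named in this session.

```python
from typing import Iterable, Tuple

# Hypothetical fragment shape: (begin_seconds, end_seconds, caption_text).
Fragment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def fragments_to_srt(fragments: Iterable[Fragment]) -> str:
    """Build SRT text: a numbered cue, a time range, then the caption text."""
    blocks = []
    for index, (begin, end, text) in enumerate(fragments, start=1):
        time_range = f"{format_timestamp(begin)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}")
    # Cues are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Made-up timings and text, purely for demonstration.
    demo = [(0.0, 2.5, "Welcome to the session."),
            (2.5, 5.0, "Today we look at captioning tools.")]
    print(fragments_to_srt(demo))
```

In practice a forced aligner may be able to write a caption file directly, so a conversion step like this is only needed when the timings arrive in some other form.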


  1. Compare accuracy of a variety of speech recognition tools for generating a video transcript
  2. Share best practices for segmenting a transcript in a captioning workflow
  3. Demonstrate use of Aeneas for synchronizing audio files with text files
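One common way to compare transcript accuracy, offered here as a minimal sketch and not necessarily the method the presenters use, is word error rate (WER): the word-level edit distance between a hand-corrected reference transcript and a tool's output, divided by the number of words in the reference. The function name `word_error_rate` and the sample strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up strings standing in for a reference and two tools' outputs.
    reference = "captions make video accessible to everyone"
    outputs = {
        "tool_a": "captions make video accessible to everyone",
        "tool_b": "captions make videos accessible everyone",
    }
    for name, text in outputs.items():
        print(name, round(word_error_rate(reference, text), 3))
```

This sketch ignores punctuation and capitalization handling beyond lowercasing, so transcripts would need to be normalized the same way before scores are compared.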

Disability Areas

Deaf/Hard of Hearing

Topic Areas

Accessible Course Design, Alternate Format, Uncategorized, Web/Media Access

Speaker Bio(s)

Joseph Polizzotto

Joseph is the Alternate Media Supervisor at UC Berkeley. He was previously an Assistive Technology Specialist Instructor at the High Tech Center Training Unit (HTCTU) of the California Community Colleges, where he trained college faculty and staff on alternate media workflows and assistive technology.

Joseph received a B.A. degree in History from the University of California, Santa Cruz and an M.A. degree in Teaching English to Speakers of Other Languages (TESOL) from San José State University. He has over 15 years of teaching experience in ESL and basic skills. His research interests include accessible EPUB 3 and mobile reading systems.

Joshua Hori

Joshua is the Accessible Technology Analyst for the Student Disability Center at the University of California, Davis. For the past 10 years, he has been involved with alternate media, web accessibility, and accessible technology implementations across the campus for students with disabilities, though he believes these technologies should be used by all students. He has also served as a tech consultant for the UC Davis MIND Institute in research on mobile apps and organization for students with learning disabilities and students on the spectrum. He co-authored a chapter on assistive technologies in the book The Guide to Assisting Students With Disabilities: Equal Access in Health Science and Professional Education.