Kaltura, Captioning Tools

This article is an overview and walkthrough of the captioning tools in Kaltura.

Automatic Speech Recognition (ASR) captioning is available via Kaltura (in MediaSpace, Compass 2g, and Moodle).
Accessible and searchable media is very important to the University of Illinois. We strive to make all media captioned to enable discovery and to ensure that accessibility needs are met. Captioned video is useful for people with hearing loss and for users with certain learning disabilities. It can also help anyone who struggles to understand an accent, as reading text while listening assists with comprehension.

 Before you start, keep in mind:

  1. There are two steps in this process: ordering captions and editing them. After you order captions, they will be added automatically and will be available to anyone viewing your video, but they will not be entirely accurate and you will need to plan on correcting them.
  2. Good news: If you trim a video using Kaltura's new editing tool, it will trim the caption file to stay in sync with the video. Bad news: after you trim the video, you will no longer be able to edit the caption file online. So you should complete any trimming of your video either before you order captions or after you have edited the caption file to your satisfaction.
  3. It is a good idea to practice this process the first time with a short private video.

Media creators are responsible for captioning their content. Content that is not captioned may be removed from public view. If you have a student or colleague who will benefit from this captioning feature, you must make that content accessible to them.

 Benefits (Why should I create captions?)

Accessible content has many benefits to you, the media owner, and to viewers in general. Beyond following the law, it means being a good neighbor. Accessible content:

  • Is necessary for people who are deaf or hard of hearing to understand what is happening in an audio or video file. Federal law (Section 508) and Illinois state law (IITAA), as well as campus regulations, require that multimedia and web content be accessible to all users.
  • Is useful to those with learning disabilities. Many students find it easier to focus if written words accompany the media.
  • Can make it easier to understand someone with an accent.
  • Enables non-native English speakers to better understand what is being said and follow along.
  • Helps certain learners process information more effectively.
  • Enables viewing of content in a loud space or if one lacks headphones.
  • Is easier to reuse in later semesters, saving instructors time.
  • Is more easily found. As a content creator you will have more views if your content is captioned because the content is searchable. Really, it's a great feature in Kaltura. 
    • Take a moment and see for yourself. Go to https://mediaspace.illinois.edu/
      • In the search bar that says Search All Media, type "MOOC." The initial results returned are all instances where MOOC appears in metadata. 
      • In the results window select the Search in Video tab. You are now seeing every instance where the word MOOC appears in a caption file. Click the blue time code on one of the videos and it will take you to that exact point in the media entry. This is all possible because of the associated caption file. It also works with captions in any language. Your content will be more readily viewable and searchable to everyone because of captions.
    • For one more example, go to this video example. In the Search in Video box below the video, type "Kaltura" or "video." The results returned will give you links to points in this video where these words were used. Viewers can find and review key points in your media through captioning.

This entry does not seek to offer expertise in captioning. The Division of Disability Resources and Educational Services (DRES) is the definitive campus source of information and expertise on accessibility. The tools described below are in support of the efforts of DRES. Students who require accessibility services and human-based captioning (for best accuracy) should work with DRES.

Captioning in Kaltura

Automatic Speech Recognition (ASR) captioning produces a text file and then associates it with the media in Kaltura. Captions are time coded to specific points in a video or audio file.
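As a rough illustration of what "time coded" means, a caption file in the .srt format (mentioned later in this article) stores each caption as a numbered cue with a start time, an end time, and the caption text. The Python sketch below parses one such cue. The sample cue and the function names are invented for illustration and are not part of Kaltura.

```python
import re
from datetime import timedelta

# SRT timestamps look like HH:MM:SS,mmm (a comma precedes the milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def parse_timestamp(text):
    """Convert an SRT timestamp such as '00:01:02,500' to a timedelta."""
    match = TIMESTAMP.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {text!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis)

def parse_cue(block):
    """Split one SRT cue into (index, start, end, caption text)."""
    lines = block.strip().splitlines()
    index = int(lines[0])
    start_text, end_text = lines[1].split(" --> ")
    caption = "\n".join(lines[2:])
    return index, parse_timestamp(start_text), parse_timestamp(end_text), caption

# Hypothetical sample cue, invented for illustration.
SAMPLE_CUE = """1
00:00:01,000 --> 00:00:04,200
Welcome to today's lecture on amino acids."""

index, start, end, text = parse_cue(SAMPLE_CUE)
print(index, start.total_seconds(), end.total_seconds(), text)
```

Each cue ties a piece of text to a window of playback time, which is why changing the length of a video after captioning can put the captions out of sync.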

ASR is 70%-90% accurate, based on various factors. That may sound like a passing grade, but it isn't. Even a recording of a speaker with perfect diction and fidelity in the recording will need some editing. Technical terms, acronyms, proper names, and both common and uncommon words may not appear as you expect them to. For example, early tests returned "Amino acids" as "I mean no acids." Also, punctuation will not be added, aside from where ASR thinks a pause is long enough to merit a period. You will need to review and edit captions for media you own.

The process of using ASR for captioning has 2 components: Requesting the captions and Editing the captions.

The following assumes you have used Kaltura before.

 Request captions

At this time, only the owner of a media asset can request captions for a video/audio entry. Because the captions must be edited and reviewed for accuracy, we want to ensure that media owners are aware that captioning was requested.

To request captions

  • Log in to MediaSpace, Compass 2g, or Moodle and go to MyMedia or directly to the media entry.
  • Open the entry page for the video/audio file for which you wish to request captions.
  • Under the video, click the Actions button and choose +Order Captions from the drop-down menu.
  • Then click the Order Captions button.
  • After a few seconds, a note will appear telling you the caption request was sent.

At this point the media file is automatically sent to an external provider that will process the audio and identify speech. Processing typically takes no more than twice the length of the video. However, depending on server load, it can take as long as 48 hours.
The caption file will be returned and added to your media asset automatically.
You will not receive notification that the captioning is complete, so you will need to check for the file yourself.
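To get a feel for when to check back, the turnaround described above can be estimated with simple arithmetic. This is only a sketch of the rule of thumb stated in this article (within twice the video length, up to 48 hours under load); the function name is hypothetical and actual processing time may differ.

```python
from datetime import timedelta

MAX_WAIT = timedelta(hours=48)  # longest wait mentioned in this article

def typical_wait(video_length):
    """Rule of thumb: within twice the video's length, never more than 48 hours."""
    return min(2 * video_length, MAX_WAIT)

lecture = timedelta(minutes=50)
print(typical_wait(lecture))  # 1:40:00
```

For a 50-minute lecture, it is reasonable to check back after about an hour and forty minutes, and again the next day or two if the CC button has not yet appeared.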

Once the captions are available, you will see a CC button in the Menu Bar. 

Kaltura Interactive Transcript Widget: If the captions were ordered through the cielo24 service in Kaltura, the transcript will appear in a window below the video player, with the current sentence highlighted.

The window can be hidden, and it includes a form to search the transcript, and a link to download the transcript. Note: this feature is only available if the captions were ordered through Kaltura (not if an .srt file was created outside Kaltura).

 Edit the Captions

Once the ASR captions are returned, they must be reviewed and edited.  

(If you are an advanced user, you can edit the captions locally by downloading the .srt file and using a desktop editor. Most Illinois media owners will want to use the editor in Kaltura.)
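For advanced users who do edit a downloaded .srt file locally, recurring ASR mistakes can be corrected in bulk. The Python sketch below applies a list of phrase corrections to caption text while leaving cue numbers and time codes untouched. The corrections table, sample text, and function names are hypothetical; it assumes a standard .srt layout and is not a Kaltura or cielo24 tool.

```python
import re

# Hypothetical corrections: misrecognized phrase -> intended phrase.
CORRECTIONS = {
    "i mean no acids": "amino acids",
}

def is_caption_text(line):
    """True for caption text; False for cue numbers, time codes, and blank lines.

    Limitation: a caption line consisting only of digits is treated as a cue number.
    """
    stripped = line.strip()
    return bool(stripped) and not stripped.isdigit() and "-->" not in stripped

def correct_captions(srt_text, corrections):
    """Apply case-insensitive phrase corrections to caption text lines only."""
    fixed_lines = []
    for line in srt_text.splitlines():
        if is_caption_text(line):
            for wrong, right in corrections.items():
                # A function replacement keeps 'right' literal (no backslash escapes).
                line = re.sub(re.escape(wrong), lambda _m, r=right: r, line,
                              flags=re.IGNORECASE)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)

# Hypothetical sample cue, invented for illustration.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
I mean no acids are the building blocks of proteins"""

print(correct_captions(SAMPLE, CORRECTIONS))
```

The replacement text is inserted exactly as written in the table, so capitalization at the start of a sentence still needs a manual pass. Keep a copy of the original file before running any bulk change.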

Good news: while only an asset owner can request captions, anyone identified as a "co-editor" for the media asset is allowed to edit the captions online. This is useful if you want to have TAs, students, colleagues, or employees edit the captions for you. For information on adding NetIDs as co-editors, see Kaltura, Adding collaborators.

Accessing the editor:

  1. Go back to the media asset, click the Actions button again, and this time choose +Caption Requests from the drop-down menu.
  2. A list of requests for captions will appear. Please note, if you send the media file out for captions more than once, you will see multiple requests here. We recommend you only send one request per media file.
  3. The "EDIT" button is a link to edit the caption file. Click on it and a new browser window will open, taking you to an online editor at cielo24.
  4. The cielo24 editor will present you with an interface that shows the media asset and the captions broken into time blocks.

Using the online caption editor

By default, the playback editor is set to play audio start to end. Below the editor you can click Play Until End of Sentence and then click Save. This will play the section displayed in the yellow text box only, until you tell the editor to proceed. Find the process that works best for your needs.

The editor displays word sections by "sentence". To the editor, a sentence ends wherever a period is present, so adding a period to a text string will create a new line. This is a little disconcerting at first but makes sense as you go on.

Punctuation such as commas will not be added by the ASR tool; you will need to add it yourself.

To edit a sentence, hit play. Follow along with the spoken word and the captioned text. When you need to edit a word that is in error, add a word, or add a comma, click in the yellow box. The video will stop playing while you type in the yellow edit box. Once you have stopped typing, the media file will resume after 2 seconds. (You can lengthen the amount of time the video stops under Settings in the upper right corner.)

If you remove a period that is in the wrong place, note that the text from the next section will move up to the previous sentence.

You may also manually stop the media player if you wish to pause or make extensive edits. 

If you wish to jump to a different section of text, go to the left column and select that section. The yellow highlighted segment is the portion active in the editor.


If you wish to Save your progress as you go (highly recommended) or if you need to come back to the edit later, click Save. This will keep a working version of your edits online. 

When you are done with the editing, you will want to click Save and then click Approve. When you click Approve, you are sending the new version of the captions out to be synchronized with the media asset on our system. Only click Approve when you are ready to post the updated captions.


Once you have approved the job in the pop-up window, you will receive a notice that it has been submitted. At this point you will need to close the browser tab/window. That is the only way to exit the editor at this point.

When the job has finished processing, it will automatically replace the older caption file and be associated with the media asset. The file can be edited again if further changes are needed.

 Important hints and tips

Please note the following hints and tips. Reading and understanding these may save you time and frustrations later.
  • If a caption file already exists for a media asset, we suggest that you do NOT request the ASR captioning or that you save a copy of the original caption(s). Because the caption request is automated, the new file will take the place of the old, and we cannot recover the older file. This is another reason why only an asset owner can request the ASR captioning.
  • Make sure you make any edits or trims to the video BEFORE you request captioning, or AFTER you have edited the captions. 
    • If you attempt to trim an asset after it has been captioned, you will receive an error message telling you that you will no longer be able to edit the caption file online. This is because Kaltura will automatically trim the caption file to stay in sync with the video, but afterwards the caption file will be treated as an .srt file that you uploaded, not one that you ordered through Kaltura.
    • Similarly, if you are going to replace a video, make sure the replacement has exactly the same starting and ending points. The captions are time based, so any change to the timing will render them incorrect, and you will need to re-order and edit the captions again.
    • There are some tools to correct time codes in a caption file. If you must make a critical edit and need to edit the captions later, contact the Kaltura team via techsvc-kaltura-help@illinois.edu and we will try to assist you.
  • When you are in the online editor, we recommend that you not make changes to the time codes unless it is critical and you know what you are doing.
  • When you edit a caption for the first time (or two), we suggest that you practice on a private video and not one other viewers may see. This takes some of the pressure off you as you learn the system.
  • If you mess up the edit of the captions (you remove too much, you alter the times), don't panic. You can always go back and request ASR captioning again, and a new file will be available for editing. On a long media file you may lose some work, but you can always start over by requesting a new caption file.
  • At this time ASR only works in English.  Multiple languages can be manually associated with a single media asset.
  • At this time only ASR is available via the Kaltura interface. If you need a human to caption a file, you can contact DRES.
  • You can only request captions for a video hosted in Kaltura, e.g., not one linked from YouTube.  
  • If you need to take an English caption file and translate it to another language, contact the Kaltura team via techsvc-kaltura-help@illinois.edu and we will try to assist you with some suggestions.
  • You can ignore the color and speaker tools in the editor; our captioning solution does not accommodate these at the moment.
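One of the hints above mentions that tools exist to correct time codes in a caption file. As a minimal sketch of the idea, assuming a downloaded .srt file and a single constant offset (for example, a fixed number of seconds removed from the start of a video), the Python below shifts every time code by that offset. The function names and sample cue are hypothetical, and this is not the tool the Kaltura team uses.

```python
import re

# SRT timestamps look like HH:MM:SS,mmm.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_millis(match):
    """Convert a matched SRT timestamp to total milliseconds."""
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis

def format_millis(total):
    """Format milliseconds as an SRT timestamp, clamping negatives to zero."""
    total = max(total, 0)
    seconds, millis = divmod(total, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def shift_srt(srt_text, offset_ms):
    """Shift every timestamp on time code lines by offset_ms (may be negative)."""
    shifted = []
    for line in srt_text.splitlines():
        if "-->" in line:  # only touch time code lines, never caption text
            line = TIMESTAMP.sub(
                lambda m: format_millis(to_millis(m) + offset_ms), line
            )
        shifted.append(line)
    return "\n".join(shifted)

# Hypothetical sample cue, invented for illustration.
SAMPLE = """1
00:00:12,500 --> 00:00:15,000
Today we will talk about captions"""

# Move captions 10 seconds earlier, as if 10 seconds were cut from the start.
print(shift_srt(SAMPLE, -10_000))
```

A constant shift only helps when material is removed or added at the very beginning. Cues that fall inside the removed portion collapse to 00:00:00,000 and would need to be deleted by hand, and cuts in the middle of a video need a different offset for each section.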

Best practices and recommendations to create better captions

Effective and efficient captioning is a practiced skill, and this service does not pretend to replace our campus experts at DRES. This service is offered to the campus to make more content accessible that otherwise would not be captioned by a person. The ASR tool does not record everything verbatim; for example, it drops ummms and uhhhs. It also does not provide important punctuation cues. You, the editor, can take some small steps that will greatly enhance the experience for someone using the captions.

  • If there is a period of silence or music, don't leave the captions blank. Add text that says [MUSIC] or [SILENCE] so the reader knows nothing is being missed.
  • If you cannot understand what was said, enter [UNKNOWN] or [INAUDIBLE].
  • The online editor has some useful shortcuts that will enter these common entries quickly. Please use them.
  • If more than one speaker is present in the media, particularly if there is a back and forth discussion, identify the speaker when the person changes. For example:
    • Dialogue or conversation
      • Professor X: The past: a new and uncertain world. A world of endless possibilities and infinite outcomes. 
      • Peter: With great power comes great responsibility.
    • Question in the middle of a lecture with one main speaker:
      • Student: Will this be on the test?
  • Captions should preserve and identify slang or accents.
  • Do not correct errors in what was said. The captions should reflect exactly what is said, and not correct a misspoken phrase or word.
  • "Neutral" accents will result in better captions, as will enunciating clearly. Mumbled words will not convert well.
  • Better-than-average audio recorded in a quiet room will produce better captions than a recording made in a noisy space. Media with music and sound effects will not convert well either.
  • A camcorder at the back of a loud classroom using the built-in mic will give very poor ASR results. Use a mic attached to the speaker or at least one in very close proximity.
  • Practice, practice, practice.
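To help find stretches of silence or music that may need a [MUSIC] or [SILENCE] marker, the gaps between cues in a downloaded .srt file can be listed. The Python sketch below reports every stretch with no caption that lasts at least a chosen number of seconds. The 5-second threshold, the function names, and the sample cues are assumptions made for illustration.

```python
import re

# Matches an SRT time code line: start --> end.
TIME_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def seconds_of(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000

def find_gaps(srt_text, min_gap=5.0):
    """Return (gap_start, gap_end) pairs, in seconds, where no caption is shown."""
    gaps = []
    previous_end = 0.0
    for match in TIME_LINE.finditer(srt_text):
        parts = match.groups()
        start = seconds_of(*parts[:4])
        end = seconds_of(*parts[4:])
        if start - previous_end >= min_gap:
            gaps.append((previous_end, start))
        previous_end = max(previous_end, end)
    return gaps

# Hypothetical sample cues, invented for illustration.
SAMPLE = """1
00:00:00,000 --> 00:00:04,000
Good morning everyone

2
00:00:20,000 --> 00:00:23,500
Let's get started"""

print(find_gaps(SAMPLE))  # [(4.0, 20.0)]
```

Each reported gap is only a candidate. Listen to that part of the media to decide whether it is music, silence, or speech that the ASR tool missed.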

Keywords: captions, kaltura, REACH, mediaspace, accessibility, speech recognition, ada, captioning, transcript, subtitle, cielo24, srt, transcribe
Doc ID: 72201
Owner: Alan B.
Group: University of Illinois Technology Services
Created: 2017-03-30 15:16 CDT
Updated: 2018-08-06 16:46 CDT
Sites: University of Illinois Technology Services