Kaltura, Captioning Tools

This article is an overview and walkthrough of the captioning tools in Kaltura.

Automatic Speech Recognition (ASR) captioning is available via Kaltura in Blackboard. 
Accessible and searchable media is very important to the University of Illinois. We strive to make all media captioned to enable discovery and to ensure that accessibility needs are met. Captioned video is useful for those with a hearing impairment and for users with certain learning disabilities. It can also be a useful tool to anyone who struggles to understand an accent, as reading text along with listening assists with comprehension.

 Before you start, keep in mind:

  1. There are two steps in this process: ordering captions and editing them.  After you order captions, they will be added automatically and will be available to anyone viewing your video, but they will not be entirely accurate, so plan on correcting them.
  2. You will not be able to trim or edit your video after adding captions, so you'll need to do this beforehand.
  3. It is a good idea to practice this process the first time with a short private video.

Media creators are responsible for captioning their content.  Content that is not captioned may be removed from public view. If you have a student or colleague with special needs who will view the content, you must make that content accessible to them.

 Benefits (Why should I create captions?)

Accessible content has many benefits for you, the media owner, and for viewers in general. Beyond following the law, it means being a good neighbor.  Accessible content:

  • Is necessary for the hearing impaired to understand what is happening in an audio or video file.  Federal law (Section 508) and Illinois state law (IITAA), as well as campus regulations, require that multimedia and web content be accessible to all users.
  • Is useful to those with learning difficulties. Many students find it easier to focus if written words accompany the media.
  • Can make it easier to understand someone with an accent.
  • Enables non-native English speakers to better understand what is being said and follow along.
  • Helps certain learners process information more effectively.
  • Enables viewing of content in a loud space or if one lacks headphones.
  • Is easier to reuse in later semesters, saving instructors time.
  • Is more easily found. As a content creator you will have more views if your content is captioned because the content is searchable. Really, it's a great feature in Kaltura. 

This entry does not seek to offer expertise in captioning. 

Captioning in Kaltura

Automatic Speech Recognition (ASR) captioning produces a text file and then associates it with the media in Kaltura. Captions are time coded to specific points in a video or audio file.

ASR is 70%-90% accurate, depending on various factors. That may sound like a passing grade, but it isn't. Even a high-fidelity recording of a speaker with perfect diction will need some editing.  Technical terms, acronyms, proper names, and both common and uncommon words may not appear as you expect them to. For example, early tests returned "Amino acids" as "I mean no acids."  Also, punctuation will not be added, aside from where ASR thinks a pause is long enough to merit a period.  You will need to review and edit captions for media you own.
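To put the accuracy range in perspective, here is a rough back-of-the-envelope sketch. It assumes a speaking rate of about 150 words per minute and treats accuracy as a simple per-word figure; both are assumptions made for illustration, not numbers from Kaltura or the captioning provider.

```python
def expected_errors(minutes, accuracy, words_per_minute=150):
    """Estimate how many words ASR is likely to get wrong.

    words_per_minute is an assumed speaking rate, used only for illustration.
    """
    total_words = minutes * words_per_minute
    return round(total_words * (1 - accuracy))


# A 10-minute recording at both ends of the 70%-90% range.
for accuracy in (0.70, 0.90):
    errors = expected_errors(10, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors} words to fix")
```

Under these assumptions, even the best case leaves well over a hundred words to correct in a 10-minute recording, which is why review and editing are required.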

The process of using ASR for captioning has 2 components: Requesting the captions and Editing the captions.

The following assumes you have used Kaltura before.


 Request captions

At this time only the owner of a media asset can request captions for a video/audio entry. Because the captions must be edited and reviewed for accuracy, we want to ensure that media owners are aware that captioning was requested.

To request captions

  1. Log in to Blackboard and go to MyMedia or to the media entry directly.
  2. Go to the video/audio file for which you wish to request captions and open that file's entry page.
  3. Under the video entry, click the Actions button and choose +Order Captions from the drop-down menu.
  4. Then click the Order Captions button.
  5. After a few seconds a note will appear telling you the caption request was sent.

At this point the media file is automatically sent to an external provider that will process the audio and identify speech. In 5-30 minutes the caption file will be returned and added to your media asset automatically.  Shorter files will come back sooner; actual processing times will vary based on server load.
You will not receive notification that the captioning is complete, so you will need to check the file yourself.

Once the captions are available, you will see a CC button in the Menu Bar of the video player.

 Edit the Captions

Once the ASR captions are returned, they must be reviewed and edited.  

(If you are an advanced user, you can edit the captions locally by downloading the .srt file and using a desktop editor. Most Illinois media owners will want to use the editor in Kaltura.)
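For advanced users who do work with a downloaded .srt file, the sketch below shows one possible way to apply known corrections in bulk. An .srt file is plain text made of numbered blocks, each with a time-code line (start --> end) followed by the caption text. The sample captions and the CORRECTIONS list are made up for illustration, and the script is only a starting point, not a tool provided by Kaltura or cielo24.

```python
# A tiny sample in .srt format: block number, time codes, caption text.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
today we will talk about I mean no acids

2
00:00:04,500 --> 00:00:07,000
they are the building blocks of proteins
"""

# Hypothetical corrections: what ASR produced -> what was actually said.
CORRECTIONS = {"I mean no acids": "amino acids"}


def fix_captions(srt_text, corrections):
    """Apply corrections to caption text only, leaving block numbers
    and time codes untouched."""
    fixed_lines = []
    for line in srt_text.splitlines():
        is_block_number = line.strip().isdigit()
        is_time_code = "-->" in line
        if not (is_block_number or is_time_code):
            for wrong, right in corrections.items():
                line = line.replace(wrong, right)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


print(fix_captions(SAMPLE_SRT, CORRECTIONS))
```

One limitation of this simple approach: a caption line that consists only of digits would be mistaken for a block number and skipped. You would still need to read through the result by ear, since a bulk replacement only catches errors you already know about.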

Good news: while only an asset owner can request captions, anyone identified as a "co-editor" for the media asset is allowed to edit the captions online. This is useful if you want to have students, colleagues, or employees edit the captions for you. For information on adding netids as co-editors please contact ITS at (217)206-6000 or techsupport@uis.edu.

Accessing the editor:

  1. Go back to the media asset, click the Actions button again, and this time choose +View Captions Request from the drop-down menu.
  2. A list of requests for captions will appear. Please note that if you send the media file out for captions more than once, you will see multiple requests here.
    We recommend you only send one request per media file.
  3. The word "COMPLETE" is a link to edit the caption file.  Click it and a new browser window will open, taking you to an online editor at cielo24.
  4. The cielo24 editor will present you with an interface that shows the media asset and the captions broken into time blocks.

Using the online caption editor

By default, the playback editor is set to play audio start to end. Below the editor you can click Play Until End of Sentence and then click Save. This will play the section displayed in the yellow text box only, until you tell the editor to proceed. Find the process that works best for your needs.

The editor displays word sections by "sentence". To the editor, a sentence ends wherever a period is present, so adding a period to a text string will create a new line. This is a little disconcerting at first but makes sense as you go on.

Punctuation such as commas will not be added by the ASR tool; you will need to add it yourself.

To edit a sentence, hit play. Follow along with the spoken word and the captioned text. When you need to edit a word that is in error, add a word, or add a comma, click in the yellow box. The video will stop playing while you type in the yellow edit box. Once you have stopped typing, the media file will resume after 2 seconds. (You can lengthen the amount of time the video stops under Settings in the upper right corner.)

If you remove a period that is in the wrong place, note that the text from the next section will move up to the previous sentence.

You may also manually stop the media player if you wish to pause or make extensive edits. 

If you wish to jump to a different section of text, go to the left column and select that section. The yellow highlighted segment is the portion active in the editor.


If you wish to Save your progress as you go (highly recommended) or if you need to come back to the edit later, click Save. This will keep a working version of your edits online. 

When you are done with the editing, you will want to click Save and then click Approve. When you click Approve, you are sending the new version of the captions out to be synchronized with the media asset on our system. Only click Approve when you are ready to post the updated captions.


Once you have Approved the job in the pop up window, you will receive a notice that it has been submitted. At this point you will need to Close the browser tab/window. That is the only way to exit the editor at this point.

When the job has finished processing, it will automatically replace the older caption file and be associated with the media asset. The file can be edited again if further changes are needed.

 Important hints and tips

Please note the following hints and tips. Reading and understanding these may save you time and frustrations later.
  • If a caption file already exists for a media asset, we suggest that you do NOT request the ASR captioning or that you save a copy of the original caption(s). Because the caption request is automated, the new file will take the place of the old, and we cannot recover the older file. This is another reason why only an asset owner can request the ASR captioning.
  • Make sure you make any edits or trims to the video BEFORE you request captioning. The captions are time based, and any changes to the video will render the captions incorrect.
    • If you attempt to trim an asset after it has been captioned you will receive an error message. 
    • Similarly, if you are going to replace a video, make sure it is exactly the same "cut". Any changes to timing will render the timed captions incorrect and you will need to re-order and edit the captions again.
  • When you are in the online editor, we recommend that you not make changes to the time codes unless it is critical and you know what you are doing.
  • When you edit a caption for the first time (or two), we suggest that you practice on a private video and not one other viewers may see. This takes some of the pressure off you as you learn the system.
  • If you mess up the edit of the captions (you remove too much, you alter the times), don't panic. You can always go back and request ASR captioning again, and a new file will be available for editing. On a long media file you may lose some work, but you can always start over by requesting a new caption file.
  • At this time ASR only works in English.  Multiple languages can be manually associated with a single media asset.
  • You can ignore the color and speaker tools in the editor; our captioning solution does not accommodate these at the moment.
  • If you have been given "co-editor" rights you must filter your MyMedia results on "Media I Can Edit" for the content to show in your MyMedia.
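To illustrate why any change to a video's timing makes existing captions incorrect, the sketch below shifts every time code in a piece of .srt text by a fixed number of seconds. It is only an illustration for advanced users working with a downloaded .srt file; the function names are made up for this example, and the recommendation above still stands: trim first, then request captions.

```python
import re
from datetime import timedelta

# Matches an .srt time code such as 00:00:12,500
TIME_CODE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_time_code(match, offset):
    """Return one time code moved by offset, never going below zero."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    moment = timedelta(hours=hours, minutes=minutes,
                       seconds=seconds, milliseconds=millis)
    moment = max(moment + offset, timedelta(0))
    total_ms = moment // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_srt(srt_text, seconds):
    """Shift every time code in the .srt text by the given seconds."""
    offset = timedelta(seconds=seconds)
    return TIME_CODE.sub(lambda m: shift_time_code(m, offset), srt_text)


# If 10 seconds were cut from the start of a video, every caption
# would need to move 10 seconds earlier to stay in sync.
cue = "1\n00:00:12,500 --> 00:00:15,000\nWelcome to the lecture.\n"
print(shift_srt(cue, -10))
```

A uniform shift like this only covers the simplest case, a cut at the very beginning. A cut in the middle of a video affects only the captions after it, which is part of why re-ordering and re-editing the captions is usually the safer route.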

Best practices and recommendations to create better captions

Effective and efficient captioning is a practiced skill. This service is provided to the campus to make content accessible that otherwise would not be captioned by a person. The ASR tool does not record everything verbatim; it drops ummms and uhhhs, for example. It also does not provide important punctuation cues. You, the editor, can take some small steps that will greatly enhance the experience for someone using the captions.

  • If there is a period of silence or music, don't leave the captions blank. Add text that says [MUSIC] or [SILENCE] so the reader knows nothing is being missed.
  • If you cannot understand what was said, enter [UNKNOWN] or [INAUDIBLE].
  • The online editor has some useful shortcuts that will enter these common entries quickly. Please use them. 
  • If more than one speaker is present in the media, particularly if there is a back and forth discussion, identify the speaker when the person changes. For example:
    • Dialogue or conversation
      • Professor X: The past: a new and uncertain world. A world of endless possibilities and infinite outcomes. 
      • Peter: With great power comes great responsibility.
    • Question in the middle of a lecture with one main speaker:
      • Student: Will this be on the test?
  • Captions should preserve and identify slang or accents.
  • Do not correct errors in what was said. The captions should reflect exactly what is said, and not correct a misspoken phrase or word.
  • "Neutral" accents will result in better captions, as will enunciating clearly. Mumbled words will not convert well.
  • Better than average audio sources in a quiet room will result in better captions than ASR from a recording in a noisy space. Media with music and sound effects will not convert well either.
  • A camcorder at the back of a loud classroom using the built in mic will result in very poor ASR results. Use a mic attached to the speaker or at least one in very close proximity. 
  • Practice, practice, practice.
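For advanced users working with a downloaded .srt file, the recommendation above about silence and music can be supported with a small script that lists stretches where no caption is shown, so you know where to check whether a [MUSIC] or [SILENCE] note is needed. This is a minimal sketch; the 5-second threshold and the sample captions are assumptions chosen for illustration.

```python
import re

# Matches an .srt timing line such as 00:00:01,000 --> 00:00:04,000
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the parts of a time code to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def find_gaps(srt_text, min_gap=5.0):
    """Return (gap_start, gap_end) pairs, in seconds, where no caption
    is shown for at least min_gap seconds."""
    cues = [
        (to_seconds(*m.groups()[:4]), to_seconds(*m.groups()[4:]))
        for m in TIMING.finditer(srt_text)
    ]
    gaps = []
    for (_, previous_end), (next_start, _) in zip(cues, cues[1:]):
        if next_start - previous_end >= min_gap:
            gaps.append((previous_end, next_start))
    return gaps


sample = """1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:00:20,000 --> 00:00:23,000
Let's begin.
"""
print(find_gaps(sample))
```

The script only finds the gaps. Deciding whether a gap is music, silence, or speech that ASR missed still requires listening to that part of the recording.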


Keywords: captions, kaltura, REACH, mediaspace, accessibility, speech recognition, ada, captioning, transcript, subtitle, cielo24, srt
Doc ID: 72634
Owner: Jeff S.
Group: University of Illinois at Springfield
Created: 2017-04-17 10:58 CDT
Updated: 2017-10-02 14:05 CDT
Sites: University of Illinois at Springfield