
Thursday Jul 04, 2024
Ep 2: ASR models, accuracy, cost & the role of humans - Aleks Smechov from Wordcab
In this conversation, Derick Thompson from Salad Technologies interviews Aleks Smechov from Wordcab about transcription, ASR, and accessibility. They discuss the importance of accurate transcripts for global accessibility, the different definitions of verbatim transcription, and the impact of audio cues. They also cover the best ASR models, tools for post-processing, and the need for human editors in transcription. The conversation concludes with a discussion of the future of ASR and transcription.
Takeaways
- Accurate transcripts are crucial for global accessibility, allowing people with disabilities to understand audio and video content.
- Different definitions of verbatim transcription exist, ranging from including all disfluencies to a more cleaned-up version.
- Audio cues, such as laughter or coughing, are important for accessibility and may need to be added during transcription.
- The best ASR models for transcription depend on the specific use case and language requirements.
- Post-processing is essential for improving transcript accuracy, especially for industry-specific terms and difficult words.
- Human editors play a vital role in fine-tuning transcripts and adding value through post-processing and audio cues.
- The future of ASR and transcription lies in increasing accuracy, reducing word error rates, and focusing on post-processing capabilities.
- Transcription will become a commodity, and the real value will come from what can be done with the transcript after transcription.
- Using cost-effective GPU instances and cloud-agnostic tools is important for hosting ASR models.
- The goal is to provide reliable and affordable transcription services to meet the needs of different use cases.
Sound Bites
- "Accessibility in terms of video and audio, captions and transcription in general, is making sure that people who have some sort of disability, maybe they're hard of hearing or deaf, are still able to understand the captions or subtitles or transcript as well as someone who could hear."
- "Transcript editing will always be there as a kind of a last mile thing for edge cases and there will always be edge cases."
- "Transcription will become a commodity or table stakes like, you'll have to have excellent transcription, 95% accuracy, et cetera, in the future. And the real value will come in with what you could do after."
Chapters
00:00: Introduction and Overview of Wordcab
01:14: Defining Verbatim Transcription and Audio Cues
07:03: Choosing the Best ASR Models for Transcription
09:26: The Importance of Post-Processing in Transcription
12:51: Accuracy, Word Error Rate, and Transcription
14:17: Tools and Approaches for ASR and Transcription
19:43: The Future of ASR and Transcription
21:08: Optimizing ASR Performance and Cost
22:07: Providing Reliable and Affordable Transcription Services