YouTube has launched automatic sound effect captioning for the first time.
For now, the system can only show three classes of sounds: applause, music and laughter. “These were among the most frequent manually captioned sounds, and they can add meaningful context for viewers who are deaf and hard of hearing,” the company wrote.
As with the automatic captions, Google uses machine learning to pick out sounds and display them as text.
It developed a “deep neural network (DNN)” model for ambient sound, and trained it with “thousands of hours of videos” to get the best results. The toughest part, it wrote in a technical blog post, was separating and displaying events that tend to occur at the same time, like laughter and applause.
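To make the overlap problem concrete, here is a minimal Python sketch of one way per-frame sound scores could be turned into caption events. It is an illustration only, not YouTube's actual pipeline: the function name `frames_to_events`, the 0.5 threshold, the one-second frame length and the sample scores are all assumptions made for this example. Each sound class is thresholded independently, so laughter and applause can be active over the same stretch of video.

```python
from typing import Dict, List, Tuple

# The three sound classes named in the article.
LABELS = ("applause", "music", "laughter")


def frames_to_events(
    probs: List[Dict[str, float]],
    threshold: float = 0.5,       # assumed cutoff, not from the source
    frame_seconds: float = 1.0,   # assumed frame length, not from the source
) -> List[Tuple[str, float, float]]:
    """Turn per-frame class scores into (label, start, end) caption events.

    Every label is checked on its own, so two sounds may overlap in time.
    """
    events = []
    for label in LABELS:
        start = None
        for i, frame in enumerate(probs):
            active = frame.get(label, 0.0) >= threshold
            if active and start is None:
                start = i  # the sound begins in this frame
            elif not active and start is not None:
                # the sound ended just before this frame
                events.append((label, start * frame_seconds, i * frame_seconds))
                start = None
        if start is not None:
            # the sound was still active in the last frame
            events.append((label, start * frame_seconds, len(probs) * frame_seconds))
    # Order by start time, then by label name, for stable display.
    return sorted(events, key=lambda e: (e[1], e[0]))


# Hypothetical scores for three one-second frames.
sample = [
    {"laughter": 0.9, "applause": 0.2},
    {"laughter": 0.8, "applause": 0.7},
    {"applause": 0.9},
]
events = frames_to_events(sample)
```

With these made-up scores, the laughter event runs from 0 to 2 seconds and the applause event from 1 to 3 seconds, so the two overlap for one second, which is the kind of case the team described as hardest to display.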
YouTube’s team said it’s aware that the captions are “simplistic,” but adding features will be easier now that it has built a solid back-end foundation.