Short of audio to train their artificial intelligence (AI) models, French companies and voice technology (“voicetech”) laboratories will launch a campaign asking French speakers to donate a little of their voice for free, Karel Bourgois, president of the Voice Lab, told AFP.
Some thirty players in the sector have joined forces in this association to pool their datasets: thousands of hours of recorded voices, essential for feeding and improving voice AI models.
“Together, we have collected 9,000 hours. But we are start-ups and SMEs up against giants like Microsoft or Google, who have millions of hours thanks to YouTube. In France, datasets are scarce and often not licensed for commercial use, hence the difficulty of training AI. Recently, a young researcher spent two years simply building her data,” lamented the entrepreneur, who is also the founder of the start-up Voxist.
A truck to record voices
To go further, Voice Lab, in partnership with the Mozilla Foundation, will help relaunch the collection of French voices on the Common Voice site, where anyone can record themselves reading a text. And in September, it will launch a campaign for a new version of this tool, “which will collect more natural voices by asking people to answer questions.”
Another avenue, pursued with the Human-Num laboratory, is the “listen to speak” project: a truck that will travel through France to record voices more diverse than those heard on radio or television. Voice Lab is also in talks with Radio France, France Télévisions and the INA, but faces legal uncertainty over what counts as use for AI training purposes.
A growth industry
In 2021, Voice Lab won a public call for projects and obtained 4.7 million euros over five years to pool voice data, create shared models and showcase its members' services, for research or commercial purposes.
A booming sector revolutionized by AI, “voicetech” covers speech recognition and synthesis, emotion analysis, speaker identification, oral transcription of texts, accent removal, voice imitation, and voice transformation, even in real time.
These techniques are of interest to the general public as well as to large corporations that want to use voice as an identifier or to automate call centers. In January, Microsoft introduced VALL-E, an AI model that can imitate a voice from a 3-second recording.
Source: BFM TV
