Project Lingua Franca aims to build a rich reservoir of Gurmukhi resources.
Punjabi Speech to Text
Among Indic languages, speech recognition work has concentrated on languages such as Hindi and Bengali, while Punjabi ranks near the bottom of the list. Speech corpora for those languages are also easy to obtain in the public domain, which is not the case for Punjabi. To address this gap, Sabudh introduced a Speech Data Collection app that lets volunteers help build such a resource for Punjabi by recording short utterances of Punjabi literature.
The aim of this platform is to create a publicly available speech-to-text corpus to fuel research and development of Automatic Speech Recognition (ASR) models for the Punjabi/Gurmukhi language. The web application can be accessed at panjabi.ai.
Alongside data acquisition, state-of-the-art methods are being integrated into the application to build a Speech to Text pipeline optimized specifically for Punjabi.
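To make the idea of a speech-to-text corpus concrete, here is a minimal sketch of how one recorded utterance might be stored as an audio/transcript pair. It is an illustration under stated assumptions and does not describe the actual format used by the app: the `Utterance` fields, the file path, the speaker ID, and the JSON-lines layout are all hypothetical. The only fixed fact it relies on is that Gurmukhi characters occupy the Unicode block U+0A00 to U+0A7F.

```python
import json
from dataclasses import dataclass, asdict

# Unicode Gurmukhi block boundaries.
GURMUKHI_START, GURMUKHI_END = 0x0A00, 0x0A7F

# Punctuation tolerated in a transcript (an assumption for this sketch);
# includes the danda and double danda, which are shared with other Indic scripts.
ALLOWED_PUNCT = set(".,?!-\u0964\u0965")


@dataclass
class Utterance:
    """One corpus entry. All field names are hypothetical."""
    audio_path: str   # relative path to the recorded clip
    transcript: str   # the Gurmukhi text the volunteer read aloud
    speaker_id: str   # anonymous contributor identifier


def is_gurmukhi(text: str) -> bool:
    """Return True if the text is non-empty and every non-space character
    is in the Gurmukhi block or in the allowed punctuation set."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(
        GURMUKHI_START <= ord(c) <= GURMUKHI_END or c in ALLOWED_PUNCT
        for c in chars
    )


def to_manifest_line(utt: Utterance) -> str:
    """Serialize an utterance as one JSON line, keeping Gurmukhi readable."""
    if not is_gurmukhi(utt.transcript):
        raise ValueError(f"transcript is not Gurmukhi text: {utt.transcript!r}")
    return json.dumps(asdict(utt), ensure_ascii=False)


if __name__ == "__main__":
    sample = Utterance("clips/0001.wav", "ਸਤ ਸ੍ਰੀ ਅਕਾਲ", "spk_001")
    print(to_manifest_line(sample))
```

Checking the script at collection time is one simple way to keep Latin or Devanagari text out of the transcripts before any ASR model is trained on them.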