Comparing phoneme recognition systems on the detection and diagnosis of reading mistakes for young children's oral reading evaluation (Lalilo)


From the abstract: "In the scope of our oral reading exercise for 5-8-year-old children, models need to be able to precisely detect and diagnose reading mistakes, which remains a considerable challenge even for state-of-the-art ASR systems. In this paper, we compare hybrid and end-to-end acoustic models trained for phoneme recognition on young learners' speech. We evaluate them not only with phoneme error rates but through detailed phoneme-level misread detection and diagnostic metrics. We show that a traditional TDNNF-HMM model, despite a high PER, is the best at detecting reading mistakes (F1-score 72.6%), but at the cost of a low specificity (74.7%), which is pedagogically critical. A recent Transformer+CTC model, to which we applied our synthetic reading mistakes augmentation method, obtains the highest precision (81.8%) and specificity (86.3%), as well as the highest correct diagnosis rate (70.7%), showing it is the best fit for our application."

Citation: Gelin, L., Daniel, M., Pellegrini, T., & Pinquier, J. (2023). Comparing phoneme recognition systems on the detection and diagnosis of reading mistakes for young children's oral reading evaluation. SLaTE 2023 - INTERSPEECH 2023 Satellite Workshop on Speech and Language Technology in Education, Trinity College Dublin, Ireland, August 18-20, 2023.

Publication Date: 2023
