Detection and correction of spelling errors in Uzbek texts based on machine learning algorithms

M.M. Ochilov; O.O. Narzullaev; O.A. Kholmatov

doi:10.62132/ijdt.v8i1.235

Published: Apr 6, 2025

Keywords:

Uzbek language, spelling correction, natural language processing (NLP), Levenshtein distance, language model, KenLM, LSTM, BiLSTM, neural networks, contextual analysis, machine learning, agglutinative languages, spell checking, deep learning, statistical models

M.M. Ochilov

Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

O.O. Narzullaev

Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

O.A. Kholmatov

Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

Abstract

This study addresses the detection and correction of spelling errors in Uzbek texts. Because Uzbek is an agglutinative language with a complex morphological structure, traditional spell-checking methods do not provide sufficient accuracy. The approach therefore uses the Levenshtein distance algorithm to measure word similarity and language models, both statistical and neural, to perform contextual correction. Three language models were compared: KenLM (a statistical language model), LSTM (Long Short-Term Memory), and BiLSTM (Bidirectional LSTM). A text corpus of 80 million words was collected and analyzed for model training. In testing, the BiLSTM model achieved the highest accuracy in correcting spelling errors (90.09%), followed by the LSTM model (84.62%) and the KenLM model (62.21%). These findings indicate that deep learning models capable of contextual analysis can significantly improve the automatic detection and correction of spelling errors in Uzbek. Planned future work includes applying transformer models, expanding annotated corpora, and developing models that account for the morphological characteristics of the Uzbek language.
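The two-stage idea described in the abstract (edit-distance candidates, then ranking by a language model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the `BigramLM` class, the `max_dist=2` threshold, and all function names are assumptions made for illustration, and the add-one-smoothed bigram model is only a small stand-in for the KenLM, LSTM, and BiLSTM models used in the study.

```python
from collections import Counter
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def candidates(word, vocab, max_dist=2):
    """Vocabulary words within max_dist edits (threshold is an assumption)."""
    return [v for v in sorted(vocab) if levenshtein(word, v) <= max_dist]


class BigramLM:
    """Toy add-one-smoothed bigram model (hypothetical stand-in for KenLM/LSTM)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        return math.log((self.bigrams[(prev, word)] + 1)
                        / (self.unigrams[prev] + self.vocab_size))


def correct(sentence, lm, vocab):
    """Replace out-of-vocabulary tokens with the best-scoring candidate."""
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in vocab:
            out.append(tok)
            continue
        prev = out[-1] if out else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        cands = candidates(tok, vocab) or [tok]  # keep token if no candidate
        # Rank by context score; break ties in favour of the closer word.
        best = max(cands, key=lambda c: (lm.logprob(prev, c) + lm.logprob(c, nxt),
                                         -levenshtein(tok, c)))
        out.append(best)
    return " ".join(out)


# Toy corpus, invented for this example only.
corpus = ["men kitob oldim", "u kitob oldi",
          "men maktabga bordim", "u maktabga bordi"]
vocab = {w for s in corpus for w in s.split()}
lm = BigramLM(corpus)
print(correct("men kitab oldim", lm, vocab))  # men kitob oldim
```

A bigram model sees only the immediately adjacent words, which is one reason a bidirectional neural model that reads the whole sentence might rank candidates better on an agglutinative language.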

How to Cite

Ochilov, M., Narzullaev, O., & Kholmatov, O. (2025). Detection and correction of spelling errors in Uzbek texts based on machine learning algorithms. International Journal of Theoretical and Applied Issues of Digital Technologies, 8(1), 85–94. https://doi.org/10.62132/ijdt.v8i1.235

Issue

Vol. 8 No. 1 (2025): International journal of theoretical and applied issues of digital technologies

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
