Detecting and correcting spelling errors in Uzbek texts based on machine learning algorithms
Abstract
This study addresses the problem of detecting and correcting spelling errors in Uzbek texts. Because of the complex morphological structure and agglutinative nature of Uzbek, traditional spell-checking methods do not give adequate results. This work therefore develops a word-similarity computation based on the Levenshtein distance algorithm, together with contextual correction mechanisms based on language models (statistical and neural). KenLM (a statistical language model), LSTM (Long Short-Term Memory) and BiLSTM (Bidirectional LSTM) approaches were used as the language model. A text corpus of 80 million words was collected and analyzed to train the models. Test results showed that the BiLSTM model achieved the highest effectiveness in correcting spelling errors (90.09%), while the LSTM model reached 84.62%. With the KenLM model, effectiveness was 62.31%.
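The abstract combines two steps: Levenshtein distance to find dictionary words close to a misspelled token, and a language model that picks among those candidates using context. The following is a minimal Python sketch of that pipeline under stated assumptions. The add-one smoothed bigram model `BigramModel` is a hypothetical stand-in for the KenLM, LSTM and BiLSTM models named above, and `correct`, `max_dist`, `alpha` and the three-sentence corpus are illustrative names and data, not the authors' code or corpus. Distances are measured over Unicode code points, so the Uzbek letter o‘ counts as two characters here.

```python
from collections import Counter
from math import log


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (Wagner-Fischer dynamic programming,
    keeping only the previous row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class BigramModel:
    """Toy add-one smoothed bigram model. Hypothetical stand-in for the
    KenLM / LSTM / BiLSTM scorers; not the paper's model."""

    def __init__(self, sentences: list[list[str]]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev: str, word: str) -> float:
        """log P(word | prev) with Laplace (add-one) smoothing."""
        num = self.bigrams[(prev, word)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return log(num / den)


def correct(tokens, vocabulary, model, max_dist=2, alpha=1.0):
    """Replace each out-of-vocabulary token by the vocabulary word within
    max_dist edits that maximises
        log P(cand | left) + log P(right | cand) - alpha * distance.
    The left word is the already-corrected one; the right word is taken
    as written, so a typo there can still mislead the choice."""
    out = []
    for i, tok in enumerate(tokens):
        left = out[-1] if out else "<s>"
        right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        if tok in vocabulary:
            out.append(tok)
            continue
        cands = [w for w in vocabulary if levenshtein(tok, w) <= max_dist]
        if not cands:
            out.append(tok)  # nothing close enough: leave the token unchanged
            continue
        out.append(max(cands, key=lambda c: model.logprob(left, c)
                       + model.logprob(c, right)
                       - alpha * levenshtein(tok, c)))
    return out


corpus = [
    ["men", "kitob", "o‘qiyman"],
    ["bola", "kitob", "o‘qiydi"],
    ["bolta", "o‘tkir"],
]
vocab = {w for sentence in corpus for w in sentence}
lm = BigramModel(corpus)
print(correct(["men", "kitop", "o‘qiyman"], vocab, lm))  # ['men', 'kitob', 'o‘qiyman']
print(correct(["bolla", "kitob", "o‘qiydi"], vocab, lm))  # ['bola', 'kitob', 'o‘qiydi']
```

In the second call, "bolla" is one edit away from both "bola" and "bolta", so edit distance alone cannot decide; the bigram (bola, kitob) seen in the corpus tips the choice. Scoring both the left and the right neighbour loosely mirrors why a bidirectional model can use more context than a left-to-right one. The toy model does not reproduce the effectiveness figures reported above.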
Article information

This work is licensed under a Creative Commons Attribution 4.0 International License.