A Method for Extracting Sparse and Energy-Efficient Features Based on Delta-Gated Spike Encoding (DGSE)

P.B. Nurimov; O.J. Babomuradov

doi:10.62132/ijdt.v9i2.379

Published: May 15, 2026

Keywords:

speaker recognition, feature extraction, log-Mel, spike encoding, sparsity, neuromorphic computing, VoxCeleb1, DGSE

P.B. Nurimov

National Research University “Tashkent Institute of Irrigation and Agricultural Mechanization Engineers”

O.J. Babomuradov

Jizzakh Branch of Kazan Federal University

Abstract

This paper proposes a new feature extraction method for speaker recognition called Delta-Gated Spike Encoding (DGSE). The method chains five stages: log-Mel spectrogram computation, a temporal delta, adaptive thresholding, an energy gate, and positive/negative spike encoding. The goal is to capture informative time-frequency changes in the acoustic signal and to form a sparse representation suited to downstream spiking or other energy-efficient models. Experiments were conducted on the VoxCeleb1 dataset using a three-stage parameter search (an initial coarse search, an extended search, and a final fine search), and the results of the three stages were compared. The best result was obtained with alpha = 1.0, beta = 0.05, and energy_thr = -5.25. With these settings the method achieved a total spike rate of 0.079585, a sparsity of 0.920415 (equal to 1 - total spike rate), and a gate open rate of 0.718533. The results indicate that DGSE preserves the informative parts of the signal while ensuring high sparsity, which makes it a promising option for resource-constrained devices and for hybrid CNN-SNN speaker recognition systems.
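The five DGSE stages listed in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so every rule below is an assumption made for illustration only: the temporal delta is taken as a first-order frame difference, the adaptive threshold of each Mel band is alpha times the standard deviation of that band's delta plus beta, the energy gate opens a frame when its mean log-Mel value exceeds energy_thr, and the rates are counted over all time-frequency positions. The names `dgse_encode` and `DGSEOutput` are hypothetical. The sketch starts from a precomputed log-Mel matrix (frames x bands) and uses only the Python standard library.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]  # log-Mel values, shape T frames x F Mel bands


@dataclass
class DGSEOutput:
    pos: List[List[int]]  # positive spikes (energy rising in a band)
    neg: List[List[int]]  # negative spikes (energy falling in a band)
    gate: List[int]       # per-frame energy gate, 1 = open
    total_spike_rate: float
    sparsity: float
    gate_open_rate: float


def dgse_encode(log_mel: Matrix, alpha: float = 1.0, beta: float = 0.05,
                energy_thr: float = -5.25) -> DGSEOutput:
    """Hypothetical DGSE-style encoder; the formulas are assumptions, not taken from the paper."""
    T, F = len(log_mel), len(log_mel[0])

    # 1. Temporal delta: first-order frame difference (assumed); frame 0 gets zeros.
    delta = [[0.0] * F] + [
        [log_mel[t][f] - log_mel[t - 1][f] for f in range(F)] for t in range(1, T)
    ]

    # 2. Adaptive threshold per band: alpha * std of the band's delta + beta (assumed).
    thr = []
    for f in range(F):
        col = [delta[t][f] for t in range(T)]
        mean = sum(col) / T
        std = math.sqrt(sum((d - mean) ** 2 for d in col) / T)
        thr.append(alpha * std + beta)

    # 3. Energy gate: open when the frame's mean log-Mel value exceeds energy_thr (assumed).
    gate = [1 if sum(row) / F > energy_thr else 0 for row in log_mel]

    # 4. Positive/negative spike encoding, emitted only in gated-open frames.
    pos = [[int(bool(gate[t]) and delta[t][f] > thr[f]) for f in range(F)]
           for t in range(T)]
    neg = [[int(bool(gate[t]) and delta[t][f] < -thr[f]) for f in range(F)]
           for t in range(T)]

    # With alpha, beta >= 0 we have thr >= 0, so pos and neg never fire at the same
    # position and the spike rate over all T*F positions stays within [0, 1].
    n_spikes = sum(map(sum, pos)) + sum(map(sum, neg))
    rate = n_spikes / (T * F)
    return DGSEOutput(pos, neg, gate, rate, 1.0 - rate, sum(gate) / T)


# Toy 4-frame, 2-band log-Mel input (made-up values, not VoxCeleb1 data).
toy = [[-6.0, -6.0],
       [-4.0, -6.0],
       [-4.0, -3.0],
       [-7.0, -7.0]]
out = dgse_encode(toy)
print(out.pos)               # [[0, 0], [1, 0], [0, 1], [0, 0]]
print(out.neg)               # [[0, 0], [0, 0], [0, 0], [0, 0]]
print(out.total_spike_rate)  # 0.25
print(out.sparsity)          # 0.75
print(out.gate_open_rate)    # 0.5
```

In the toy input, the last frame has strong negative deltas in both bands, but its mean log-Mel value (-7.0) is below energy_thr, so the gate suppresses those spikes; in this sketch that is how the gate trades spike count for sparsity. Because pos and neg are mutually exclusive per position here, sparsity = 1 - total spike rate holds by construction, matching the relationship between the two figures reported in the abstract.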

How to Cite

Nurimov, P., & Babomuradov, O. (2026). A method for extracting sparse and energy-efficient feature sets based on Delta-Gated Spike Encoding (DGSE). INTERNATIONAL JOURNAL OF THEORETICAL AND APPLIED ISSUES OF DIGITAL TECHNOLOGIES, 9(2), 86–91. https://doi.org/10.62132/ijdt.v9i2.379

Issue

Vol. 9 No. 2 (2026): International journal of theoretical and applied issues of digital technologies

Section

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.
