A method for extracting sparse and energy-efficient feature sets based on Delta-Gated Spike Encoding (DGSE)
Abstract
This paper proposes Delta-Gated Spike Encoding (DGSE), a new feature extraction method for speaker recognition. The method consists of five stages: log-Mel spectrogram computation, temporal delta computation, adaptive thresholding, an energy gate, and positive/negative spike encoding. Its goal is to capture informative time-frequency changes in the acoustic signal and to produce a sparse representation suited to downstream spiking or other energy-efficient models. Experiments were conducted on the VoxCeleb1 dataset using a three-stage parameter search, and the results of the initial coarse search, the subsequent extended search, and the final fine search were compared. The best configuration was alpha = 1.0, beta = 0.05, and energy_thr = -5.25. With these settings, the method achieved a total spike rate of 0.079585, a sparsity of 0.920415, and a gate open rate of 0.718533. These results suggest that DGSE preserves the informative parts of the signal while maintaining high sparsity, which makes it a promising option for resource-constrained devices and hybrid CNN-SNN speaker recognition systems.
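The five DGSE stages can be illustrated with a short NumPy sketch. The abstract does not give the exact formulas, so everything below beyond the stage names and parameter names is an assumption: the temporal delta is taken as a first-order frame difference, the adaptive threshold is assumed to be `alpha` times the per-band standard deviation of the delta plus `beta`, and the energy gate is assumed to open for frames whose mean log-Mel value exceeds `energy_thr`. The functions `dgse_encode` and `dgse_stats` are hypothetical names, and the input is a precomputed log-Mel spectrogram of shape `(n_mels, n_frames)`. Because the reported values satisfy 0.079585 + 0.920415 = 1, the sketch computes sparsity as one minus the total spike rate.

```python
import numpy as np


def dgse_encode(log_mel, alpha=1.0, beta=0.05, energy_thr=-5.25):
    """Hypothetical DGSE sketch (not the authors' implementation).

    log_mel: array of shape (n_mels, n_frames) with log-Mel energies.
    Returns (spikes, gate): spikes is int8 in {-1, 0, +1} with the same
    shape as log_mel; gate is a boolean vector with one entry per frame.
    """
    log_mel = np.asarray(log_mel, dtype=np.float64)

    # Temporal delta (assumed): first-order difference along time,
    # with the first frame set to zero.
    delta = np.zeros_like(log_mel)
    delta[:, 1:] = log_mel[:, 1:] - log_mel[:, :-1]

    # Adaptive threshold (assumed form): one value per Mel band,
    # alpha * std of that band's delta, plus a fixed offset beta.
    theta = alpha * delta.std(axis=1, keepdims=True) + beta

    # Energy gate (assumed form): a frame is open when its mean
    # log-Mel energy exceeds energy_thr.
    gate = log_mel.mean(axis=0) > energy_thr

    # Positive spikes for rises above +theta, negative spikes for
    # drops below -theta; spikes in closed frames are suppressed.
    spikes = np.zeros(log_mel.shape, dtype=np.int8)
    spikes[delta > theta] = 1
    spikes[delta < -theta] = -1
    spikes[:, ~gate] = 0
    return spikes, gate


def dgse_stats(spikes, gate):
    """Fraction of non-zero entries, its complement, and gate open rate."""
    total_spike_rate = float(np.count_nonzero(spikes)) / spikes.size
    return {
        "total_spike_rate": total_spike_rate,
        "sparsity": 1.0 - total_spike_rate,
        "gate_open_rate": float(gate.mean()),
    }


if __name__ == "__main__":
    # Toy input: 2 Mel bands x 4 frames.
    toy = np.array([[-6.0, -6.0, -4.0, -4.0],
                    [-6.0, -6.0, -6.0, -8.0]])
    spikes, gate = dgse_encode(toy)
    print(spikes.tolist())  # [[0, 0, 1, 0], [0, 0, 0, 0]]
    print(gate.tolist())    # [False, False, True, False]
    print(dgse_stats(spikes, gate))
```

In the toy example, band 1 has a negative spike at frame 3, but that frame's mean energy (-6.0) is below `energy_thr`, so the gate removes it. In practice the log-Mel input would come from a standard front end (a Mel filterbank applied to an STFT power spectrum, followed by a logarithm), and the value of `energy_thr` is only meaningful relative to the log scale used there.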
This work is licensed under a Creative Commons Attribution 4.0 International License.