International Journal of Automotive Technology

19.04.2024 | Noise, Vibration and Harshness

In-Vehicle Environment Noise Speech Enhancement Using Lightweight Wave-U-Net

Authors: Byung Ha Kang, Hyun Jun Park, Sung Hee Lee, Yeon Kyu Choi, Myoung Ok Lee, Sung Won Han

Published in: International Journal of Automotive Technology

Abstract

With the rapid advancement of AI technology, speech recognition has also advanced quickly, and in recent years speech-related technologies have been widely adopted in the automotive industry. However, in-vehicle environmental noise lowers the recognition rate, resulting in poor speech recognition performance. Numerous speech enhancement methods have been proposed to mitigate this degradation. Filter-based methods have been used to remove in-vehicle noise, but they can remove only a limited portion of it. In addition, the size of models that can be deployed inside a vehicle is limited. Therefore, making the model lighter while improving speech quality in the vehicle environment is essential. To overcome these limitations, this study proposes a Wave-U-Net that uses depthwise-separable convolutions. Using the Wave-U-Net model as a baseline, we built and analyzed various convolutional blocks, and we added squeeze-and-excitation (SE) blocks to the network to improve performance without significantly increasing the number of parameters. The experimental results use spectrogram visualization to show how much noise is removed, and they show that the proposed model eliminates noise more effectively than conventional methods.
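
The lightweighting idea rests on a simple parameter trade-off. A standard 1-D convolution with C_in input channels, C_out output channels and kernel length K needs K × C_in × C_out weights. A depthwise-separable convolution, the factorization popularized by Xception and MobileNets, instead filters each input channel on its own with a length-K kernel and then mixes channels with a 1×1 (pointwise) convolution, which needs only K × C_in + C_in × C_out weights. The NumPy sketch below is illustrative only and is not the authors' implementation: it assumes "valid" padding, stride 1 and no biases, uses hypothetical channel and kernel sizes, and its function names are made up for this example. It also checks that the separable layer equals a standard convolution whose kernel is the product of the two factors.

```python
import numpy as np


def _windows(x, k):
    """Stack all length-k windows: out[c, t, j] = x[c, t + j] ('valid' padding, stride 1)."""
    t_out = x.shape[1] - k + 1
    return np.stack([x[:, j:j + t_out] for j in range(k)], axis=-1)  # (C_in, T_out, k)


def conv1d_standard(x, w):
    """Standard 1-D convolution (cross-correlation, as in deep-learning libraries).

    x: (C_in, T), w: (C_out, C_in, K) -> (C_out, T - K + 1)
    """
    return np.einsum("ctk,ock->ot", _windows(x, w.shape[2]), w)


def conv1d_depthwise_separable(x, dw, pw):
    """Depthwise-separable 1-D convolution.

    dw: (C_in, K)      one length-K filter per input channel (depthwise step)
    pw: (C_out, C_in)  1x1 convolution that mixes channels (pointwise step)
    """
    # Depthwise step: every channel is filtered independently of the others.
    depthwise = np.einsum("ctk,ck->ct", _windows(x, dw.shape[1]), dw)
    # Pointwise step: (C_out, C_in) @ (C_in, T_out) -> (C_out, T_out)
    return pw @ depthwise


def param_counts(c_in, c_out, k):
    """Weight counts with biases ignored: (standard, depthwise-separable)."""
    return c_out * c_in * k, c_in * k + c_in * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_in, c_out, k, t = 24, 48, 15, 1024  # hypothetical sizes, not taken from the paper
    x = rng.standard_normal((c_in, t))
    dw = rng.standard_normal((c_in, k))
    pw = rng.standard_normal((c_out, c_in))

    y = conv1d_depthwise_separable(x, dw, pw)
    # Same result as a standard conv with kernel w[o, c, j] = pw[o, c] * dw[c, j].
    w_equiv = pw[:, :, None] * dw[None, :, :]
    print(y.shape, np.allclose(y, conv1d_standard(x, w_equiv)))
    print(param_counts(c_in, c_out, k))  # (17280, 1512)
```

With these illustrative sizes the separable layer needs about 11 times fewer weights. In general the ratio of separable to standard weights is exactly 1/C_out + 1/K, so the savings grow with both the channel count and the kernel length.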

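The squeeze-and-excitation (SE) mechanism of Hu, Shen and Sun recalibrates channels at low cost. It averages each channel over time (squeeze), passes the resulting vector through a two-layer fully connected bottleneck with reduction ratio r (excitation), and multiplies each channel by the resulting sigmoid gate. The NumPy sketch below assumes a (channels, time) feature map, a hypothetical reduction ratio r = 4 and random weights. The abstract does not state where the SE blocks sit in the proposed network or which ratio the authors use, so those choices are illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the channel gate."""
    return 1.0 / (1.0 + np.exp(-z))


def squeeze_excitation_1d(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a (C, T) feature map.

    w1: (C // r, C), b1: (C // r,)  bottleneck fully connected layer
    w2: (C, C // r), b2: (C,)       expansion fully connected layer
    Returns the rescaled feature map and the per-channel gate.
    """
    s = x.mean(axis=1)                # squeeze: global average over time -> (C,)
    z = np.maximum(w1 @ s + b1, 0.0)  # excitation, step 1: FC + ReLU -> (C // r,)
    gate = sigmoid(w2 @ z + b2)       # excitation, step 2: FC + sigmoid -> (C,)
    return x * gate[:, None], gate    # scale: reweight each channel at every time step


def se_param_count(c, r):
    """Weights plus biases of the two fully connected layers."""
    hidden = c // r
    return (hidden * c + hidden) + (c * hidden + c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, t, r = 48, 1024, 4             # hypothetical sizes and reduction ratio
    hidden = c // r
    x = rng.standard_normal((c, t))
    w1 = rng.standard_normal((hidden, c)) * 0.1
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((c, hidden)) * 0.1
    b2 = np.zeros(c)
    y, gate = squeeze_excitation_1d(x, w1, b1, w2, b2)
    print(y.shape, gate.min() > 0.0, gate.max() < 1.0)
    print(se_param_count(c, r))       # 1212
```

The extra cost is roughly 2C²/r parameters per SE block, and the block does not depend on the sequence length T. This is consistent with the abstract's claim that SE improves performance without significantly increasing the parameter count.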
Title: In-Vehicle Environment Noise Speech Enhancement Using Lightweight Wave-U-Net
Authors: Byung Ha Kang, Hyun Jun Park, Sung Hee Lee, Yeon Kyu Choi, Myoung Ok Lee, Sung Won Han
Publication date: 19.04.2024
Publisher: The Korean Society of Automotive Engineers
Published in: International Journal of Automotive Technology
Print ISSN: 1229-9138
Electronic ISSN: 1976-3832
DOI: https://doi.org/10.1007/s12239-024-00078-8
