19.04.2024 | Noise, Vibration and Harshness

In-Vehicle Environment Noise Speech Enhancement Using Lightweight Wave-U-Net

Authors: Byung Ha Kang, Hyun Jun Park, Sung Hee Lee, Yeon Kyu Choi, Myoung Ok Lee, Sung Won Han

Published in: International Journal of Automotive Technology

Abstract

With the rapid advancement of AI technology, speech recognition has also advanced quickly, and in recent years speech-related technologies have been widely deployed in the automotive industry. However, in-vehicle environment noise lowers the recognition rate and degrades speech recognition performance. Numerous speech enhancement methods have been proposed to mitigate this degradation. Filter-based methods have been used to remove in-vehicle environment noise, but they remove noise only to a limited extent. In addition, the size of a model that can be deployed inside a vehicle is constrained. Making the model lighter while improving speech quality in the vehicle environment is therefore essential. This study proposes a Wave-U-Net with depthwise-separable convolution to overcome these limitations. Using the Wave-U-Net model as a baseline, we built various convolutional blocks and analyzed their results, and we added a squeeze-and-excitation network to improve performance without significantly increasing the number of parameters. Spectrogram visualizations show how much noise is removed, and the experimental results show that the proposed model removes noise more effectively than conventional methods.
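To make the "lightweight" argument concrete, the following minimal sketch compares the parameter count of a standard 1D convolution with that of a depthwise-separable 1D convolution, and adds the cost of a squeeze-and-excitation block. The channel counts, kernel size, and reduction ratio are illustrative assumptions and are not taken from the paper; the helper function names are hypothetical.

```python
# Parameter-count sketch for 1D convolutions (illustrative values only,
# not the configuration used in the paper).

def conv1d_params(c_in, c_out, kernel, bias=True):
    """Standard 1D convolution: one kernel per (input, output) channel pair."""
    return c_in * c_out * kernel + (c_out if bias else 0)

def depthwise_separable_conv1d_params(c_in, c_out, kernel, bias=True):
    """Depthwise step (one kernel per input channel) followed by a
    pointwise 1x1 convolution that mixes channels."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def se_block_params(channels, reduction=16, bias=True):
    """Squeeze-and-excitation: two fully connected layers around a
    bottleneck of size channels // reduction."""
    hidden = max(channels // reduction, 1)
    squeeze = channels * hidden + (hidden if bias else 0)
    excite = hidden * channels + (channels if bias else 0)
    return squeeze + excite

if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 15  # assumed layer shape
    standard = conv1d_params(c_in, c_out, k)
    separable = depthwise_separable_conv1d_params(c_in, c_out, k)
    se = se_block_params(c_out)
    print("standard conv:        ", standard)
    print("depthwise-separable:  ", separable)
    print("separable + SE block: ", separable + se)
```

Under these assumed sizes, the depthwise-separable layer plus a squeeze-and-excitation block still needs far fewer parameters than the single standard convolution, which is the trade-off the abstract describes.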

Metadata
Title
In-Vehicle Environment Noise Speech Enhancement Using Lightweight Wave-U-Net
Authors
Byung Ha Kang
Hyun Jun Park
Sung Hee Lee
Yeon Kyu Choi
Myoung Ok Lee
Sung Won Han
Publication date
19.04.2024
Publisher
The Korean Society of Automotive Engineers
Published in
International Journal of Automotive Technology
Print ISSN: 1229-9138
Electronic ISSN: 1976-3832
DOI
https://doi.org/10.1007/s12239-024-00078-8
