
2024 | Original Paper | Book Chapter

AdaPQ: Adaptive Exploration Product Quantization with Adversary-Aware Block Size Selection Toward Compression Efficiency

Authors: Yan-Ting Ye, Ting-An Chen, Ming-Syan Chen

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Product Quantization (PQ) has received increasing research attention due to its effectiveness in bit-width compression for memory efficiency. PQ divides weight values into blocks and adopts clustering algorithms to group the blocks, which are then assigned quantized values accordingly. Existing research has mainly focused on designing clustering strategies that minimize the error between the original weights and the quantized values in order to maintain performance. However, the block division, i.e., the selection of the block size, determines the possible numbers of clusters and the compression rate, and it has not been fully studied. Therefore, this paper proposes a novel scheme, AdaPQ, with a process called Adaptive Exploration Product Quantization that first constructs varying block sizes flexibly by padding the filter weights, which enlarges the search space of PQ quantization results and avoids getting stuck in a sub-optimal solution. Afterward, we further design a strategy, Adversary-aware Block Size Selection, which selects an appropriate block size for each layer by evaluating the sensitivity of the performance under perturbation, thereby obtaining a minor performance loss at a high compression rate. Experimental results show that AdaPQ achieves higher accuracy under a similar compression rate compared with the state-of-the-art.
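To make the two stages concrete, the following is a minimal Python sketch of the underlying mechanics: standard product quantization of a weight matrix (flatten, zero-pad, split into blocks, k-means), followed by a perturbation-based quantization-error probe as a stand-in for the paper's adversary-aware sensitivity measure. All function names, the zero-padding choice, and the MSE proxy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of product quantization (PQ) of a weight matrix, plus a
# perturbation-based sensitivity probe in the spirit of AdaPQ's block size
# selection. All names and the error proxy below are illustrative
# assumptions, not the paper's actual implementation.
import numpy as np
from sklearn.cluster import KMeans

def product_quantize(weights, block_size, n_clusters):
    """Quantize a 2-D weight matrix with PQ.

    The matrix is flattened, zero-padded so its length is a multiple of
    block_size (mirroring the padding idea that lets AdaPQ explore arbitrary
    block sizes), split into blocks, and each block is replaced by its
    nearest k-means centroid from a shared codebook.
    """
    flat = weights.reshape(-1)
    pad = (-len(flat)) % block_size          # zero-pad to a multiple of block_size
    padded = np.concatenate([flat, np.zeros(pad)])
    blocks = padded.reshape(-1, block_size)  # one row per block

    km = KMeans(n_clusters=n_clusters, n_init=10).fit(blocks)
    codebook = km.cluster_centers_           # n_clusters x block_size
    codes = km.labels_                       # one code index per block

    quantized = codebook[codes].reshape(-1)[:len(flat)].reshape(weights.shape)
    return quantized, codebook, codes

def sensitivity_under_perturbation(weights, block_size, n_clusters,
                                   noise_scale=0.01, n_trials=5, rng=None):
    """Proxy for a layer's sensitivity to a given block size: mean
    quantization error when the weights are randomly perturbed before
    quantization. A block size whose error stays low under perturbation
    is treated as robust for that layer.
    """
    rng = rng or np.random.default_rng(0)
    errs = []
    for _ in range(n_trials):
        noisy = weights + noise_scale * rng.standard_normal(weights.shape)
        q, _, _ = product_quantize(noisy, block_size, n_clusters)
        errs.append(np.mean((noisy - q) ** 2))
    return float(np.mean(errs))

if __name__ == "__main__":
    w = np.random.default_rng(1).standard_normal((64, 64))
    # Compare candidate block sizes; the compression rate grows with the
    # block size, since each block of block_size floats is replaced by a
    # single code index into the shared codebook.
    for bs in (4, 8, 16):
        s = sensitivity_under_perturbation(w, block_size=bs, n_clusters=256)
        print(f"block_size={bs:2d}  perturbed quantization MSE={s:.4f}")
```

In this sketch a larger block size compresses more aggressively but tends to raise the perturbed quantization error, which is the trade-off the paper's per-layer block size selection navigates.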


Metadata
Title
AdaPQ: Adaptive Exploration Product Quantization with Adversary-Aware Block Size Selection Toward Compression Efficiency
Authors
Yan-Ting Ye
Ting-An Chen
Ming-Syan Chen
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2253-2_1
