
2024 | Original Paper | Book Chapter

MLT-Trans: Multi-level Token Transformer for Hierarchical Image Classification

Authors: Tanya Boone Sifuentes, Asef Nazari, Mohamed Reda Bouadjenek, Imran Razzak

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

This paper focuses on Multi-level Hierarchical Classification (MLHC) of images, presenting a novel architecture that exploits the “[CLS]” (classification) token within transformers, a token often disregarded in computer vision tasks. Our primary goal is to use the information carried by every [CLS] token in a hierarchical manner. Toward this aim, we introduce the Multi-level Token Transformer (MLT-Trans). This model, trained with sharpness-aware minimization and a hierarchical loss function based on knowledge distillation, can be adapted to various transformer-based networks; we adopt the Swin Transformer as the backbone. Empirical results across diverse hierarchical datasets confirm the efficacy of our approach. The findings highlight the potential of combining transformers and [CLS] tokens, demonstrating improvements in hierarchical evaluation metrics and accuracy gains of up to 5.7% at the last level compared to the base network, thereby supporting the adoption of the MLT-Trans framework in MLHC.
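
The abstract only sketches the method, but its core idea, per-level [CLS]-style token summaries feeding level-specific classifiers tied together by a knowledge-distillation-style hierarchical loss, can be illustrated with a short PyTorch sketch. Everything below is a hypothetical illustration inferred from the abstract, not the authors' implementation: the module names, dimensions, child-to-parent mapping, and loss weighting are assumptions, and sharpness-aware minimization (which would wrap the optimizer step) is omitted for brevity.

    # Hypothetical sketch of a multi-level token head with a hierarchical,
    # KD-flavoured loss. Illustrative only; not the MLT-Trans reference code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MultiLevelTokenHead(nn.Module):
        """One linear classifier per hierarchy level, each fed by that level's
        [CLS]-style token summary (e.g. pooled tokens from successive backbone stages)."""

        def __init__(self, dims, classes_per_level):
            super().__init__()
            self.heads = nn.ModuleList(
                [nn.Linear(d, c) for d, c in zip(dims, classes_per_level)]
            )

        def forward(self, level_tokens):              # list of [B, dim_l] tensors
            return [h(t) for h, t in zip(self.heads, level_tokens)]


    def hierarchical_loss(logits, targets, child_to_parent, tau=2.0, alpha=0.5):
        """Cross-entropy at every level plus a KD-style consistency term: each fine
        level's probabilities, aggregated to its parent classes, are pulled toward
        the detached, temperature-softened prediction of the coarser level above."""
        ce = sum(F.cross_entropy(l, t) for l, t in zip(logits, targets))
        kd = 0.0
        for coarse, fine, M in zip(logits[:-1], logits[1:], child_to_parent):
            p_coarse = F.softmax(coarse.detach() / tau, dim=-1)       # "teacher"
            p_fine_as_coarse = F.softmax(fine / tau, dim=-1) @ M      # "student", mapped up
            kd = kd + F.kl_div(p_fine_as_coarse.clamp_min(1e-8).log(), p_coarse,
                               reduction="batchmean")
        return ce + alpha * kd


    if __name__ == "__main__":
        # Toy two-level hierarchy: 3 coarse classes, 7 fine classes.
        B, dims, classes = 4, [256, 512], [3, 7]
        M = torch.zeros(7, 3)                    # M[i, j] = 1 if fine i is a child of coarse j
        M[0, 0] = M[1, 0] = 1.0
        M[2, 1] = M[3, 1] = M[4, 1] = 1.0
        M[5, 2] = M[6, 2] = 1.0
        head = MultiLevelTokenHead(dims, classes)
        tokens = [torch.randn(B, d) for d in dims]          # per-level token summaries
        targets = [torch.randint(0, 3, (B,)), torch.randint(0, 7, (B,))]
        loss = hierarchical_loss(head(tokens), targets, [M])
        loss.backward()
        print(loss.item())

The mapping matrix M makes the coarse-to-fine consistency term well defined even though adjacent levels have different numbers of classes; any backbone that exposes one token summary per hierarchy level could feed this head.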


Metadata
Title
MLT-Trans: Multi-level Token Transformer for Hierarchical Image Classification
Authors
Tanya Boone Sifuentes
Asef Nazari
Mohamed Reda Bouadjenek
Imran Razzak
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_29
