
2024 | Book

Intelligent Information Processing XII

13th IFIP TC 12 International Conference, IIP 2024, Shenzhen, China, May 3–6, 2024, Proceedings, Part II


About this Book

The two-volume set IFIP AICT 703 and 704 constitutes the refereed conference proceedings of the 13th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2024, held in Shenzhen, China, during May 3–6, 2024.
The 49 full papers and 5 short papers presented in these proceedings were carefully reviewed and selected from 58 submissions. The papers are organized in the following topical sections:
Volume I: Machine Learning; Natural Language Processing; Neural and Evolutionary Computing; Recommendation and Social Computing; and Business Intelligence and Risk Control.
Volume II: Pattern Recognition; and Image Understanding.

Table of Contents

Frontmatter

Pattern Recognition

Frontmatter
Early Anomaly Detection in Hydraulic Pumps Based on LSTM Traffic Prediction Model
Abstract
Hydraulic pumps, vital components of modern industrial equipment, are difficult to instrument for direct flow rate measurement because of their intricate internal structures. Consequently, devising predictive methods for the main pump flow is crucial for early anomaly detection and efficient maintenance. This paper introduces a predictive method for hydraulic pump flow based on Long Short-Term Memory networks (LSTM), which are known for their robust handling of temporal data. The method uses an LSTM to predict flow rates, which are then employed to compute the volumetric efficiency under steady rotational conditions and thus evaluate the pump’s operational status. Experimental validation of the proposed model, marked by a low mean square error in flow prediction, attests to its efficacy. Moreover, the derived average volumetric efficiency of 0.97 serves as a reliable indicator for identifying potential anomalies in hydraulic pump performance.
Jiaxing Ma, Yong Wang, Jun Wen, Bo Zhang, Wei Li
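To make the volumetric-efficiency check concrete, here is a minimal sketch (not the authors' code) of how a flow rate predicted by an LSTM could be converted into a volumetric efficiency under steady rotation and compared against a threshold; the displacement, speed, and 0.95 threshold are hypothetical values chosen for illustration.

```python
# Illustrative sketch: LSTM-predicted flow -> volumetric efficiency -> anomaly flag.
import numpy as np

def volumetric_efficiency(predicted_flow_lpm, displacement_cc_per_rev, speed_rpm):
    """Volumetric efficiency = actual (predicted) flow / theoretical flow."""
    theoretical_flow_lpm = displacement_cc_per_rev * speed_rpm / 1000.0  # cc/min -> L/min
    return predicted_flow_lpm / theoretical_flow_lpm

flow_pred = np.array([97.2, 96.8, 91.5])   # L/min, e.g. LSTM outputs (hypothetical)
eta_v = volumetric_efficiency(flow_pred, displacement_cc_per_rev=100.0, speed_rpm=1000.0)
anomalous = eta_v < 0.95                   # nominal average efficiency is about 0.97
print(eta_v, anomalous)
```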
Dynamic Parameter Estimation for Mixtures of Plackett-Luce Models
Abstract
Traditional parameter estimation algorithms rely on static datasets whose contents remain constant during program execution. In real-world scenarios, however, rank data are often updated in real time, e.g., when users submit or withdraw rankings. This dynamic nature of rank data poses challenges for applying traditional algorithms. To address this issue, we propose parameter estimation algorithms tailored to structured partial rankings over dynamic datasets. These dynamic datasets can be classified as extended datasets and compressed datasets. To handle each dataset type, we introduce the extension preference learning algorithm and the compression preference learning algorithm, based on the GMM and Elsr algorithms, respectively. These algorithms keep the dataset size relatively consistent over time, balancing accuracy and efficiency. Experiments comparing the accuracy, efficiency, and stability of various algorithms on synthetic, Sushi, and Irish datasets demonstrate the effectiveness of our proposed algorithms in real-world scenarios.
Aling Liao, Zan Zhang, Chenyang Bu, Lei Li
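For reference, a minimal sketch of the Plackett-Luce ranking probability that such mixture models are built on; the item worths below are illustrative, and this is not the authors' dynamic estimation code.

```python
# Plackett-Luce model: a ranking is generated by repeatedly drawing the next
# item with probability proportional to its worth among the items still unranked.
import numpy as np

def plackett_luce_prob(ranking, weights):
    """P(ranking) = prod_i w[r_i] / sum_{j ranked at position i or later} w[r_j]."""
    prob = 1.0
    remaining = list(ranking)
    for item in ranking:
        prob *= weights[item] / sum(weights[j] for j in remaining)
        remaining.remove(item)
    return prob

w = np.array([0.5, 0.3, 0.2])              # worths of items 0, 1, 2 (hypothetical)
print(plackett_luce_prob([0, 2, 1], w))    # probability of the ranking 0 > 2 > 1
```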
Recognition of Signal Modulation Pattern Based on Multi-task Self-supervised Learning
Abstract
In wireless communication, recognition of the signal modulation pattern plays an essential role. However, acquiring high-quality data in wireless communications is often prohibitively expensive and challenging. Traditional methods for modulation pattern recognition are limited by the specialized knowledge they require, resulting in poor adaptability and generalization. Although deep neural networks demonstrate superior performance in modulation pattern recognition, they heavily depend on high-quality, accurately annotated training data and require significant computational resources during training, rendering them unsuitable for resource-constrained devices or real-time applications. We propose a signal modulation pattern recognition method based on multi-task self-supervised learning to overcome these challenges. This approach first augments data from various unlabeled categories, then captures the essential signal characteristics through contrastive learning to obtain a robust pre-trained model. We then fine-tune the model with a small amount of labeled modulation samples to better adapt it to downstream tasks. Experimental results indicate that in scenarios with limited sample availability, our method slightly surpasses traditional recognition methods in accuracy and shows significant advantages in training efficiency.
Dianjing Cheng, Xingyu Wu, Zhenghao Xie, Zhihua Cui, Qiong Li, Endong Tong, Wenjia Niu, Ziyi Wei, Xinyi Zhao
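A minimal sketch of the contrastive pre-training step, assuming a SimCLR-style NT-Xent loss over two augmented views of each unlabeled signal; the embeddings, batch size, and temperature are placeholders rather than the paper's actual setup.

```python
# Hedged sketch: contrastive loss pulling two augmented views of the same signal
# together and pushing apart views of different signals.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same signals."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2B, D) unit vectors
    sim = z @ z.t() / temperature                             # cosine similarities
    n = z1.size(0)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])          # positive indices
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # stand-ins for encoder outputs
print(nt_xent_loss(z1, z2).item())
```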
Dependency-Type Weighted Graph Convolutional Network on End-to-End Aspect-Based Sentiment Analysis
Abstract
Previous studies on the E2E-ABSA task have given little consideration to dependency-type information. Those that do use dependency-type messages simply concatenate them with word embedding vectors, which may not fully fuse the context features with the information carried by the dependency types. This paper proposes a new model called the Dependency-Type Weighted Graph Convolutional Network (DTW-GCN) to combine dependency-type messages with word embeddings. We use a type-weighted matrix to incorporate the dependency-type message, allowing DTW-GCN to fuse the dependency-type message and word embedding vectors. Experiments conducted on three benchmark datasets verify the effectiveness of our model.
Yusong Mu, Shumin Shi
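A hedged sketch of the type-weighted graph convolution idea: the dependency adjacency matrix is re-weighted by a learnable weight per dependency type before the usual GCN propagation. The layer below is an illustration of the mechanism, not the paper's architecture; all dimensions and names are assumptions.

```python
# Illustrative type-weighted GCN layer over a dependency graph.
import torch
import torch.nn as nn

class TypeWeightedGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_dep_types):
        super().__init__()
        self.type_weight = nn.Parameter(torch.ones(num_dep_types))  # one weight per dependency type
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj, dep_type):
        # x: (n, in_dim) word features; adj: (n, n) 0/1 dependency arcs;
        # dep_type: (n, n) long tensor of dependency-type ids for each arc.
        weighted_adj = adj * self.type_weight[dep_type]              # type-weighted matrix
        deg = weighted_adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        h = weighted_adj @ self.linear(x) / deg                      # normalized propagation
        return torch.relu(h)

layer = TypeWeightedGCNLayer(in_dim=300, out_dim=128, num_dep_types=40)
x = torch.randn(10, 300)
adj = (torch.rand(10, 10) > 0.7).float()
dep_type = torch.randint(0, 40, (10, 10))
out = layer(x, adj, dep_type)   # (10, 128) fused word/dependency representations
```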
Utilizing Attention for Continuous Human Action Recognition Based on Multimodal Fusion of Visual and Inertial
Abstract
Visual and inertial data are both important modalities for human action recognition and have a wide range of applications in virtual reality, human-computer interaction, action perception, and other fields. Most recent work has achieved significant results by combining visual and inertial sensor data with deep learning methods. Integrating multimodal information in this way makes the system more robust and adaptable to different environments and action scenarios. However, these works still suffer from drawbacks in data fusion and a high demand for computing resources. In this article, a method for continuous human action recognition based on visual and inertial sensors using attention is proposed. Specifically, a deep visual-inertial attention network (VIANet) architecture is designed that integrates spatial, channel, and temporal attention into the visual 3D CNN, integrates a temporal attention mechanism into the inertial 2D CNN, and performs decision-level fusion on their outputs. Experimental verification was conducted on the public C-MHAD dataset. The experiments show that the proposed VIANet outperforms previous baselines in multi-modal human action recognition.
Liang Hua, Yong Huang, Chao Liu, Tao Zhu
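As an illustration of the decision-level fusion step, the sketch below averages per-modality class scores with a fixed weight; the two backbones and the fusion weight are placeholders, and the paper's VIANet additionally applies spatial, channel, and temporal attention inside each branch.

```python
# Hedged sketch of decision-level fusion of visual and inertial class scores.
import torch

def decision_level_fusion(visual_logits, inertial_logits, alpha=0.6):
    """Combine per-modality softmax scores; alpha weights the visual branch."""
    p_vis = torch.softmax(visual_logits, dim=-1)
    p_imu = torch.softmax(inertial_logits, dim=-1)
    fused = alpha * p_vis + (1.0 - alpha) * p_imu
    return fused.argmax(dim=-1), fused

visual_logits = torch.randn(4, 7)    # e.g. 3D-CNN outputs for 7 action classes (hypothetical)
inertial_logits = torch.randn(4, 7)  # e.g. 2D-CNN outputs on IMU windows (hypothetical)
pred, scores = decision_level_fusion(visual_logits, inertial_logits)
```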
HARFMR: Human Activity Recognition with Feature Masking and Reconstruction
Abstract
The widespread adoption of deep learning in computer science has significantly improved the functionality of wearable sensors, for example in the recognition and localization of human activities. Nevertheless, annotating sensor data for training remains challenging due to the high associated costs. Unlabeled sensor data are more accessible and easier to collect than labeled data, which has led to increased interest in self-supervised learning for human activity recognition. Masked reconstruction of raw sensor data is a method commonly employed in self-supervised learning; when applied to human activity recognition, the technique involves time-centric data masking and subsequent reconstruction. However, masking and reconstructing raw sensor data may exclude crucial information, resulting in representations with lower semantic levels. To address this, we present a new masking and reconstruction strategy, called Human Activity Recognition with Feature Masking and Reconstruction (HARFMR), specifically designed for human activity recognition. This architecture masks features at a random ratio and reconstructs the original sensor data, compelling the encoder to emphasize the contextual correlations among the data’s features and the properties of those features during reconstruction. Our evaluation of the proposed masking strategy on three public datasets demonstrates that HARFMR surpasses existing masked reconstruction schemes under both self-supervised and semi-supervised settings.
Wenxuan Cui, Yingjie Chen, Yong Huang, Chao Liu, Tao Zhu
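A minimal sketch of the feature-masking-and-reconstruction objective, assuming a generic GRU encoder and linear decoder rather than the HARFMR architecture: a random ratio of feature channels is zeroed out and the network is trained to reconstruct the original window.

```python
# Illustrative masked-feature autoencoder for sensor windows.
import torch
import torch.nn as nn

class MaskedFeatureAutoencoder(nn.Module):
    def __init__(self, num_features, hidden=64, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.GRU(num_features, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, num_features)

    def forward(self, x):
        # x: (batch, time, num_features) raw sensor window
        mask = (torch.rand(x.size(0), 1, x.size(2), device=x.device) > self.mask_ratio).float()
        h, _ = self.encoder(x * mask)            # encode the feature-masked input
        recon = self.decoder(h)                  # reconstruct the unmasked signal
        return ((recon - x) ** 2).mean()         # reconstruction loss

model = MaskedFeatureAutoencoder(num_features=9)       # e.g. 3-axis acc/gyro/mag (hypothetical)
loss = model(torch.randn(16, 128, 9))
```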
CAPPIMU: A Composite Activities Dataset for Human Activity Recognition Utilizing Plantar Pressure and IMU Sensors
Abstract
Human composite activity recognition enables smart homes to better understand and respond to user needs and plays a key role in activity assistance. However, current public datasets for composite activities are limited in the variety of activities and the number of subjects they include, which hinders a thorough and complete assessment of activity recognition methodologies. To address these problems, this paper presents a publicly available dataset named CAPPIMU (Composite Activities with Plantar Pressure and IMU sensors). The dataset covers 21 activities (15 composite and 6 simple) with synchronized data from two modalities: plantar pressure sensors and IMUs (Inertial Measurement Units) placed at nine different body locations. Compared with single-modal sensors, multimodal sensors provide a richer representation for activity recognition tasks. Moreover, we conduct a thorough examination of how plantar pressure and inertial data from various locations affect activity recognition, using a selection of widely recognized deep learning models. The experimental results show that these 21 household activities can be classified with high accuracy, and the right wrist is found to be the optimal location for activity recognition with a single IMU sensor.
Bin Luo, Qi Qiu, Tao Zhu, Zhenyu Liu
Open-Set Sensor Human Activity Recognition Based on Reciprocal Time Series
Abstract
Human activity recognition (HAR) using wearable sensors has witnessed significant advancements in recent years. However, the traditional closed-set assumption restricts models to predicting only known activity classes. This limitation can be overcome by building models under the open-set recognition paradigm. Most existing methods mitigate the open-set risk by adjusting known class boundaries, but this approach overlooks the potential correlations between unknown classes and can lead to over-generalization, requiring more and higher-quality training data. This paper introduces the concept of reciprocal time series, which serves as the latent representation of the unknown class space for each known class. By comparing samples to these reciprocal time series, the model can classify them as either known or unknown. We propose a novel metric to measure temporal similarity within the embedding space. The constructed boundary space, formed by the reciprocal time series, facilitates the effective learning of inherent generalization features from a large number of unknown samples through multi-class interaction, ultimately reducing the open-set risk. Extensive experiments on three public sensor datasets demonstrate that our model surpasses existing methods on the open-set recognition task for sensor-based HAR, particularly excelling in recognizing unknown class instances.
Yingjie Chen, Wenxuan Cui, Yong Huang, Chao Liu, Tao Zhu

Image Understanding

Frontmatter
A Concept-Based Local Interpretable Model-Agnostic Explanation Approach for Deep Neural Networks in Image Classification
Abstract
A well-recognized and widely-used explainable artificial intelligence (XAI) method is Local Interpretable Model-agnostic Explanations (LIME), which offers instance-level interpretation by generating new data around the instance and training a locally interpretable linear model. However, when using LIME to explain the image classification model, it generates interpretations at the level of super-pixel representation. This does not assure comprehensibility to humans due to the lack of semantic information in super-pixels. To enhance the intelligibility of LIME, we propose an advanced version of LIME, termed Concept-based Local Interpretable Model-agnostic Explanations (ConceptLIME). In ConceptLIME, the explanations are formulated in terms of human-understandable concepts as opposed to the semantically deficient super-pixels, thereby augmenting the comprehensibility of the original LIME method. Comparative experiments have been conducted between ConceptLIME and LIME to validate the effectiveness of ConceptLIME. The experimental results indicate that ConceptLIME outperforms LIME regarding predictive performance on both the perturbation dataset and the explained instances. Moreover, the fidelity of the explanations generated by ConceptLIME surpasses that produced by LIME. The interpretations provided by ConceptLIME are more intelligible and intuitive than LIME’s explanations. Consequently, our proposed ConceptLIME exhibits superior properties, including predictive performance, fidelity, and comprehensibility, when compared with LIME.
Lidan Tan, Changwu Huang, Xin Yao
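For orientation, the sketch below outlines the LIME-style procedure that ConceptLIME adapts: binary perturbations over interpretable units (super-pixels in LIME, human-understandable concepts here), a proximity-weighted linear surrogate, and its coefficients as the explanation. The `mask_concepts` helper and the proximity kernel are hypothetical stand-ins, not the paper's concept-extraction step.

```python
# Illustrative local linear surrogate over concept on/off perturbations.
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance(image, predict_fn, mask_concepts, num_concepts, n_samples=500):
    # Sample binary vectors: 1 keeps a concept region, 0 masks it out of the image.
    z = np.random.randint(0, 2, size=(n_samples, num_concepts))
    # Model outputs (probability of the class of interest) for each perturbed image.
    preds = np.array([predict_fn(mask_concepts(image, zi)) for zi in z])
    # Proximity kernel: perturbations closer to the original image get more weight.
    weights = np.exp(-((num_concepts - z.sum(axis=1)) ** 2) / (num_concepts ** 2))
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(z, preds, sample_weight=weights)
    return surrogate.coef_        # per-concept importance for the explained prediction
```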
A Deep Neural Network-Based Segmentation Method for Multimodal Brain Tumor Images
Abstract
Medical image segmentation plays an important role in medical diagnosis. Accurate segmentation of brain tumor images requires well-designed segmentation models and a sufficient number of high-quality, well-labeled training samples, requirements that existing segmentation methods struggle to meet. In this paper, we propose a segmentation method that combines a GAN-nested model with an improved UNet. The GAN-nested model automatically generates sufficient well-labeled brain tumor images to serve as training samples; the improved UNet is good at extracting the detailed features of brain tumor images and can therefore perform accurate segmentation using the high-quality training samples generated by the GAN-nested model. Extensive experimental results show that the proposed method is effective and obtains state-of-the-art performance on the given datasets.
Zuqiang Meng, Yue Peng
Graph Convolutional Networks for Predicting Mechanical Characteristics of 3D Lattice Structures
Abstract
Recent advancements in deep learning have encouraged researchers to apply these methods to 3D objects. Initially, convolutional neural networks, which have proven effective for 2D images, were used for 3D object processing. These methods need a complex process to convert 3D objects into 2D images, which increases computation cost and risks information loss during the transformation. This research introduces a Graph Convolutional Network (GCN) approach for predicting the mechanical properties of custom-designed 3D lattice structures for tissue engineering applications. Seventeen scaffold geometries were generated for training and eight were used for testing. Unlike traditional preprocessing into images, this methodology reduces preprocessing by using GCNs to process the 3D geometries directly in graph form. The experimental results show the efficiency of our proposed method in predicting the mechanical characteristics of 3D lattice structures.
Valentine Oleka, Seyyed Mohsen Zahedi, Aboozar Taherkhani, Reza Baserinia, S. Abolfazl Zahedi, Shengxiang Yang
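A minimal sketch of graph-level regression with a plain GCN, assuming row-normalized propagation and a mean-pooled readout; the layer sizes and the predicted quantity are illustrative, not the paper's architecture.

```python
# Illustrative GCN regressor: lattice nodes -> propagated features -> pooled scalar property.
import torch
import torch.nn as nn

class LatticeGCNRegressor(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)            # e.g. a predicted stiffness value (hypothetical)

    def propagate(self, x, adj_norm):
        return torch.relu(adj_norm @ x)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) node coordinates/features, adj: (n_nodes, n_nodes) strut connectivity
        adj_hat = adj + torch.eye(adj.size(0))       # add self-loops
        adj_norm = adj_hat / adj_hat.sum(dim=1, keepdim=True)   # row-normalize
        h = self.propagate(self.fc1(x), adj_norm)
        h = self.propagate(self.fc2(h), adj_norm)
        return self.head(h.mean(dim=0))              # graph-level readout -> scalar property

model = LatticeGCNRegressor(in_dim=3)
y = model(torch.randn(50, 3), (torch.rand(50, 50) > 0.9).float())
```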
3D Object Reconstruction with Deep Learning
Abstract
Recent advancements and breakthroughs in deep learning have accelerated rapid development in the field of computer vision. Following major successes in 2D object perception and detection, substantial progress has also been made in 3D object reconstruction. Since humans can infer the 3D world from a single 2D view of an object, training computers to reason in 3D is necessary for several key applications of computer vision. The use of deep learning for 3D object reconstruction from single-view images is evolving rapidly and recording significant results. In this research, we explore Facebook's well-known hybrid approach, Mesh R-CNN, which combines voxel generation and triangular mesh reconstruction to generate the 3D mesh structure of an object from a single-view 2D image. Although Mesh R-CNN can reconstruct objects with varying geometry and topology, the mesh quality suffers from topological errors such as self-intersection, causing non-smooth and rough mesh generation. In this research, Mesh R-CNN with Laplacian Smoothing (Mesh R-CNN-LS) is proposed, which uses a Laplacian smoothing and regularization algorithm to refine the non-smooth and rough mesh. The proposed Mesh R-CNN-LS helps to constrain the triangular deformation and generate a better and smoother 3D mesh. The proposed Mesh R-CNN-LS was compared with the original Mesh R-CNN on the Pix3D dataset and showed better performance in terms of loss and average precision score.
Stephen S. Aremu, Aboozar Taherkhani, Chang Liu, Shengxiang Yang
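A hedged sketch of the Laplacian smoothing idea used to regularize the predicted mesh: each vertex is moved toward the mean of its one-ring neighbours. The plain umbrella operator below is only an illustration; in the paper the smoothing acts as a regularization term during Mesh R-CNN training rather than as a post-processing loop.

```python
# Illustrative Laplacian (umbrella) smoothing of a triangle mesh.
import numpy as np

def laplacian_smooth(vertices, faces, lam=0.5, iterations=5):
    """vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices."""
    neighbors = [set() for _ in range(len(vertices))]
    for a, b, c in faces:                      # build one-ring adjacency from triangles
        neighbors[a].update((b, c)); neighbors[b].update((a, c)); neighbors[c].update((a, b))
    v = vertices.astype(float).copy()
    for _ in range(iterations):
        centroids = np.array([v[list(nb)].mean(axis=0) if nb else v[i]
                              for i, nb in enumerate(neighbors)])
        v = v + lam * (centroids - v)          # move each vertex toward its neighbourhood mean
    return v

verts = np.random.rand(100, 3)                     # hypothetical predicted vertices
faces = np.random.randint(0, 100, size=(180, 3))   # hypothetical triangle indices
smoothed = laplacian_smooth(verts, faces)
```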
Adaptive Prototype Triplet Loss for Cross-Resolution Face Recognition
Abstract
Although face recognition has achieved great success in many areas, cross-resolution face recognition (CRFR) still remains a challenging task due to the large domain gap between low-resolution (LR) and high-resolution (HR) images. In this paper, we propose an adaptive prototype triplet loss (APTL) for CRFR. The APTL pulls the features close to their own prototypes, and pushes them away from the prototypes of other classes. Thus, the angular distances between features and prototypes from the same class are closer than those from different classes. Furthermore, to better exploit the similarity information among different identities, we adaptively adjust the margin term in the loss. Since the proposed APTL is applied simultaneously to HR and LR features, the gap between two domains can be narrowed naturally. Experiments on LFW and SCface datasets illustrate the superiority of our method.
Yongru Chen, Wenxian Zheng, Xiaying Bai, Qiqi Bao, Wenming Yang, Guijin Wang, Qingmin Liao
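An illustrative sketch of a prototype triplet loss on normalized features: the cosine similarity to the own-class prototype must exceed the similarity to the hardest other prototype by a margin. The fixed margin here stands in for the paper's adaptively adjusted margin, and all shapes are assumptions.

```python
# Illustrative prototype triplet loss over HR/LR face features and class prototypes.
import torch
import torch.nn.functional as F

def prototype_triplet_loss(features, labels, prototypes, margin=0.2):
    """features: (B, D), labels: (B,), prototypes: (C, D) learnable class prototypes."""
    f = F.normalize(features, dim=1)
    p = F.normalize(prototypes, dim=1)
    cos = f @ p.t()                                      # (B, C) cosine similarities
    pos = cos[torch.arange(f.size(0)), labels]           # similarity to own prototype
    neg_mask = torch.ones_like(cos, dtype=torch.bool)
    neg_mask[torch.arange(f.size(0)), labels] = False
    hardest_neg = cos.masked_fill(~neg_mask, float('-inf')).max(dim=1).values
    return F.relu(hardest_neg - pos + margin).mean()     # hinge on the similarity gap

feats = torch.randn(32, 256)                 # stand-ins for HR and LR embeddings
labels = torch.randint(0, 10, (32,))
protos = torch.randn(10, 256, requires_grad=True)
loss = prototype_triplet_loss(feats, labels, protos)
```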
Hand Gesture Recognition Using a Multi-modal Deep Neural Network
Abstract
As the devices around us become more intelligent, new ways of interacting with them are sought to improve user convenience and comfort. While gesture-controlled systems have existed for some time, they either use additional specialized imaging equipment, require unreasonable computing resources, or are simply not accurate enough to be a viable alternative. In this work, a reliable method of recognizing gestures is proposed. The model correctly classifies hand gestures for keyboard typing based on activity captured by an ordinary camera. Two baseline models are first developed: one classifies video data and the other classifies time-series sequences of skeleton data extracted from the video. The models use different classification strategies and are built on lightweight architectures. These two baseline models are then integrated into a single multi-modal model with multiple inputs, i.e., video and time-series inputs, to improve accuracy. The performance of the baseline models is then compared with that of the multi-modal classifier. Since the multi-modal classifier is based on the initial models, it naturally inherits the benefits of both baseline architectures and achieves a higher testing accuracy of 100%, compared with 85% and 75% for the two baseline models respectively.
Saneet Fulsunder, Saidu Umar, Aboozar Taherkhani, Chang Liu, Shengxiang Yang
Backmatter
Metadata
Title
Intelligent Information Processing XII
Edited by
Zhongzhi Shi
Jim Torresen
Shengxiang Yang
Copyright Year
2024
Electronic ISBN
978-3-031-57919-6
Print ISBN
978-3-031-57918-9
DOI
https://doi.org/10.1007/978-3-031-57919-6
