2024 | Book

Social Media Processing

11th Chinese National Conference, SMP 2023, Anhui, China, November 23–26, 2023, Proceedings

Edited by: Feng Wu, Xuanjing Huang, Xiangnan He, Jiliang Tang, Shu Zhao, Daifeng Li, Jing Zhang

Publisher: Springer Nature Singapore

Book series: Communications in Computer and Information Science

About this book

This book constitutes the thoroughly refereed proceedings of the 11th Chinese National Conference of Social Media Processing, SMP 2023, held in Anhui, China, in November 2023.
The 16 full papers presented were carefully reviewed and selected from 88 submissions. The papers are organized in the topical sections on knowledge representation and reasoning; knowledge acquisition and knowledge base construction; linked data, knowledge integration, and knowledge graph storage management; natural language understanding and semantic computing; knowledge graph applications; knowledge graph open resources.

Table of Contents

Frontmatter
DABP: A Domain Augmentation and Bidirectional Stack-Propagation Model for Task-Oriented NLU
Abstract
Natural language understanding (NLU) is a key part of task-oriented dialogue systems. Most existing task-oriented NLU models use pre-trained models (PTMs) for semantic encoding, but those PTMs often perform poorly on specific task-oriented dialogue data due to small data volume and a lack of domain-specific knowledge. In addition, most joint models of slot filling and intention detection only use a joint loss function, or only provide a one-way semantic connection, which fails to achieve deep interaction of information between the two tasks. In this paper, we propose a Domain Augmentation and Bidirectional Stack-Propagation (DABP) model for NLU. In the proposed model, we use the masked language model (MLM) task and the proposed part-of-speech tagging task to enhance PTMs with both implicit and explicit domain-specific knowledge. We also propose a bidirectional stack-propagation mechanism to propagate information between the two tasks. Experimental results show that the proposed model achieves better performance than state-of-the-art models on the ATIS and SNIPS datasets.
Shizhan Lan, Yuehao Xiao, Zhenyu Wang
Knowledge Graph Completion via Subgraph Topology Augmentation
Abstract
Knowledge graph completion (KGC) has achieved widespread success as a key technique to ensure high-quality structured knowledge for downstream tasks (e.g., recommendation systems and question answering). However, within the two primary categories of KGC algorithms, the embedding-based methods lack interpretability and most of them only work in transductive settings, while the rule-based approaches sacrifice expressive power to ensure that the models are interpretable. To address these challenges, we propose KGC-STA, a knowledge graph completion method via subgraph topology augmentation. First, KGC-STA applies two topological augmentations to the enclosing subgraphs: missing relation completion for sparse nodes and removal of redundant nodes. The augmented subgraphs can therefore provide more useful information. Then a multi-relational message-passing layer is designed to efficiently aggregate and learn the surrounding information of nodes in the subgraph for triplet scoring. Experimental results on WN18RR and FB15k-237 show that KGC-STA outperforms the baselines.
Huafei Huang, Feng Ding, Fengyi Zhang, Yingbo Wang, Ciyuan Peng, Ahsan Shehzad, Qihang Lei, Lili Cong, Shuo Yu
The Diffusion of Vaccine Hesitation: Media Visibility Versus Scientific Authority
Abstract
[Purpose/Significance] This study quantifies the media visibility and scientific authority of vaccine scientists and anti-vaxxers. We analyze differences and associations through media co-occurrence and scientific inter-citation networks to understand the causes of vaccine hesitancy. [Methods/Process] We collected 100,000 research documents and 60,000 English-language media article metadata from 213 anti-vaxxers and 200 vaccine scientists. Differences in their media visibility were analyzed individually and as groups. We explored passive and active media presentation of anti-vaxxers and vaccine scientists. Co-occurrence and citation associations were examined through separate networks. Media articles were analyzed for frequency of appearance and pronoun use. [Results/Conclusions] Anti-vaxxers' media visibility is 52% higher, but the top 50 vaccine scientists surpass anti-vaxxers in visibility. Media focus on anti-vaxxer topics drives attention. Despite limited scientific authority, anti-vaxxers gain traction through disinformation. Vaccine scientists gain visibility based on their scientific authority. Anti-vaxxers' close interconnections induce team effects, aiding the spread of opposition. Their controversial nature makes anti-vaxxers appear more frequently in coverage. Pronoun differences highlight contrasting perspectives. These findings aid understanding of vaccine reporting and information dissemination for tackling vaccine hesitancy.
Zhai Yujia, Yao Yonghui, Liang Yixiao
An Emotion Aware Dual-Context Model for Suicide Risk Assessment on Social Media
Abstract
Suicide risk assessment on social media is an essential task for mental health surveillance. Although the task has been extensively studied, existing works share two limitations: (1) insufficient exploitation of Non-SuicideWatch posts, and (2) ineffective consideration of the fine-grained emotional information in both SuicideWatch and Non-SuicideWatch posts. To tackle these issues, we propose an emotion-aware dual-context model to predict suicide risk. Specifically, SuicideWatch posts that contain psychological crisis are leveraged to obtain the suicidal ideation context. Then, given that suicidal ideation is not instantaneous and Non-SuicideWatch posts can provide essential information, we encode the emotion-related features and emotional changes with variable time intervals, revealing users' mental states. Finally, the embeddings of SuicideWatch posts, Non-SuicideWatch posts, LIWC features, and posting time are concatenated and fed into a fully-connected network for suicide risk level recognition. Extensive experiments are conducted to validate the effectiveness of our method. Our scheme outperforms the first-place system in CLPsych2019 task B by 4.9% on Macro-F1 and achieves a 10.7% increase in F1 on the Severe Risk label over the first-place system in CLPsych2019 task A, which uses only SuicideWatch posts.
Zifang Liang, Dexi Liu, Qizhi Wan, Xiping Liu, Guoqiong Liao, Changxuan Wan
Item Recommendation on Shared Accounts Through User Identification
Abstract
Nowadays, people often share their subscription accounts, e.g. online content subscription accounts, among family members and friends. It is important to identify different users under one single account and then recommend specific items to decoupled individuals. In this paper, we propose the Projected Discriminant Attentive Embedding (PDAE) model and the Shared Account-aware Bayesian Personalized Ranking (SA-BPR) model for user identification and item recommendation, respectively. PDAE separates item consumption actions of each individual from mixed account history by learning the item representation that has both the user preference and user demographic information integrated; SA-BPR is a robust recommendation model based on a hierarchical ranking strategy where items from different sets are recommended with different levels of priorities. The experiments show that the proposed models generally outperform state-of-the-art approaches in both user identification and item recommendation.
Chongming Gao, Min Wang, Jiajia Chen
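The abstract above names Bayesian Personalized Ranking as the basis of SA-BPR but does not state the objective. As background only, the following is a minimal sketch of the standard BPR pairwise loss, not of the hierarchical SA-BPR variant; the function name and the example scores are hypothetical.

```python
import math


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Standard BPR pairwise loss: -log(sigmoid(pos_score - neg_score)).

    pos_score is the model score of an item the user consumed,
    neg_score the score of an item the user did not consume.
    """
    diff = pos_score - neg_score
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss shrinks as the consumed item is ranked further above the other one.
well_ranked = bpr_loss(2.0, 0.0)
badly_ranked = bpr_loss(0.0, 2.0)
```

A hierarchical strategy such as the one the abstract describes would presumably apply pairwise comparisons of this kind across several item sets with different priorities; that extension is not shown here.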
Prediction and Characterization of Social Media Communication Effects of Emergencies with Multimodal Information
Abstract
This paper investigates communication effects on social media and their influencing factors. An emergency event dataset spanning the period from 2019 to 2023 is constructed, comprising a large volume of textual and image data obtained through web crawling. The communication effects of emergency social media posts during various emergency events are analyzed using a comprehensive paradigm, with the breadth and depth of dissemination measured by the sum of likes and comments, as well as the number of reposts. LightGBM is employed as the classifier, and multidimensional features incorporating visual and textual dimensions are constructed. Experimental results highlight the significant impact of the image modality on dissemination effects, particularly emphasizing the importance of features such as HSV and image content categories. Additionally, the number of followers of the original poster is identified as a crucial factor influencing dissemination effects. The experimental results show that the research method based on feature engineering and machine learning can effectively predict the propagation effect of microblog posts, and that the LightGBM algorithm performs best. The study further found that, in the comparison of modal effects, the combined image-and-text (multimodal) setting performs better.
Yuyang Tian, Shituo Ma, Qingyuan He, Ran Wang
Leverage Heterogeneous Graph Neural Networks for Short-Text Conceptualization
Abstract
The conceptualization of short-texts is playing an increasingly important role in text comprehension, social media processing and other applications. Generally, this problem can be modeled as a heterogeneous semantic network connecting words (in the short-text) and corresponding concepts (in the knowledge base). As a crucial step in the short-text conceptualization task, learning interactive relations among words and concepts has been explored through numerous methods. One intuitive method is to place them in a graph-based neural network with a more complex structure to capture inter-concept/word relationships. Hence, this paper presents a heterogeneous graph-based neural network (HGNN) for short-text conceptualization, which includes semantic nodes with various granularity levels, mainly consisting of basic semantic nodes (e.g., words) and supernodes (e.g., concepts). The proposed model provides a flexible and natural tool to model such complex relationships and capture more expressive and discriminative concepts, by leveraging a mutually reinforcing strategy on heterogeneous correlations. In other words, it is a beneficial attempt to introduce different types of semantic nodes into graph-based neural networks for the short-text conceptualization task, and we conduct a comprehensive qualitative analysis to investigate its benefits.
Xiaoye Ouyang, Yashen Wang, Qiang Li, Zhuoya Ju, Chenyu Liu, Yi Zhang
Short-Text Conceptualization Based on Hyper-Graph Learning and Multiple Prior Knowledge
Abstract
Short-text conceptualization is a notable task and popular issue in current social network analysis and natural language processing. This line of work usually views the data as a heterogeneous semantic network connecting terms (in the short-text) and corresponding concepts (in the prior knowledge base), with complex relationships (e.g., term-correlation, concept-correlation and subordination). Therefore, this paper introduces a hyper-graph learning strategy for solving this problem, because of its ability to model complex relationships. Overall, this paper proposes a novel short-text conceptualization model based on a hyper-graph convolutional network. In particular, this model lets the signals (i.e., terms and concepts) interact sufficiently by leveraging three kinds of prior knowledge for modeling heterogeneous correlations among terms and concepts: subordination prior knowledge, concept correlation prior knowledge and term correlation prior knowledge. The experimental results demonstrate that the proposed work achieves higher accuracy in the short-text conceptualization task than current state-of-the-art algorithms.
Li Li, Yashen Wang, Xiaolei Guo, Liu Yuan, Bin Li, Shengxin Xu
What You Write Represents Your Personality: A Dual Knowledge Stream Graph Attention Network for Personality Detection
Abstract
The goal of the personality detection task is to determine a person's personality traits using their social media posts. Recently, researchers have turned away from a fully data-driven approach and begun employing prior knowledge from psycholinguistics to guide their research. People typically post on social media to express their opinions or share their emotions. Therefore, it is crucial to uncover the traits and disparities in how individuals with different personalities express themselves. However, current research based on psycholinguistic principles only examines these differences superficially, failing to conduct more granular analyses, such as exploring emotions. In this paper, we propose an innovative approach that blends psycholinguistic knowledge and prior emotional knowledge to acquire features at varying levels. Our model, named Dual Knowledge Stream Graph Attention Network (DKSGAT), comprises two streams. One stream represents posts at the psycholinguistic level, while the other encodes words at a more fine-grained emotional level based on prior emotional knowledge. The representations from both streams are then combined to make joint inferences about personality traits. Our approach outperforms previous studies in predicting Big Five and MBTI personality traits, as demonstrated through testing on two different public datasets.
Zian Yan, Ruotong Wang, Xiao Sun
Detect Depression from Social Networks with Sentiment Knowledge Sharing
Abstract
Social networks play an important role in propagating people's viewpoints, emotions, thoughts, and fears. Notably, following lockdown periods during the COVID-19 pandemic, the issue of depression has garnered increasing attention, with a significant portion of individuals resorting to social networks as an outlet for expressing emotions. Using deep learning techniques to discern potential signs of depression from social network messages facilitates the early identification of mental health conditions. Current efforts in detecting depression through social networks typically rely solely on analyzing the textual content, overlooking other potential information. In this work, we conduct a thorough investigation that unveils a strong correlation between depression and negative emotional states. The integration of such associations as external knowledge can provide valuable insights for detecting depression. Accordingly, we propose a multi-task training framework, DeSK, which utilizes shared sentiment knowledge to enhance the efficacy of depression detection. Experiments conducted on both Chinese and English datasets demonstrate the cross-lingual effectiveness of DeSK.
Yan Shi, Yao Tian, Chengwei Tong, Chunyan Zhu, Qianqian Li, Mengzhu Zhang, Wei Zhao, Yong Liao, Pengyuan Zhou
AOM: A New Task for Agitative Opinion Mining in We-media
Abstract
As We-media continues to develop, there is a concerning rise of agitative opinions on We-media platforms, which often lead to online violence. To formalize the research on identifying whether a sentence contains agitative opinions, we propose a new task, AOM, for agitative opinion mining in We-media. To clarify the task, we give a clear definition of agitative opinions, categorizing them into nine types, and we manually construct a ten-thousand-scale Chinese agitative opinion dataset, CAOD, based on WeChat public accounts, for research purposes. Furthermore, a baseline model \(\mathrm{CAOD_{MINER}}\) based on TextCNN is proposed, and sampling methods are adopted in training it. For comparison, we also apply several mainstream text classifiers to CAOD. The comparative experiment and further analysis show that AOM is a solvable but challenging task, in which unbalanced data distribution, diversity of expression forms, context dependency, scarcity of external knowledge and implicit expression deserve to be studied in the future.
Huazi Yin, Jintao Tang, Shasha Li, Ting Wang
PNPT: Prototypical Network with Prompt Template for Few-Shot Relation Extraction
Abstract
Few-shot relation extraction involves predicting the relations between entity pairs in a sentence with a limited number of labeled instances for each specific relation. The prototypical network, which is based on the meta-learning framework, has been widely adopted for this task. Existing prototypical network-based approaches typically obtain the relation representation by concatenating the embeddings corresponding to the start tokens of the two entity mentions. While these methods have demonstrated commendable performance, we argue that the current relation representation fails to fully capture semantic nuances within complex scenes, where identical entity pairs often convey diverse semantic relationships. In this paper, we propose an innovative relation representation approach that integrates textual context and entity mentions through a prompt template. Furthermore, we introduce a gate mechanism to selectively incorporate external relation knowledge into the original relation prototype derived from support instances. Experimental results on two benchmark datasets demonstrate the effectiveness of our proposed approach.
Liping Li, Yexuan Zhang, Jiajun Zou, Yongfeng Huang
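To make the prototypical-network idea in the abstract above concrete, here is a minimal sketch under stated assumptions: each relation prototype is the mean of its support embeddings, a query is assigned to the nearest prototype, and external knowledge is mixed in with a single scalar gate. The embeddings, relation names, function names, and the scalar gate are all hypothetical; the paper's prompt-template encoder and its actual gate are not reproduced here.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """One prototype per relation: the mean of its support embeddings."""
    return {relation: mean_vector(vecs) for relation, vecs in support.items()}


def gated_prototype(prototype: Vector, knowledge: Vector, gate: float) -> Vector:
    """Mix an external knowledge vector into a prototype.

    Assumption: a fixed scalar gate in [0, 1]; a trained model would
    normally compute the gate from its inputs.
    """
    return [gate * k + (1.0 - gate) * p for p, k in zip(prototype, knowledge)]


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the relation whose prototype is closest to the query."""
    return min(prototypes, key=lambda rel: squared_distance(query, prototypes[rel]))


# Hypothetical 2-way 2-shot episode with toy 2-dimensional embeddings.
support = {
    "founder_of": [[1.0, 0.0], [0.8, 0.2]],
    "born_in": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = build_prototypes(support)
predicted = classify([0.9, 0.1], prototypes)
```

In this toy episode the query lies next to the mean of the "founder_of" support vectors, so that relation is selected.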
CDBMA: Community Detection in Heterogeneous Networks Based on Multi-attention Mechanism
Abstract
Community detection in complex networks is a fundamental task in network analysis. With the continuous evolution of social networks, network structures are becoming more complex and often contain rich heterogeneous information. Traditional community detection methods can only utilize shallow topological features and fail to leverage the rich heterogeneous information in these networks, making community detection in heterogeneous networks a new challenge. In this paper, we propose a community detection model for heterogeneous networks based on multi-attention mechanisms. Our model consists of a structural information encoder and a semantic information encoder. The structural information encoder employs a subgraph sampler to extract subgraphs around target nodes, uses type attention to aggregate the influence of different types of nodes, and learns the heterogeneous structural information of the network. The semantic information encoder uses node attention to learn the importance of high-order neighbor nodes based on meta-paths, uses semantic attention to learn the weights of different meta-paths, and fuses the content semantic information on different meta-paths to learn the content semantic information of the heterogeneous network. Joint optimization of the structural and semantic encoders is achieved through self-supervised learning, addressing the dependence on community labels. We evaluate our model on four real-world datasets, and the results show that our algorithm outperforms several state-of-the-art community detection methods, with improvements of approximately 10% in the NMI metric on the ACM and Freebase datasets and approximately 20% in the ARI metric on the ACM dataset.
Yuanxin Li, Zhixiang Wu, Zhenyu Wang, Ping Li
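The abstract above reports results in NMI (normalized mutual information) without defining it. As background, the following sketch computes NMI between a reference partition and a predicted one. It assumes arithmetic-mean normalization of the two entropies, which is one common convention; the abstract does not say which variant the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Normalized mutual information between two partitions.

    Assumption: mutual information is divided by the arithmetic mean of
    the two entropies.
    """
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    mutual_info = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    normalizer = (entropy(true_labels) + entropy(pred_labels)) / 2.0
    # Two trivial single-community partitions are treated as identical.
    return mutual_info / normalizer if normalizer > 0 else 1.0


# Community ids are arbitrary, so a relabeled but identical partition scores 1.
same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score depends only on how nodes are grouped, not on the community ids themselves, it suits self-supervised settings where predicted community labels carry no fixed meaning.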
An Adaptive Denoising Recommendation Algorithm for Causal Separation Bias
Abstract
In recommender systems, user selection bias often influences user-item interactions, e.g., users are more likely to rate their previously preferred or popular items. Existing methods can leverage the impact of selection bias in user ratings on the evaluation and optimization of recommender systems. However, these methods either inevitably contain a large amount of noise in the sampling process or suffer from confounding between users' conformity and interests. Inspired by the recent success of causal inference, in this work we propose a novel method to separate popularity biases for recommendation, named adaptive denoising and causal inference algorithm (ADA). We first compute the average rating of all feedback items of each user as the basis for converting explicit feedback to implicit feedback, and then obtain the true positive implicit data through an adaptive denoising method. In addition, we separate the confounding of users' conformity and interest in the selection bias by causal inference. Specifically, we construct a multi-task learning model with regularization loss functions. Experimental results on two datasets demonstrate the superiority of our ADA model over state-of-the-art methods in recommendation accuracy.
Qiuling Zhang, Huayang Xu, Jianfang Wang
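The first step described in the abstract above, using each user's average rating as the basis for turning explicit ratings into implicit feedback, can be sketched as follows. This is an illustration under assumptions: the rule "rating at or above the user's own mean counts as positive" is a guess at the conversion, the data layout and function name are hypothetical, and the adaptive denoising and causal separation steps are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user id, item id, explicit rating)


def to_implicit(ratings: List[Rating]) -> Dict[Tuple[str, str], int]:
    """Convert explicit ratings to 0/1 implicit labels per (user, item)."""
    per_user = defaultdict(list)
    for user, _item, rating in ratings:
        per_user[user].append(rating)
    # Each user's mean rating serves as that user's personal threshold.
    user_mean = {user: sum(vals) / len(vals) for user, vals in per_user.items()}
    # Assumption: a rating at or above the user's mean is a positive signal.
    return {
        (user, item): int(rating >= user_mean[user])
        for user, item, rating in ratings
    }


ratings = [
    ("u1", "a", 5.0),
    ("u1", "b", 1.0),
    ("u2", "a", 2.0),
    ("u2", "c", 2.0),
]
implicit = to_implicit(ratings)
```

A per-user threshold accounts for users who rate systematically high or low, which a single global cut-off would not.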
Tuning Query Reformulator with Fine-Grained Relevance Feedback
Abstract
Pseudo-relevance feedback (PRF) has been empirically validated as an effective query reformulation method to improve retrieval performance. Recent studies formulate query reformulation as a reinforcement learning task to directly optimize the retrieval performance. However, this paradigm computes the feedback signals by comparing the retrieved documents with the manual annotations, and neglects that the annotations severely suffer from the unlabeled problem (the relevant documents of a query may not be fully annotated), causing the model to overfit the training set. Moreover, the training of reinforcement learning is expensive and unstable. To address the above problems, inspired by recent great achievements of reinforcement learning from human feedback (RLHF), we propose a simple fine-grained feedback framework for query reformulation, which computes the feedback signals with a powerful re-ranking model instead of manual annotations. Specifically, we first utilize various automated methods to generate annotated data, which allows us to initialize the reformulator and obtain a good starting point. Then we employ a re-ranking model to assign fine-grained scores to the rewritten queries generated by the reformulator. Finally, we refine the reformulator using the feedback scores. In this way, the knowledge of the re-ranking model can be effectively transferred to the reformulator, leading to better generalization performance. Furthermore, our framework can enhance performance by leveraging a large amount of unlabeled data. Experiments on a real-world E-Commerce search engine and three public benchmarks demonstrate the effectiveness of our framework.
Yuchen Zhai, Yong Jiang, Yue Zhang, Jianhui Ji, Rong Xiao, Haihong Tang, Chen Li, Pengjun Xie, Yin Zhang
Retrieval-Augmented Document-Level Event Extraction with Cross-Attention Fusion
Abstract
Document-level event extraction intends to extract event records from an entire document. Current approaches adopt an entity-centric workflow, wherein the effectiveness of event extraction heavily relies on the input representation. Nonetheless, the input representations derived from earlier approaches exhibit incongruities when applied to the task of event extraction. To mitigate these discrepancies, we propose a Retrieval-Augmented Document-level Event Extraction (RADEE) method that leverages instances from the training dataset as supplementary event-informed knowledge. Specifically, the most similar training instance containing event records is retrieved and then concatenated with the input to enhance the input representation. To effectively integrate information from retrieved instances while minimizing noise interference, we introduce a fusion layer based on cross-attention mechanism. Experimental results obtained from a comprehensive evaluation of a large-scale document-level event extraction dataset reveal that our proposed method surpasses the performance of all baseline models. Furthermore, our approach exhibits improved performance even in low-resource settings, emphasizing its effectiveness and adaptability.
Yuting Xu, Chong Feng, Bo Wang, Jing Huang, Xinmu Qi
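The retrieval step in the abstract above (find the most similar training instance and concatenate it with the input) can be illustrated with a small sketch. Everything specific here is an assumption: bag-of-words cosine similarity stands in for whatever retriever the authors use, the separator string and function names are hypothetical, and the cross-attention fusion layer is not shown.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_and_concat(
    document: str, training_instances: List[str], sep: str = " [SEP] "
) -> str:
    """Append the most similar training instance to the input document."""
    query = Counter(document.lower().split())
    best = max(
        training_instances,
        key=lambda inst: cosine(query, Counter(inst.lower().split())),
    )
    return document + sep + best


# Toy training pool; real instances would carry their event records.
pool = [
    "weather was sunny today",
    "company pledges shares to bank",
]
augmented = retrieve_and_concat("company acquires shares", pool)
```

The augmented string is what a downstream encoder would receive; a fusion layer such as the one the abstract mentions would then decide how much of the retrieved text to attend to.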
Backmatter
Metadata
Title
Social Media Processing
Edited by
Feng Wu
Xuanjing Huang
Xiangnan He
Jiliang Tang
Shu Zhao
Daifeng Li
Jing Zhang
Copyright year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-9975-96-9
Print ISBN
978-981-9975-95-2
DOI
https://doi.org/10.1007/978-981-99-7596-9
