Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/172274
Title: Mining Coherent Topics With Pre-Learned Interest Knowledge in Twitter
Authors: Yuan He;Cheng Wang;Changjun Jiang
Year: 2017
Publisher: IEEE
Abstract: Discovering semantic coherent topics from the large amount of user-generated content (UGC) in social media would facilitate many downstream applications of intelligent computing. Topic models, as one of the most powerful algorithms, have been widely used to discover the latent semantic patterns in text collections. However, one key weakness of topic models is that they need documents with certain length to provide reliable statistics for generating coherent topics. In Twitter, the users' tweets are mostly short and noisy. Observations of word co-occurrences are incomprehensible for topic models. To deal with this problem, previous work tried to incorporate prior knowledge to obtain better results. However, this strategy is not practical for the fast evolving UGC in Twitter. In this paper, we first cluster the users according to the retweet network, and the users' interests are mined as the prior knowledge. Such data are then applied to improve the performance of topic learning. The potential cause for the effectiveness of this approach is that users in the same community usually share similar interests, which will result in less noisy sub-data sets. Our algorithm pre-learns two types of interest knowledge from the data set: the interest-word-sets and a tweet-interest preference matrix. Furthermore, a dedicated background model is introduced to judge whether a word is drawn from the background noise. Experiments on two real life twitter data sets show that our model achieves significant improvements over state-of-the-art baselines.
Description: 
URI: http://localhost/handle/Hannan/172274
volume: 5
More Information: 10515,
10525
Appears in Collections:2017

Files in This Item:
File SizeFormat 
7941989.pdf2.76 MBAdobe PDF
Title: Mining Coherent Topics With Pre-Learned Interest Knowledge in Twitter
Authors: Yuan He;Cheng Wang;Changjun Jiang
Year: 2017
Publisher: IEEE
Abstract: Discovering semantic coherent topics from the large amount of user-generated content (UGC) in social media would facilitate many downstream applications of intelligent computing. Topic models, as one of the most powerful algorithms, have been widely used to discover the latent semantic patterns in text collections. However, one key weakness of topic models is that they need documents with certain length to provide reliable statistics for generating coherent topics. In Twitter, the users' tweets are mostly short and noisy. Observations of word co-occurrences are incomprehensible for topic models. To deal with this problem, previous work tried to incorporate prior knowledge to obtain better results. However, this strategy is not practical for the fast evolving UGC in Twitter. In this paper, we first cluster the users according to the retweet network, and the users' interests are mined as the prior knowledge. Such data are then applied to improve the performance of topic learning. The potential cause for the effectiveness of this approach is that users in the same community usually share similar interests, which will result in less noisy sub-data sets. Our algorithm pre-learns two types of interest knowledge from the data set: the interest-word-sets and a tweet-interest preference matrix. Furthermore, a dedicated background model is introduced to judge whether a word is drawn from the background noise. Experiments on two real life twitter data sets show that our model achieves significant improvements over state-of-the-art baselines.
Description: 
URI: http://localhost/handle/Hannan/172274
volume: 5
More Information: 10515,
10525
Appears in Collections:2017

Files in This Item:
File SizeFormat 
7941989.pdf2.76 MBAdobe PDF
Title: Mining Coherent Topics With Pre-Learned Interest Knowledge in Twitter
Authors: Yuan He;Cheng Wang;Changjun Jiang
Year: 2017
Publisher: IEEE
Abstract: Discovering semantic coherent topics from the large amount of user-generated content (UGC) in social media would facilitate many downstream applications of intelligent computing. Topic models, as one of the most powerful algorithms, have been widely used to discover the latent semantic patterns in text collections. However, one key weakness of topic models is that they need documents with certain length to provide reliable statistics for generating coherent topics. In Twitter, the users' tweets are mostly short and noisy. Observations of word co-occurrences are incomprehensible for topic models. To deal with this problem, previous work tried to incorporate prior knowledge to obtain better results. However, this strategy is not practical for the fast evolving UGC in Twitter. In this paper, we first cluster the users according to the retweet network, and the users' interests are mined as the prior knowledge. Such data are then applied to improve the performance of topic learning. The potential cause for the effectiveness of this approach is that users in the same community usually share similar interests, which will result in less noisy sub-data sets. Our algorithm pre-learns two types of interest knowledge from the data set: the interest-word-sets and a tweet-interest preference matrix. Furthermore, a dedicated background model is introduced to judge whether a word is drawn from the background noise. Experiments on two real life twitter data sets show that our model achieves significant improvements over state-of-the-art baselines.
Description: 
URI: http://localhost/handle/Hannan/172274
volume: 5
More Information: 10515,
10525
Appears in Collections:2017

Files in This Item:
File SizeFormat 
7941989.pdf2.76 MBAdobe PDF