Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/655254
Title: Microblog Dimensionality Reduction - A Deep Learning Approach
Authors: Lei Xu;Chunxiao Jiang;Yong Ren;Hsiao-Hwa Chen
subject: Microblog mining|Dimension reduction|Deep autoencoder|Text representation|Semantic relatedness
Year: 2016
Publisher: IEEE
Abstract: Exploring potentially useful information from the huge amount of textual data produced by microblogging services has attracted much attention in recent years. An important preprocessing step in microblog text mining is converting natural language texts into proper numerical representations. Because microblog texts are short, representing them with term frequency vectors causes a “sparse data” problem, so finding proper representations of microblog texts is a challenging issue. In this paper, we apply deep networks to map high-dimensional representations of microblog texts to low-dimensional ones. To improve the result of dimensionality reduction, we take advantage of the semantic similarity derived from two types of microblog-specific information, namely the retweet relationship and hashtags. Two types of approaches, modifying the training data and modifying the training objective of the deep networks, are proposed to make use of this microblog-specific information. Experimental results show that the deep models perform better than traditional dimensionality reduction methods such as latent semantic analysis (LSA) and the latent Dirichlet allocation (LDA) topic model, and that the use of microblog-specific information can help to learn better representations.
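The abstract describes the pipeline only at a high level. The dependency-free Python sketch below is an illustration under stated assumptions, not the authors' implementation. Posts are tokenized on whitespace into term-frequency vectors. A single sigmoid code layer (`TinyAutoencoder`, hypothetical) stands in for the paper's deep network. Hashtags are used in the spirit of "modifying the training objective" by adding a penalty, weighted by a hypothetical parameter `lam`, on the squared distance between the codes of posts that share a hashtag (`hashtag_pairs`, hypothetical). The exact loss used in the paper may differ.

```python
import math
import random
from collections import Counter
from itertools import combinations


def tf_vectors(texts):
    """Term-frequency vector per text over a shared, whitespace-tokenized vocabulary."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for c in counts for w in c})
    vectors = [[float(c[w]) for w in vocab] for c in counts]
    return vocab, vectors


def hashtag_pairs(texts):
    """Hypothetical helper: index pairs of posts sharing at least one hashtag."""
    tags = [{w for w in t.lower().split() if w.startswith("#")} for t in texts]
    return [(i, j) for i, j in combinations(range(len(texts)), 2) if tags[i] & tags[j]]


class TinyAutoencoder:
    """Hypothetical model: one sigmoid code layer of size k and a linear decoder."""

    def __init__(self, d, k, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]
        self.b = [0.0] * k
        self.V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]
        self.c = [0.0] * d

    def encode(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bq)))
                for row, bq in zip(self.W, self.b)]

    def decode(self, h):
        return [sum(v * hq for v, hq in zip(row, h)) + cp
                for row, cp in zip(self.V, self.c)]

    def loss(self, X, pairs, lam):
        """Mean of reconstruction error plus lam-weighted code distance of related pairs."""
        H = [self.encode(x) for x in X]
        rec = sum(0.5 * sum((y - xi) ** 2 for y, xi in zip(self.decode(h), x))
                  for h, x in zip(H, X))
        sim = sum(0.5 * lam * sum((a - b) ** 2 for a, b in zip(H[i], H[j]))
                  for i, j in pairs)
        return (rec + sim) / len(X)

    def step(self, X, pairs, lam, lr):
        """One full-batch gradient-descent step on loss()."""
        n, k, d = len(X), len(self.b), len(self.c)
        H = [self.encode(x) for x in X]
        gV, gc, dH = [[0.0] * k for _ in range(d)], [0.0] * d, []
        for x, h in zip(X, H):
            e = [y - xi for y, xi in zip(self.decode(h), x)]  # reconstruction error
            for p in range(d):
                gc[p] += e[p]
                for q in range(k):
                    gV[p][q] += e[p] * h[q]
            dH.append([sum(self.V[p][q] * e[p] for p in range(d)) for q in range(k)])
        for i, j in pairs:  # similarity term pulls codes of related posts together
            for q in range(k):
                g = lam * (H[i][q] - H[j][q])
                dH[i][q] += g
                dH[j][q] -= g
        gW, gb = [[0.0] * d for _ in range(k)], [0.0] * k
        for x, h, dh in zip(X, H, dH):
            for q in range(k):
                dz = dh[q] * h[q] * (1.0 - h[q])  # back through the sigmoid
                gb[q] += dz
                for r in range(d):
                    gW[q][r] += dz * x[r]
        s = lr / n  # gradients were summed, while the loss is a mean
        for q in range(k):
            self.b[q] -= s * gb[q]
            self.W[q] = [w - s * g for w, g in zip(self.W[q], gW[q])]
        for p in range(d):
            self.c[p] -= s * gc[p]
            self.V[p] = [v - s * g for v, g in zip(self.V[p], gV[p])]


posts = [
    "new phone battery lasts all day #tech",
    "battery drains fast on my phone #tech",
    "great goal in the final minute #football",
    "what a match and what a goal #football",
]
vocab, X = tf_vectors(posts)
pairs = hashtag_pairs(posts)
model = TinyAutoencoder(d=len(vocab), k=2)
before = model.loss(X, pairs, lam=1.0)
for _ in range(300):
    model.step(X, pairs, lam=1.0, lr=0.1)
after = model.loss(X, pairs, lam=1.0)
codes = [model.encode(x) for x in X]
print(f"vocabulary size {len(vocab)}, loss {before:.3f} -> {after:.3f}")
```

Setting `lam=0` reduces the objective to a plain reconstruction autoencoder. Retweet relationships could, under the same assumption, be supplied as additional index pairs in place of or alongside the hashtag pairs.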
Description: 
URI: http://localhost/handle/Hannan/142416
http://localhost/handle/Hannan/655254
ISSN: 1041-4347
volume: 28
issue: 7
Appears in Collections:2016

Files in This Item:
File: 7430292.pdf; Size: 649.54 kB; Format: Adobe PDF