
tsuomela : topic-modeling   13

Topic Modeling in Python with NLTK and Gensim | DataScience+
"In this post, we will learn how to identify which topic is discussed in a document, called topic modeling. In particular, we will cover Latent Dirichlet Allocation (LDA): a widely used topic modelling technique. And we will apply LDA to convert set of research papers to a set of topics."
python  statistics  topic-modeling  digital-humanities  methods 
april 2018 by tsuomela
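As a quick illustration of the workflow the post describes, here is a minimal sketch of fitting an LDA model with Gensim; the toy documents, the number of topics, and the parameter values are assumptions for illustration, not taken from the post.

```python
# Minimal Gensim LDA sketch (toy data; the post also uses NLTK for
# tokenization, stop-word removal, and lemmatization before this step).
from gensim import corpora
from gensim.models import LdaModel

documents = [
    "neural networks for image classification",
    "bayesian inference for topic models",
    "convolutional networks and image segmentation",
    "latent dirichlet allocation for text corpora",
]

texts = [doc.lower().split() for doc in documents]          # naive tokenization
dictionary = corpora.Dictionary(texts)                       # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]        # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=0)

# Top words per topic, then the topic mixture of the first document.
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
print(lda.get_document_topics(corpus[0]))
```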
Ways to Compute Topics over Time, Part 1 · from data to scholarship
"This the first in a series of posts which constitute a “lit review” of sorts to document the range of methods scholars are using to compute the distribution of topics over time."
digital-humanities  topic-modeling  methods  tutorial  temporal 
june 2017 by tsuomela
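One simple way to compute such a distribution, sketched below under the assumption that each document already has a fitted topic distribution and a known publication year (the numbers here are invented), is to average document-topic proportions by year.

```python
# Average per-document topic proportions within each year to get a
# "topics over time" series. Data is synthetic for illustration.
import pandas as pd

doc_topics = pd.DataFrame(
    {
        "year":    [2001, 2001, 2002, 2002, 2003],
        "topic_0": [0.7,  0.2,  0.5,  0.1,  0.3],
        "topic_1": [0.3,  0.8,  0.5,  0.9,  0.7],
    }
)

topics_over_time = doc_topics.groupby("year").mean()
print(topics_over_time)
```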
[1511.03020] Co-word Maps and Topic Modeling: A Comparison from a User's Perspective
"Induced by "big data," "topic modeling" has become an attractive alternative to mapping co-words in terms of co-occurrences and co-absences using network techniques. We return to the word/document matrix using first a single text with a strong argument ("The Leiden Manifesto") and then upscale to a sample of moderate size (n = 687) to study the pros and cons of the two approaches in terms of the resulting possibilities for making semantic maps that can serve an argument. The results from co-word mapping (using two different routines) versus topic modeling are significantly uncorrelated. Whereas components in the co-word maps can easily be designated, the coloring of the nodes according to the results of the topic model provides maps that are difficult to interpret. In these samples, the topic models seem to reveal similarities other than semantic ones (e.g., linguistic ones). In other words, topic modeling does not replace co-word mapping."
paper  research  topic-modeling  mapping  literature 
november 2015 by tsuomela
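For reference, the co-word side of the comparison starts from the word/document matrix the abstract mentions; the sketch below builds a document-level word co-occurrence matrix with scikit-learn (the documents are invented, and the semantic-mapping/network step is omitted).

```python
# Build a word co-occurrence matrix from a binary word/document matrix;
# the result can then be fed to network/mapping software.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "research metrics and evaluation",
    "metrics for research assessment",
    "topic models of research literature",
]

vectorizer = CountVectorizer(binary=True)      # word presence/absence per document
X = vectorizer.fit_transform(documents)

cooccurrence = (X.T @ X).toarray()             # documents shared by each word pair
np.fill_diagonal(cooccurrence, 0)

print(vectorizer.get_feature_names_out())
print(cooccurrence)
```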
Cohen
"Numerous papers have reported great success at inferring the political orientation of Twitter users. This paper has some unfortunate news to deliver: while past work has been sound and often methodologically novel, we have discovered that reported accuracies have been systemically overoptimistic due to the way in which validation datasets have been collected, reporting accuracy levels nearly 30% higher than can be expected in populations of general Twitter users. Using careful and novel data collection and annotation techniques, we collected three different sets of Twitter users, each characterizing a different degree of political engagement on Twitter - from politicians (highly politically vocal) to "normal" users (those who rarely discuss politics). Applying standard techniques for inferring political orientation, we show that methods which previously reported greater than 90% inference accuracy, actually achieve barely 65% accuracy on normal users. We also show that classifiers cannot be used to classify users outside the narrow range of political orientation on which they were trained. While a sobering finding, our results quantify and call attention to overlooked problems in the latent attribute inference literature that, no doubt, extend beyond political orientation inference: the way in which datasets are assembled and the transferability of classifiers."
conference  social-media  politics  classification  topic-modeling 
december 2014 by tsuomela
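The transferability point can be illustrated with a generic train-on-one-population, test-on-another evaluation; the sketch below uses scikit-learn with entirely synthetic tweets and labels and is not the paper's actual pipeline.

```python
# Train a text classifier on politically vocal users, then evaluate it on a
# different population of "normal" users. All data is synthetic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

train_texts = ["vote left policy", "support right party",
               "left coalition rally", "right wing campaign"]
train_labels = [0, 1, 0, 1]

test_texts = ["great game last night", "coffee then off to work",
              "new phone arrived today"]
test_labels = [0, 0, 1]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

print("out-of-domain accuracy:",
      accuracy_score(test_labels, clf.predict(test_texts)))
```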
