
tsuomela : text-analysis   90

"text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP)."
r  package  text-mining  text-analysis 
26 days ago by tsuomela
Analyzing Documents with TF-IDF | Programming Historian
"This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis."
text-mining  python  digital-humanities  text-analysis  tutorial 
26 days ago by tsuomela
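The tf-idf weighting the lesson introduces can be sketched in a few lines of plain Python. This is a toy illustration, not the Programming Historian's own code; the corpus and the classic (unsmoothed) idf variant are assumptions:

```python
import math
from collections import Counter

# Toy corpus: each "document" is a list of tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the dogs and cats are pets".split(),
]

def tf_idf(term, doc, corpus):
    """Term frequency times inverse document frequency (classic variant)."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)
    return tf * idf

# "the" appears in every document, so its idf (and hence its score)
# collapses to zero; "cat" is rare across the corpus and scores higher.
print(tf_idf("cat", docs[0], docs))  # > 0
print(tf_idf("the", docs[0], docs))  # 0.0
```

Real analyses typically use a smoothed idf and normalized vectors, but the intuition is the same: words common to every document carry no discriminating weight.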
Visualize any Text as a Network - Textexture
"Welcome to Textexture. Using this tool you can visualize any text as a network. The resulting graph can be used to get a quick visual summary of the text, read the most relevant excerpts (by clicking on the nodes), and find similar texts."
text-analysis  visualization  online  tool 
april 2017 by tsuomela
CRAN Task View: Natural Language Processing
"Natural language processing has come a long way since its foundations were laid in the 1940s and 50s (for an introduction see, e.g., Jurafsky and Martin (2008): Speech and Language Processing, Pearson Prentice Hall). This CRAN task view collects relevant R packages that support computational linguists in conducting analysis of speech and language on a variety of levels - setting focus on words, syntax, semantics, and pragmatics. In recent years, we have elaborated a framework to be used in packages dealing with the processing of written material: the package tm. Extension packages in this area are highly recommended to interface with tm's basic routines and useRs are cordially invited to join in the discussion on further developments of this framework package. To get into natural language processing, the cRunch service and tutorials may be helpful. "
r  statistics  natural-language-processing  text-analysis  tools  package  programming  reference  documentation 
november 2016 by tsuomela
Natural Language Toolkit — NLTK 3.0 documentation
"NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. "
python  library  language  analysis  text-analysis 
june 2016 by tsuomela
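The tokenization and frequency counting that NLTK handles robustly can be approximated with the standard library. A deliberately naive sketch, not NLTK's API:

```python
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase, then keep runs of letters.
    (NLTK's tokenizers handle punctuation, contractions, and
    sentence boundaries far more carefully.)"""
    return re.findall(r"[a-z]+", text.lower())

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."
freq = Counter(tokenize(text))
print(freq.most_common(2))  # [('the', 3), ('dog', 2)]
```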
Facebook says its new AI can understand text with 'near-human accuracy'
"Facebook is using its latest AI project to get a lot smarter at understanding text. In fact, the social network says DeepText, its new "text understanding engine," is so good, it can interpret "several thousands posts a second" with "near-human accuracy." Introduced Wednesday, DeepText offers an intriguing look into how Facebook is using artificial intelligence to make its platform better at parsing the billions of lines of text that pass through it each day. "
text-analysis  deep-learning  machine-learning  facebook  digital-humanities 
june 2016 by tsuomela
Home - OpenMinTeD
"OpenMinted sets out to create an open, service-oriented ep-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. Researchers can collaboratively create, discover, share and re-use Knowledge from a wide range of text-based scientific related sources in a seamless way."
science  research  data-mining  online  scholarly-communication  text-analysis 
may 2016 by tsuomela
Hermeneutica | The MIT Press
"The image of the scholar as a solitary thinker dates back at least to Descartes’ Discourse on Method. But scholarly practices in the humanities are changing as older forms of communal inquiry are combined with modern research methods enabled by the Internet, accessible computing, data availability, and new media. Hermeneutica introduces text analysis using computer-assisted interpretive practices. It offers theoretical chapters about text analysis, presents a set of analytical tools (called Voyant) that instantiate the theory, and provides example essays that illustrate the use of these tools. Voyant allows users to integrate interpretation into texts by creating hermeneutica—small embeddable “toys” that can be woven into essays published online or into such online writing environments as blogs or wikis. The book’s companion website,, offers the example essays with both text and embedded interactive panels. The panels show results and allow readers to experiment with the toys themselves. The use of these analytical tools results in a hybrid essay: an interpretive work embedded with hermeneutical toys that can be explored for technique. The hermeneutica draw on and develop such common interactive analytics as word clouds and complex data journalism interactives. Embedded in scholarly texts, they create a more engaging argument. Moving between tool and text becomes another thread in a dynamic dialogue."
book  publisher  digital-humanities  text-analysis 
march 2016 by tsuomela
DiRT Directory
"The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software."
research  tools  directory  catalog  digital-humanities  digital  text-analysis  humanities 
july 2015 by tsuomela
Wmatrix corpus analysis and comparison tool
"Wmatrix is a software tool for corpus analysis and comparison. It provides a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains."
software  tool  text-analysis 
november 2014 by tsuomela
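The concordances Wmatrix produces are keyword-in-context (KWIC) lines. A minimal Python sketch of the idea (Wmatrix itself is a web tool, so this illustrates the method, not its interface):

```python
def concordance(tokens, keyword, width=3):
    """Keyword-in-context (KWIC): each hit with up to `width`
    tokens of context on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(f"{left} [{tok}] {right}".strip())
    return hits

tokens = "the cat sat on the mat while the dog slept".split()
for line in concordance(tokens, "the"):
    print(line)
```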
PLOS ONE: Linguistic Traces of a Scientific Fraud: The Case of Diederik Stapel
"When scientists report false data, does their writing style reflect their deception? In this study, we investigated the linguistic patterns of fraudulent (N = 24; 170,008 words) and genuine publications (N = 25; 189,705 words) first-authored by social psychologist Diederik Stapel. The analysis revealed that Stapel's fraudulent papers contained linguistic changes in science-related discourse dimensions, including more terms pertaining to methods, investigation, and certainty than his genuine papers. His writing style also matched patterns in other deceptive language, including fewer adjectives in fraudulent publications relative to genuine publications. Using differences in language dimensions we were able to classify Stapel's publications with above chance accuracy. Beyond these discourse dimensions, Stapel included fewer co-authors when reporting fake data than genuine data, although other evidentiary claims (e.g., number of references and experiments) did not differ across the two article types. This research supports recent findings that language cues vary systematically with deception, and that deception can be revealed in fraudulent scientific discourse."
science  fraud  text-analysis  linguistics  language  detection 
november 2014 by tsuomela
The Overview Project — Visualize your documents
"Read and analyze thousands of documents super quickly. Full text search, topic modeling, coding and tagging, visualizations and more. All in an easy-to use, visual workflow."
journalism  technology  computers  text-analysis  digital-humanities  media 
october 2014 by tsuomela
"Meandre provides the machinery for assembling and executing data flows -software applications consisting of software components that process data (such as by accessing a data store, transforming the data from that store and analyzing or visualizing the transformed results). Within Meandre, each flow is represented as a graph that shows executable components (i.e., basic computational units, or building blocks) as icons linked through their input and output connections. Based on the inputs and properties of a executable component, a unique output is generated upon execution."
digital-humanities  software  text-analysis  tools 
september 2014 by tsuomela
Text Analysis with Topic Models for the Humanities and Social Sciences — Text Analysis with Topic Models for the Humanities and Social Sciences
"Text Analysis with Topic Models for the Humanities and Social Sciences (TAToM) consists of a series of tutorials covering basic procedures in quantitative text analysis. The tutorials cover the preparation of a text corpus for analysis and the exploration of a collection of texts using topic models and machine learning."
text-analysis  topics  digital-humanities  modeling  semantics 
august 2014 by tsuomela – The Rhetoric of Text Analysis
" is a collaborative project by Stéfan Sinclair & Geoffrey Rockwell to think through some foundations of contemporary text analysis, including issues related to the electronic texts used, the tools and methodologies available, and the various forms that can take the expression of results from text analysis."
text-analysis  humanities  computing  text-processing  tools 
july 2014 by tsuomela
"TEI By Example offers a series of freely available online tutorials walking individuals through the different stages in marking up a document in TEI (Text Encoding Initiative). "
text-markup  text-analysis  xml 
september 2013 by tsuomela
Meandre is a semantic-web-driven, data-intensive flow execution environment. It provides basic infrastructure for data-intensive computation, including tools for creating components and flows, a high-level language for describing flows, and a multicore and distributed execution environment based on a service-oriented paradigm.
software  text-analysis  semantic-web  server 
july 2012 by tsuomela
Why DH has no future. | The Stone and the Shell
Let me just say that any area of scholarship where, in 20-fucking-12, the idea of moving to open-access, online distribution of writing counts as some kind of radicalism deserves everything that's going to happen to it.
digital  humanities  academia  data-mining  text-analysis  digital-humanities  open-access  via:cshalizi 
april 2012 by tsuomela
The Mentaculus: The Seven Most Discussed Scientific Biases
confounding, selection, publication, response, attention, recall, sampling
science  bias  psychology  text-analysis 
april 2011 by tsuomela
[1007.3254] Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks
We establish concrete mathematical criteria to distinguish between different kinds of written storytelling, fictional and non-fictional. Specifically, we constructed a semantic network from both novels and news stories, with $N$ independent words as vertices or nodes, and edges or links allotted to words occurring within $m$ places of a given vertex; we call $m$ the word distance. We then used measures from complex network theory to distinguish between news and fiction, studying the minimal text length needed as well as the optimized word distance $m$.
semantics  fiction  computer  computer-science  network-analysis  literature  text-analysis 
july 2010 by tsuomela
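The semantic network the paper constructs (words as nodes, edges between words occurring within $m$ places of each other) can be sketched as follows. This is a toy construction under an assumed whitespace tokenization; the paper's complex-network measures are not reproduced here:

```python
from collections import defaultdict

def cooccurrence_network(tokens, m=2):
    """Build an undirected co-occurrence graph: link each word to
    every distinct word within m positions (the 'word distance')."""
    edges = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + m, len(tokens))):
            v = tokens[j]
            if v != w:
                edges[w].add(v)
                edges[v].add(w)
    return edges

net = cooccurrence_network("the cat sat on the mat".split(), m=2)
print(sorted(net["cat"]))  # ['on', 'sat', 'the']
```

Once the graph is built, standard network statistics (degree distribution, clustering, path lengths) can be computed over `edges` to compare text types, which is the paper's approach.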
Designing Text Ecologies (Designing Text Ecologies)
Clay Spinuzzi, University of Texas at Austin
This site contains resources for my upper-division course RHE 330c, Designing Text Ecologies.
rhetoric  course  syllabi  research  methods  text-analysis  sociology  observation 
april 2010 by tsuomela
