Structured Generation of Technical Reading Lists
Jonathan Gordon
USC Information Sciences Institute
Marina del Rey, CA, USA
Stephen Aguilar
USC Rossier School of Education
Los Angeles, CA, USA
Emily Sheng
USC Information Sciences Institute
Marina del Rey, CA, USA
Gully Burns
USC Information Sciences Institute
Marina del Rey, CA, USA
yesterday by hustwj
CSCI 582: Computational Journalism
This course is designed to teach the application of big data and data science in textual domains, particularly in journalism and reporting. The topics include data journalism, natural language processing, visualization, automated fact-checking and story finding, social media sensing, and web data analysis. The course also explores open source tools focused on journalism and reporting. In a nutshell, this is an ideal course for computer science students who are fascinated with natural language and for journalism students who are enthusiastic about data.
courses  NLP 
yesterday by hustwj
GitHub - RaRe-Technologies/gensim: Topic Modelling for Humans
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
python  opensource  nlp 
yesterday by tguemes
Building Brundage Bot – Hacker Noon
I concatenated the title of each paper to its abstract and created tf-idf n-gram features (up to trigrams) from the text. I then concatenated one-hot-encoded vectors representing the paper’s authors and arXiv category. I filtered out n-grams that appeared fewer than 30 times in the training set (out of ~25k total abstracts) and authors who appeared fewer than 3 times. This left around 17k total features.

Finally, I held out a randomly selected 10% of the data as a test set and trained a logistic regression using sklearn. I added L1 regularization (with the parameter chosen by cross-validation) and a class-weighted loss to help with the large number of features and class imbalance.
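
The pipeline described above could be sketched roughly as follows with scikit-learn. This is not the author's code: the function name, the dictionary keys (`title`, `abstract`, `authors`, `category`), the stratified split, the `liblinear` solver, the 3-fold grid and `class_weight="balanced"` are all assumptions made for illustration. Note also that `min_df` counts documents containing a term, which only approximates "appeared fewer than N times".

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder


def build_and_train(papers, labels, min_ngram_df=30, min_author_df=3, seed=0):
    """Hypothetical helper: papers is a list of dicts, labels a list of 0/1."""
    # Hold out a random 10% as a test set (stratification is an assumption).
    train, test, y_train, y_test = train_test_split(
        papers, labels, test_size=0.1, random_state=seed, stratify=labels
    )

    # tf-idf over word n-grams up to trigrams, dropping rare n-grams.
    text_vec = TfidfVectorizer(ngram_range=(1, 3), min_df=min_ngram_df)
    # Multi-hot author indicators, dropping rarely seen authors.
    author_vec = CountVectorizer(
        analyzer=lambda authors: authors, binary=True, min_df=min_author_df
    )
    # One-hot category; unseen categories become all-zero rows.
    category_enc = OneHotEncoder(handle_unknown="ignore")

    def columns(rows):
        texts = [r["title"] + " " + r["abstract"] for r in rows]
        authors = [r["authors"] for r in rows]
        categories = [[r["category"]] for r in rows]
        return texts, authors, categories

    # Fit every encoder on the training portion only.
    texts, authors, categories = columns(train)
    text_vec.fit(texts)
    author_vec.fit(authors)
    category_enc.fit(categories)

    def featurize(rows):
        t, a, c = columns(rows)
        blocks = [
            text_vec.transform(t),
            author_vec.transform(a),
            category_enc.transform(c),
        ]
        return hstack(blocks).tocsr()

    # L1-regularized logistic regression; C is picked by cross-validation and
    # class_weight="balanced" stands in for the class-weighted loss.
    model = LogisticRegressionCV(
        Cs=5, cv=3, penalty="l1", solver="liblinear", class_weight="balanced"
    )
    model.fit(featurize(train), y_train)
    accuracy = model.score(featurize(test), y_test)
    return model, featurize, accuracy
```

The returned `featurize` function applies the fitted encoders to new papers, so `model.predict(featurize(new_papers))` scores unseen items with the same feature layout.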
NLP  ML  CNN  example 
2 days ago by foodbaby
tools and knowledge needed to begin anonymizing documents they have written.

It does this by firing up JStylo libraries (an author detection application also developed by PSAL) to detect stylometric patterns and determine features (like word length, bigrams, trigrams, etc.) that the user should remove/add to help obscure their style and identity.
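
To make the kind of stylometric features mentioned above concrete, here is a minimal standard-library sketch that computes average word length and the most frequent bigrams and trigrams of a text. It does not use or reproduce JStylo (a Java application); the function name, the regular-expression tokenizer, and the choice of word-level rather than character-level n-grams are assumptions.

```python
import re
from collections import Counter


def stylometric_profile(text, top_k=5):
    """Hypothetical helper: summarize a few simple style features of a text."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")

    # Average word length in characters.
    avg_word_length = sum(len(w) for w in words) / len(words)

    # Word-level bigram and trigram counts over adjacent tokens.
    bigrams = Counter(zip(words, words[1:]))
    trigrams = Counter(zip(words, words[1:], words[2:]))

    return {
        "avg_word_length": avg_word_length,
        "top_bigrams": bigrams.most_common(top_k),
        "top_trigrams": trigrams.most_common(top_k),
    }
```

Comparing such profiles across an author's documents shows which habitual phrases recur most often, which is the sort of signal a writer might want to vary.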
analysis  writing  privacy  github  anonymous  nlp 
3 days ago by sprague
