
tsuomela : data-mining   100

Data Love - The Seduction and Betrayal of Digital Technologies | Columbia University Press
"Intelligence services, government administrations, businesses, and a growing majority of the population are hooked on the idea that big data can reveal patterns and correlations in everyday life. Initiated by software engineers and carried out through algorithms, the mining of big data has sparked a silent revolution. But algorithmic analysis and data mining are not simply byproducts of media development or the logical consequences of computation. They are the radicalization of the Enlightenment's quest for knowledge and progress. Data Love argues that the "cold civil war" of big data is taking place not among citizens or between the citizen and government but within each of us. Roberto Simanowski elaborates on the changes data love has brought to the human condition while exploring the entanglements of those who—out of stinginess, convenience, ignorance, narcissism, or passion—contribute to the amassing of ever more data about their lives, leading to the statistical evaluation and individual profiling of their selves. Writing from a philosophical standpoint, Simanowski illustrates the social implications of technological development and retrieves the concepts, events, and cultural artifacts of past centuries to help decode the programming of our present."
book  publisher  data-science  data-mining  epistemology 
august 2018 by tsuomela
Home - OpenMinTeD
"OpenMinted sets out to create an open, service-oriented e-Infrastructure for Text and Data Mining (TDM) of scientific and scholarly content. Researchers can collaboratively create, discover, share and re-use Knowledge from a wide range of text-based scientific related sources in a seamless way."
science  research  data-mining  online  scholarly-communication  text-analysis 
may 2016 by tsuomela
AMiner - Open Science Platform
"AMiner (aminer.org) aims to provide comprehensive search and mining services for researcher social networks. In this system, we focus on: (1) creating a semantic-based profile for each researcher by extracting information from the distributed Web; (2) integrating academic data (e.g., the bibliographic data and the researcher profiles) from multiple sources; (3) accurately searching the heterogeneous network; (4) analyzing and discovering interesting patterns from the built researcher social network."
science  research  data-mining  online  scholarly-communication 
may 2016 by tsuomela
Post, Mine, Repeat - Social Media Data Mining | Helen Kennedy | Palgrave Macmillan
"In this book, Helen Kennedy argues that as social media data mining becomes more and more ordinary, as we post, mine and repeat, new data relations emerge. These new data relations are characterised by a widespread desire for numbers and the troubling consequences of this desire, and also by the possibility of doing good with data and resisting data power, by new and old concerns, and by instability and contradiction. Drawing on action research with public sector organisations, interviews with commercial social insights companies and their clients, focus groups with social media users and other research, Kennedy provides a fascinating and detailed account of living with social media data mining inside the organisations that make up the fabric of everyday life."
book  publisher  social-media  data-mining 
may 2016 by tsuomela
rOpenSci - Open Tools for Open Science
"At rOpenSci we are creating packages that allow access to data repositories through the R statistical programming environment that is already a familiar part of the workflow of many scientists. Our tools not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers. We develop open source R packages that provide programmatic access to a variety of scientific data, full-text of journal articles, and repositories that provide real-time metrics of scholarly impact. Visit our packages section for a full list of production and development versions of packages."
r  statistics  software  libraries  data-curation  data-mining  data-sources 
august 2014 by tsuomela
Unique in the Crowd: The privacy bounds of human mobility : Scientific Reports : Nature Publishing Group
"We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. In fact, in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals. We coarsen the data spatially and temporally to find a formula for the uniqueness of human mobility traces given their resolution and the available outside information. This formula shows that the uniqueness of mobility traces decays approximately as the 1/10 power of their resolution. Hence, even coarse datasets provide little anonymity. These findings represent fundamental constraints to an individual's privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals."
privacy  data-mining  mobile  mobile-phone  gis  geography  big-data  technology-effects 
june 2013 by tsuomela
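The 95% figure rests on a simple test: draw p points from one person's trace and check whether anyone else's trace also contains all of them. Below is a minimal sketch of that check. It assumes traces are stored as sets of hypothetical (hour, antenna_id) pairs. It is not the authors' code, and it omits the spatial and temporal coarsening step.

```python
import random

def unicity(traces, p, trials=1, seed=0):
    """Estimate the fraction of traces uniquely identified by p random points.

    traces: dict mapping a user id to a set of (hour, antenna_id) points
            (a hypothetical representation chosen for this sketch).
    """
    rng = random.Random(seed)
    unique = 0
    total = 0
    for points in traces.values():
        if len(points) < p:
            continue  # cannot draw p distinct points from this trace
        for _ in range(trials):
            # sorted() gives random.sample the sequence it requires
            sample = set(rng.sample(sorted(points), p))
            # count traces (including this one) that contain every sampled point
            matches = sum(1 for other in traces.values() if sample <= other)
            unique += matches == 1
            total += 1
    return unique / total if total else 0.0
```

For example, `unicity(traces, 4, trials=100)` estimates how often four points single out one trace. Coarsening the hour or antenna fields before calling it shows how resolution affects uniqueness.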
Big Data
"Big Data, a highly innovative, open access peer-reviewed journal, provides a unique forum for world-class research exploring the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data, including data science, big data infrastructure and analytics, and pervasive computing."
journal  open-access  big-data  data-mining  data 
february 2013 by tsuomela
The Database of Intentions | John Battelle's Search Blog
"The Database of Intentions is simply this: The aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result. It lives in many places, but three or four places in particular hold a massive amount of this data (i.e., MSN, Google, and Yahoo). This information represents, in aggregate form, a place holder for the intentions of humankind – a massive database of desires, needs, wants, and likes that can be discovered, subpoenaed, archived, tracked, and exploited to all sorts of ends. Such a beast has never before existed in the history of culture, but is almost guaranteed to grow exponentially from this day forward. This artifact can tell us extraordinary things about who we are and what we want as a culture. And it has the potential to be abused in equally extraordinary fashion."
search  big-data  intention  human  digital  traces  data-mining  search-engine  artifact  sts  technology  technology-effects 
october 2012 by tsuomela
Big data is our generation’s civil rights issue, and we don’t know it - O'Reilly Radar
"Data doesn’t invade people’s lives. Lack of control over how it’s used does.

What’s really driving so-called big data isn’t the volume of information. It turns out big data doesn’t have to be all that big. Rather, it’s about a reconsideration of the fundamental economics of analyzing data."
big-data  economics  freedom  privacy  data-mining  control 
august 2012 by tsuomela
Livehoods
"Livehoods offer a new way to conceptualize the dynamics, structure, and character of a city by analyzing the social media its residents generate. By looking at people's checkin patterns at places across the city, we create a mapping of the different dynamic areas that comprise it. Each Livehood tells a different story of the people and places that shape it. "
urban  urbanism  cities  big-data  social-media  data-mining  lifestyle  mapping 
may 2012 by tsuomela
PAKDD 2012
The 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) is pleased to organize a data mining competition.
data-mining  competition  practice 
april 2012 by tsuomela
Why DH has no future. | The Stone and the Shell
Let me just say that any area of scholarship where, in 20-fucking-12, the idea of moving to open-access, online distribution of writing counts as some kind of radicalism deserves everything that's going to happen to it.
digital  humanities  academia  data-mining  text-analysis  digital-humanities  open-access  via:cshalizi 
april 2012 by tsuomela
Peak Attention and the Colonization of Subcultures
"The question of how such coded language emerges, spreads and evolves is a big one. I am interested in a very specific question: how do members of an emerging subculture recognize each other in public, especially on the Internet, using more specialized coded language?

The question is interesting because the Web is making traditional subcultures — historically illegible to governance mechanisms, and therefore hotbeds of subversion — increasingly visible and open to cheap, large-scale economic and political exploitation. This exploitation takes the form of attention mining, and is the end-game on the path to what I called Peak Attention a while back.

Does this mean the subversive potential of the Internet is an illusion, and that it will ultimately be domesticated? Possibly."
internet  culture  subculture  code  code-words  attention  data-mining  social  social-networking  social-media  communication  signals  society  power  government  facebook 
april 2012 by tsuomela
The privacy arc - O'Reilly Radar
Mike Loukides argues that privacy worries are the result of attitudes persisting from the 1950s atomization of modern society.
privacy  online  tracking  advertising  culture  data-mining  modernization 
march 2012 by tsuomela
An ethical bargain - O'Reilly Radar
"Okay ... Let me just ask this: If you are involved in data capture, analytics, or customer marketing in your company, would you be embarrassed to admit to your neighbor what about them you capture, store and analyze? Would you be willing to send them a zip file with all of it to let them see it? If the answer is "no," why not? If I might hazard a guess at the answer, it would be because real relationships aren't built on asymmetry, and you know that. But rather than eliminate that awkward source of asymmetry, you hide it."
ethics  business  data-mining  privacy  corporation  asymmetrical  information  information-ethics 
july 2011 by tsuomela
[1103.6038] Searching for comets on the World Wide Web: The orbit of 17P/Holmes from the behavior of photographers
"We performed an image search on Yahoo for "Comet Holmes" on 2010 April 1. Thousands of images were returned. We astrometrically calibrated (and therefore vetted) the images using the Astrometry.net system. The calibrated image pointings form a set of data points to which we can fit a test-particle orbit in the Solar System, marginalizing out image dates and catching outliers. The approach is Bayesian and the model is, in essence, a model of how comet astrophotographers point their instruments. We find very strong probabilistic constraints on the orbit, although slightly off the JPL ephemeris, probably because of limitations of the astronomer model. Hyper-parameters of the model constrain the reliability of date meta-data and where in the image astrophotographers place the comet ..."
science  astronomy  astrophotography  crowdsourcing  data-mining  online  photography  comets  example 
april 2011 by tsuomela
Astronomers Calculate Comet's Orbit Using Amateur Images From The Web - Technology Review
"This sudden brightening triggered a huge wave of interest from astrophotographers all over the world, many of whom posted their images on the web. To find out how many, Dustin Lang from Princeton University in New Jersey and David Hogg at the Max-Planck-Institut für Astronomie in Heidelberg, Germany, searched the web. They found 2476 different shots of Holmes.

That's a significant astronomical database that represents a huge amount of work. But is it any use?

Today, Lang and Hogg use these images to work out an accurate orbit of Comet 17P/Holmes, a significant achievement given that the data is taken from an ordinary web search and its provenance is entirely unknown."
astronomy  astrophotography  crowdsourcing  data-mining  online  photography  comets  example 
april 2011 by tsuomela
BioCaster Global Health Monitor
Based on a combination of text mining algorithms, BioCaster aims to provide an early warning monitoring station for epidemic and environmental diseases (human, animal and plant). It does this by aggregating online news reports, processing them automatically using human language technology and trying to spot unusual trends. For example, the trend spotting algorithm we use on the top page is CDC's Early Aberration Reporting System (EARS) C2 algorithm. Being able to spot unusual health events still requires skilled human analysts for risk assessment and verification. Automated methods like BioCaster try to make human tasks easier by providing intelligently filtered news.

BioCaster started in 2006 and provides a demonstration portal for public health workers, clinicians and researchers. The portal is currently under development at the National Institute of Informatics, Japan.
diseases  machine-learning  data-mining  pandemic  health  monitor  global  natural-language-processing 
april 2010 by tsuomela
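The entry names the CDC's EARS C2 algorithm as BioCaster's trend spotter. C2 is commonly described as follows: compare today's count with the mean and standard deviation of a 7-day baseline that ends two days earlier, and signal when the standardized difference exceeds 3. The sketch below follows that description. The window sizes, the threshold, and the floor for a flat baseline are assumed defaults exposed as parameters, not BioCaster's actual settings.

```python
from statistics import mean, stdev

def ears_c2(counts, baseline=7, lag=2, threshold=3.0):
    """Score each day against a lagged baseline, EARS C2 style.

    counts: daily counts (e.g. news reports mentioning a disease), oldest first.
    Returns (day_index, score, flagged) for every day that has a full baseline.
    """
    results = []
    for t in range(baseline + lag, len(counts)):
        # baseline covers days t-lag-baseline .. t-lag-1, skipping the `lag` days before t
        window = counts[t - lag - baseline : t - lag]
        mu = mean(window)
        sigma = stdev(window) or 1.0  # assumed floor when the baseline is perfectly flat
        score = (counts[t] - mu) / sigma
        results.append((t, score, score > threshold))
    return results
```

The two-day gap keeps the first days of a rising outbreak out of the baseline, so an outbreak does not immediately mask itself. A flagged day is a candidate for the human review the entry describes, not a confirmed event.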
Statistical Data Mining Tutorials
The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.
mathematics  tutorial  statistics  data-mining  machine  computer-science 
february 2010 by tsuomela
The Fourth Paradigm: Data-Intensive Scientific Discovery - Microsoft Research
Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies. In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized.
science  data  programming  books  data-mining  development  computer-science  discovery  philosophy  future  data-curation  statistics  big-data  computational-science 
january 2010 by tsuomela
Personas | Metropath(ologies) | An installation by Aaron Zinman
Enter your name, and Personas scours the web for information and attempts to characterize the person - to fit them to a predetermined set of categories that an algorithmic process created from a massive corpus of data.
art  media  visualization  internet  identity  data  online  profile  school(MIT)  data-mining 
august 2009 by tsuomela
SDA: Survey Documentation
SDA is a set of programs for the documentation and Web-based analysis of survey data. There are also procedures for creating customized subsets of datasets. This set of programs is developed and maintained by the Computer-assisted Survey Methods Program (CSM) at the University of California, Berkeley.
data-mining  data  archive  survey  sociology  open-data 
december 2008 by tsuomela
Detecting influenza epidemics using search engine query data : Article : Nature
One way to improve early detection is to monitor health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day. Here we present a method of analysing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to use search queries to detect influenza epidemics in areas with a large population of web search users.
epidemics  influenza  health  data-mining  search  searchengine  google 
december 2008 by tsuomela
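The abstract rests on one observation: the relative frequency of certain queries tracks the share of physician visits for influenza-like illness. The sketch below shows one simple way to turn that correlation into an estimate. It assumes a linear fit between the two fractions on the logit scale, fitted by ordinary least squares. This is an assumption for illustration, not a description of Google's system, and choosing which queries to count is out of scope.

```python
import math

def logit(x):
    return math.log(x / (1 - x))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def fit_logit_model(query_frac, ili_frac):
    """Least-squares fit of logit(ili) = b0 + b1 * logit(query).

    query_frac: weekly fraction of searches that are flu-related (0 < q < 1).
    ili_frac:   matching fraction of physician visits for influenza-like illness.
    """
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_frac]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("query fractions must vary to fit a slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def estimate(b0, b1, query_frac):
    """Estimate the ILI visit fraction from this week's query fraction."""
    return inv_logit(b0 + b1 * logit(query_frac))
```

Because query counts are available almost immediately, `estimate` can be run as soon as a week's searches are tallied. That is the source of the short reporting lag the abstract mentions.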
Public Data Sets on Amazon Web Services (AWS)
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications.
public-data  open-access  data-mining  data-processing  data-collection  public  commons 
december 2008 by tsuomela
Microsoft Research DataDepot - Home
Welcome to DataDepot, a site that lets you track, analyze, and share trend lines.
data-mining  time-series  sharing  collection  archive  social-computing 
october 2008 by tsuomela