tsuomela : reproducible (55)

Binder (beta)
"Have a repository full of Jupyter notebooks? With Binder, open those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere."
data-curation  reproducible  python  ipython  programming  notebook  sharing  research  tool  github 
july 2018 by tsuomela
[1609.00494] Publication bias and the canonization of false facts
"In the process of scientific inquiry, certain claims accumulate enough support to be established as facts. Unfortunately, not every claim accorded the status of fact turns out to be true. In this paper, we model the dynamic process by which claims are canonized as fact through repeated experimental confirmation. The community's confidence in a claim constitutes a Markov process: each successive published result shifts the degree of belief, until sufficient evidence accumulates to accept the claim as fact or to reject it as false. In our model, publication bias --- in which positive results are published preferentially over negative ones --- influences the distribution of published results. We find that when readers do not know the degree of publication bias and thus cannot condition on it, false claims often can be canonized as facts. Unless a sufficient fraction of negative results are published, the scientific process will do a poor job at discriminating false from true claims. This problem is exacerbated when scientists engage in p-hacking, data dredging, and other behaviors that increase the rate at which false positives are published. If negative results become easier to publish as a claim approaches acceptance as a fact, however, true and false claims can be more readily distinguished. To the degree that the model accurately represents current scholarly practice, there will be serious concern about the validity of purported facts in some areas of scientific research."
publishing  scholarly-communication  bias  facts  reproducible 
november 2017 by tsuomela
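The Markov-process model described in this abstract can be illustrated with a small simulation. This is a simplified sketch under our own assumptions, not the paper's exact model or parameter values: each experiment comes out positive with probability `power` if the claim is true and `alpha` if it is false; positive results are always published, negative ones only with probability `rho_negative`; and readers update their belief with Bayes' rule as if nothing were filtered, i.e. without conditioning on publication bias. The function names, default parameters and acceptance thresholds are all hypothetical.

```python
import random
from typing import Optional


def bayes_update(belief: float, positive: bool, alpha: float, power: float) -> float:
    """Posterior P(claim is true) after one published result.

    Assumes a naive reader who treats each published result as an unfiltered
    draw, i.e. does not condition on publication bias.
    """
    if positive:
        like_true, like_false = power, alpha          # P(positive | true), P(positive | false)
    else:
        like_true, like_false = 1 - power, 1 - alpha  # P(negative | true), P(negative | false)
    numerator = like_true * belief
    return numerator / (numerator + like_false * (1 - belief))


def simulate_claim(
    claim_is_true: bool,
    prior: float = 0.5,
    alpha: float = 0.05,        # assumed false-positive rate
    power: float = 0.8,         # assumed probability that a true claim tests positive
    rho_negative: float = 0.1,  # assumed probability that a negative result is published
    accept: float = 0.99,       # assumed belief threshold for canonizing as fact
    reject: float = 0.01,       # assumed belief threshold for rejecting as false
    max_experiments: int = 10_000,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Run experiments until belief crosses a threshold; return "fact", "false", or None."""
    if rng is None:
        rng = random.Random()
    belief = prior  # the Markov state: current community confidence in the claim
    for _ in range(max_experiments):
        positive = rng.random() < (power if claim_is_true else alpha)
        # Positive results are always published; negatives only with probability rho_negative.
        if not positive and rng.random() >= rho_negative:
            continue  # unpublished, so belief does not move
        belief = bayes_update(belief, positive, alpha, power)
        if belief >= accept:
            return "fact"
        if belief <= reject:
            return "false"
    return None  # undecided within the experiment budget


if __name__ == "__main__":
    rng = random.Random(0)
    for rho in (0.0, 0.1, 0.5, 1.0):
        outcomes = [simulate_claim(False, rho_negative=rho, rng=rng) for _ in range(2000)]
        share = outcomes.count("fact") / len(outcomes)
        print(f"rho_negative={rho}: false claims canonized as fact: {share:.2%}")
```

Under these assumptions, a false claim can only drift upward when `rho_negative` is 0, because readers never see the negatives; raising `rho_negative` toward 1 should make false canonization rarer, which is the qualitative effect the abstract describes.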
Daniele Fanelli's webpages
"I graduated in Natural Sciences, giving exams in all fundamental disciplines, then obtained a PhD studying the behaviour and genetics of social wasps, and subsequently worked for two years as a science writer. Now I study the nature of science itself, and the mis-behaviours of scientists. Professional highlights: I am one of the first natural scientists who specialized 24/7 in the study of scientific misconduct, bias and related issues, and have produced some of the largest studies assessing the prevalence of bias across disciplines and countries. Some of these publications have become quite influential, and my 2009 meta-analysis on surveys about misconduct is one of the most popular papers published in the entire Public Library of Science, currently counting over 185,000 views."
people  science  sts  reproducible  fraud  research  ethics 
may 2017 by tsuomela
GitHub - Factual/drake: Data workflow tool, like a "Make for data"
"Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs and Drake automatically resolves their dependencies and calculates: which commands to execute (based on file timestamps); in what order to execute the commands (based on dependencies). Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows."
data-science  research  automation  scripting  reproducible 
december 2016 by tsuomela
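The two calculations Drake's description mentions, which steps to run (from file timestamps) and in what order (from the dependency graph), can be sketched in a few lines of Python. This is not Drake's implementation or its workflow-file syntax; `Step`, `plan`, the step names and the command strings are hypothetical, and file modification times are passed in as a dictionary so the logic runs without touching the filesystem. It relies on `graphlib` from the standard library (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


@dataclass
class Step:
    name: str
    inputs: List[str]
    outputs: List[str]
    command: str  # shell command to run; not executed in this sketch


def plan(steps: List[Step], mtimes: Dict[str, float]) -> List[str]:
    """Return names of steps that need to run, in dependency order.

    `mtimes` maps file name -> modification time; a missing key means the
    file does not exist yet.
    """
    producer = {out: s.name for s in steps for out in s.outputs}
    by_name = {s.name: s for s in steps}
    # Each step depends on the steps that produce its inputs.
    graph = {s.name: {producer[i] for i in s.inputs if i in producer} for s in steps}

    to_run: List[str] = []
    rerun: Set[str] = set()
    # static_order() yields every step after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        step = by_name[name]
        upstream_rerun = any(dep in rerun for dep in graph[name])
        # Steps with no declared outputs always run; so do steps with a missing output.
        missing = not step.outputs or any(o not in mtimes for o in step.outputs)
        stale = False
        if not missing and step.inputs:
            # A missing input counts as infinitely new, so the step is flagged.
            newest_input = max(mtimes.get(i, float("inf")) for i in step.inputs)
            oldest_output = min(mtimes[o] for o in step.outputs)
            stale = newest_input > oldest_output
        if upstream_rerun or missing or stale:
            rerun.add(name)
            to_run.append(name)
    return to_run


if __name__ == "__main__":
    pipeline = [
        Step("fetch", [], ["raw.csv"], "fetch-data > raw.csv"),
        Step("clean", ["raw.csv"], ["clean.csv"], "clean-data raw.csv > clean.csv"),
        Step("report", ["clean.csv"], ["report.html"], "make-report clean.csv > report.html"),
    ]
    # raw.csv was edited after clean.csv was built, so clean and report are out of date.
    print(plan(pipeline, {"raw.csv": 5.0, "clean.csv": 2.0, "report.html": 3.0}))
    # ['clean', 'report']
```

Marking everything downstream of a rerun step as out of date up front gives the same plan Make would reach after rebuilding a prerequisite, since the rebuilt file would then be newer than its dependents. A real runner would also execute each `command` and read modification times from disk, for example with `os.path.getmtime`.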
Tools for Reproducible Research
"A minimal standard for data analysis and other scientific computations is that they be reproducible: that the code and data are assembled in a way so that another group can re-create all of the results (e.g., the figures in a paper). The importance of such reproducibility is now widely recognized, but it is still not so widely practiced as it should be, in large part because many computational scientists (and particularly statisticians) have not fully adopted the required tools for reproducible research. In this course, we will discuss general principles for reproducible research but will focus primarily on the use of relevant tools (particularly make, git, and knitr), with the goal that the students leave the course ready and willing to ensure that all aspects of their computational research (software, data analyses, papers, presentations, posters) are reproducible."
courses  open-courseware  research  reproducible  tools  r  statistics 
november 2016 by tsuomela
Developing Data Products… by Brian Caffo et al. [PDF/iPad/Kindle]
"Developing Data Products in R, Brian Caffo and Sean Kross. This book introduces the topic of Developing Data Products in R. A data product is the ideal output of a Data Science experiment. This book is based on the Coursera Class "Developing Data Products" as part of the Data Science Specialization. Particular emphasis is paid to developing Shiny apps and interactive graphics."
book  data-science  data  products  publishing  reproducible  research  methods 
november 2016 by tsuomela
