optimus – turning free-text lists into hierarchical datasets | Data Science Campus
Many datasets contain variables that have been collected as free text in an uncontrolled way. Where this information consists of items or short textual descriptions, the goal is to aggregate similar entries and analyse how often each appears within the dataset. This textual information usually requires a significant amount of manual processing, which can be impractical for large datasets. The Data Science Campus have developed a processing pipeline that automatically groups entries and generates hierarchical labels for each group to structure the data. We have written this blog to introduce the pipeline, which can be found in our optimus GitHub repository.
statistics  R 
4 hours ago by prcleary
Reproducible Analytical Pipelines - Data in government
Producing official statistics for publications is a key function of many teams across government. It’s a time-consuming and meticulous process to ensure that statistics are accurate and timely. With open source software becoming more widely used, there’s now a range of tools and techniques that can be used to reduce production time, whilst maintaining and even improving the quality of the publications. This post is about these techniques: what they are, and how we can use them.
4 hours ago by prcleary
The 20% Statistician: Justify Your Alpha by Decreasing Alpha Levels as a Function of the Sample Size
Testing whether observed data should surprise us, under the assumption that some model of the data is true, is a widely used procedure in psychological science. Tests against a null model, or against the smallest effect size of interest for an equivalence test, can guide your decisions to continue or abandon research lines. Seeing whether a p-value is smaller than an alpha level is rarely the only thing you want to do, but especially early on in experimental research lines where you can randomly assign participants to conditions, it can be a useful thing.
4 hours ago by prcleary
Scientists rise up against statistical significance
Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.
data  methodology  statistics  science  math 
6 hours ago by basemaly
Lake Washington Girls Middle School Has Been Invaded by Pirates, Warlocks, and Elves - Features - The Stranger
Dungeons & Dragons has a new class of converts, and they're amassing at Lake Washington Girls Middle School.
learning  education  statistics  D&D  games  math 
6 hours ago by basemaly
We describe a data-driven discovery method that leverages Simpson's paradox to uncover interesting patterns in behavioral data.
python  statistics 
7 hours ago by prcleary
Scientists rise up against statistical significance
p-values in the news
For example, consider a series of analyses of unintended effects of anti-inflammatory drugs [2]. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant, study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).
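The two P values quoted above can be roughly recovered from the risk ratio and interval endpoints alone. The sketch below is one way to do that, not necessarily the calculation the authors used: it assumes a Wald-type interval that is symmetric on the log scale and a null risk ratio of 1. The function name `p_from_ratio_ci` is made up for this illustration.

```python
from math import log
from statistics import NormalDist


def p_from_ratio_ci(ratio, lower, upper, level=0.95):
    """Two-sided P value implied by a ratio estimate and its confidence interval.

    Assumption (not stated in the excerpt): the interval is a Wald interval,
    symmetric on the log scale, and the null hypothesis is ratio == 1.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    se = (log(upper) - log(lower)) / (2 * z_crit)  # standard error of log(ratio)
    z = log(ratio) / se                            # distance from the null in SE units
    return 2 * (1 - std_normal.cdf(abs(z)))


# Risk ratio 1.2 with an interval from a 3% decrease to a 48% increase
p_later = p_from_ratio_ci(1.2, 0.97, 1.48)
# Same risk ratio with an interval from a 9% to a 33% increase
p_earlier = p_from_ratio_ci(1.2, 1.09, 1.33)

print(f"{p_later:.3f} {p_earlier:.4f}")
```

Under these assumptions the two results land close to the quoted 0.091 and 0.0003. The point estimate is identical in both calls; only the interval width, and so the standard error, differs.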
9 hours ago by madamim
Do wearable healthcare devices work? - Big Data, Plainly Spoken (aka Numbers Rule Your World)
This means that two-thirds got false alarms. But if we include the 70% who were not sent patches after the video consultation as false alarms as well, then out of every 100 warnings, only 7 were validated.
statistics  bayesian  wearables  apple 
10 hours ago by yorksranter
vasishth/MScStatisticsNotes: These are cheat sheets and notes I made as part of an MSc in Statistics, at the University of Sheffield, UK.
These are cheat sheets and notes I made as part of an MSc in Statistics, at the University of Sheffield, UK.
12 hours ago by prcleary
