**statistics**

optimus – turning free-text lists into hierarchical datasets | Data Science Campus

4 hours ago by prcleary

Many datasets contain variables that have been collected as uncontrolled free text. Where this information consists of items or short textual descriptions, the goal is to aggregate similar entries so that their frequencies within the dataset can be analysed. This usually requires a significant amount of manual processing, which can be impractical for large datasets. The Data Science Campus have developed a processing pipeline that can automatically group entries and generate hierarchical labels for each group to structure the data. We have written this blog to introduce the pipeline, which can be found in our optimus GitHub repository.

statistics
R
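The grouping idea behind such a pipeline can be sketched in a few lines. This is an illustration only, not the Campus's actual optimus code (which also generates the hierarchical labels): greedily assign each free-text entry to the first existing group whose representative string is sufficiently similar.

```python
from difflib import SequenceMatcher

def group_free_text(entries, threshold=0.75):
    """Greedily assign each entry to the first existing group whose
    representative is sufficiently similar, else start a new group."""
    groups = {}  # representative string -> list of matching raw entries
    for entry in entries:
        key = entry.strip().lower()
        for rep in groups:
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                groups[rep].append(entry)
                break
        else:
            groups[key] = [entry]
    return groups

items = ["Mobile phone", "mobile phones", "Laptop", "lap top", "Phone charger"]
grouped = group_free_text(items)
# "Mobile phone"/"mobile phones" and "Laptop"/"lap top" collapse into
# one group each; "Phone charger" stands alone.
```

The threshold and the greedy first-match strategy are arbitrary choices for the sketch; a real pipeline would cluster more carefully and then label each cluster.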

Reproducible Analytical Pipelines - Data in government

4 hours ago by prcleary

Producing official statistics for publications is a key function of many teams across government. It’s a time-consuming and meticulous process to ensure that statistics are accurate and timely. With open-source software becoming more widely used, there’s now a range of tools and techniques that can be used to reduce production time while maintaining, and even improving, the quality of the publications. This post is about these techniques: what they are and how we can use them.

statistics

The 20% Statistician: Justify Your Alpha by Decreasing Alpha Levels as a Function of the Sample Size

4 hours ago by prcleary

Testing whether observed data should surprise us, under the assumption that some model of the data is true, is a widely used procedure in psychological science. Tests against a null model, or against the smallest effect size of interest for an equivalence test, can guide your decisions to continue or abandon research lines. Seeing whether a p-value is smaller than an alpha level is rarely the only thing you want to do, but especially early on in experimental research lines where you can randomly assign participants to conditions, it can be a useful thing.

statistics
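The title's idea can be made concrete with one simple rule that shrinks alpha as the sample size grows. The 1/sqrt scaling and the reference size n0 here are assumptions for illustration, not necessarily the post's exact recommendation:

```python
import math

def adjusted_alpha(n, alpha0=0.05, n0=100):
    """Illustrative rule: shrink the alpha level as the sample size grows,
    keeping alpha equal to alpha0 at the reference size n0."""
    return alpha0 / math.sqrt(n / n0)

for n in (50, 100, 400, 1600):
    print(n, round(adjusted_alpha(n), 4))
# 50 -> 0.0707, 100 -> 0.05, 400 -> 0.025, 1600 -> 0.0125
```

Under this rule a study four times larger uses half the alpha level, so very large samples cannot declare trivially small effects "significant" at a fixed 0.05.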

kailashahirwar/cheatsheets-ai: Essential Cheat Sheets for deep learning and machine learning researchers

6 hours ago by prcleary

Essential Cheat Sheets for deep learning and machine learning researchers

statistics

Scientists rise up against statistical significance

6 hours ago by basemaly

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

data
methodology
statistics
science
math

iterative/dvc: ⚡️Data & models versioning for ML projects, make them shareable and reproducible

6 hours ago by prcleary

Data & models versioning for ML projects, make them shareable and reproducible

statistics

Lake Washington Girls Middle School Has Been Invaded by Pirates, Warlocks, and Elves - Features - The Stranger

6 hours ago by basemaly

Dungeons & Dragons has a new class of converts, and they're amassing at Lake Washington Girls Middle School.

learning
education
statistics
D&D
games
math

ninoch/Trend-Simpsons-Paradox

7 hours ago by prcleary

We describe a data-driven discovery method that leverages Simpson's paradox to uncover interesting patterns in behavioral data.

python
statistics
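The paradox the repository builds on is easy to demonstrate. A minimal sketch (not the repository's discovery method) using the well-known kidney-stone treatment data from Charig et al. (1986), where one treatment succeeds more often in every subgroup yet less often overall:

```python
from fractions import Fraction

# Kidney-stone data (Charig et al., 1986): (successes, total) per stone size.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    return Fraction(successes, total)  # exact comparison, no rounding

# Treatment A wins within every subgroup...
for size in ("small", "large"):
    assert rate(*data["A"][size]) > rate(*data["B"][size])

# ...yet loses on the pooled data: Simpson's paradox.
pooled = {k: (sum(s for s, _ in v.values()), sum(t for _, t in v.values()))
          for k, v in data.items()}
assert rate(*pooled["A"]) < rate(*pooled["B"])  # 273/350 < 289/350
```

The reversal arises because the harder cases (large stones) were disproportionately given treatment A; the repository's contribution is searching data automatically for such disaggregations.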

Scientists rise up against statistical significance

9 hours ago by madamim

p-values in the news

For example, consider a series of analyses of unintended effects of anti-inflammatory drugs [2]. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was “not associated” with new-onset atrial fibrillation (the most common disturbance to heart rhythm) and that the results stood in contrast to those from an earlier study with a statistically significant outcome.

Now, let’s look at the actual data. The researchers describing their statistically non-significant results found a risk ratio of 1.2 (that is, a 20% greater risk in exposed patients relative to unexposed ones). They also found a 95% confidence interval that spanned everything from a trifling risk decrease of 3% to a considerable risk increase of 48% (P = 0.091; our calculation). The researchers from the earlier, statistically significant study found the exact same risk ratio of 1.2. That study was simply more precise, with an interval spanning from 9% to 33% greater risk (P = 0.0003; our calculation).

statistics

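The "our calculation" figures quoted above can be reproduced from the reported ratios and intervals alone. A minimal sketch, assuming normality on the log scale (the standard construction for risk-ratio confidence intervals):

```python
import math

def p_from_rr_ci(rr, lo, hi):
    """Recover a two-sided p-value from a risk ratio and its 95% CI,
    assuming the estimate is normal on the log scale."""
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # CI width -> standard error
    z = math.log(rr) / se
    # Two-sided standard-normal tail probability via erf.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# "Non-significant" study: RR 1.2, 95% CI 0.97 to 1.48
p1 = p_from_rr_ci(1.2, 0.97, 1.48)  # ~0.091, matching the article
# Earlier, more precise study: RR 1.2, 95% CI 1.09 to 1.33
p2 = p_from_rr_ci(1.2, 1.09, 1.33)  # ~0.0003, matching the article
```

Same point estimate, different precision: only the interval widths differ, which is exactly the article's argument against reading the two studies as contradictory.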

Do wearable healthcare devices work? - Big Data, Plainly Spoken (aka Numbers Rule Your World)

10 hours ago by yorksranter

This means that two-thirds got false alarms. But if we include the 70% who were not sent patches after the video consultation as false alarms as well, then out of every 100 warnings, only 7 were validated.

statistics
bayesian
wearables
apple

vasishth/MScStatisticsNotes: These are cheat sheets and notes I made as part of an MSc in Statistics, at the University of Sheffield, UK.

12 hours ago by prcleary

These are cheat sheets and notes I made as part of an MSc in Statistics, at the University of Sheffield, UK.

statistics

JASP - A Fresh Way to Do Statistics

12 hours ago by ptietjen

Alternative to SPSS

software
statistics
open_source_software