**nhaliday : hypothesis-testing**
106

Thread by @docmilanfar: (1/5) One of the most surprising and little-known results in classical statistics is the relationship between the mean, median, and standard…

2 days ago by nhaliday

(1/5) One of the most surprising and little-known results in classical statistics is the relationship between the mean, median, and standard deviation. If the distribution has finite variance, then the distance between the median and the mean is bounded by one standard deviation.
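
The bound follows from a short chain: |mean - median| <= E|X - median| <= E|X - mean| <= sd, since the median minimizes mean absolute deviation and the last step is Jensen's inequality. A rough numerical check is sketched below. The three distributions, the sample sizes, and the helper name `mean_median_gap_in_sd` are illustrative choices, not taken from the thread.

```python
import numpy as np

def mean_median_gap_in_sd(x):
    """Return |mean - median| / sd for a sample (population sd, ddof=0)."""
    x = np.asarray(x, dtype=float)
    return abs(x.mean() - np.median(x)) / x.std()

rng = np.random.default_rng(0)

# Illustrative skewed distributions (assumptions, not from the source).
samples = {
    "exponential": rng.exponential(1.0, 100_000),
    "lognormal": rng.lognormal(0.0, 1.5, 100_000),
    "two-point (skewed)": rng.choice([0.0, 1.0], p=[0.9, 0.1], size=100_000),
}

gaps = {name: mean_median_gap_in_sd(x) for name, x in samples.items()}
for name, gap in gaps.items():
    # Every ratio should come out at or below 1.
    print(f"{name:>20}: |mean - median| / sd = {gap:.3f}")
```

For the exponential case the exact ratio is 1 - ln 2, about 0.31, so the bound is far from tight there.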

twitter
social
discussion
levers
tidbits
math
probability
stats
mental-math
calculation
applications
meta-analysis
hypothesis-testing
science
moments
expectancy
proofs
convexity-curvature
estimate

Redshift sleep experiment - Gwern.net

december 2019 by nhaliday

Redshift does influence my sleep.

One belief - that Redshift helped avoid bright light retarding the sleep cycle and enabling going to bed early - was borne out: on Redshift days, I went to bed an average of 19 minutes earlier. (I had noticed this in my earliest Redshift usage in 2008 and noticed during the experiment that I seemed to be staying up pretty late some nights.) Since I value having a sleep schedule more like that of the rest of humanity and not sleeping past noon, this justifies keeping Redshift installed.

But I am also surprised at the lack of effect on the other aspects of sleep; I was sure Redshift would lead to improvements in waking and how I felt in the morning, if nothing else. Yet, while the exact effect tends to be better for the most important variables, the effect estimates are relatively trivial (less than a tenth increase in average morning feel? falling asleep 2 minutes faster?) and several are worse - I’m a bit baffled why deep sleep decreased, but it might be due to the lower total sleep.

So it seems Redshift is excellent for shifting my bedtime forward, but I can’t say it does much else.

ratty
gwern
data
analysis
intervention
effect-size
lifehack
quantified-self
hypothesis-testing
experiment
sleep
rhythm
null-result
software
desktop

Ask HN: What's your speciality, and what's your "FizzBuzz" equivalent? | Hacker News

hn discussion q-n-a tech programming recruiting checking short-circuit analogy lens init ground-up interdisciplinary cs IEEE electromag math probability finance ORFE marketing dbs audio writing data-science stats hypothesis-testing devops debugging security networking web frontend javascript chemistry gedanken examples fourier acm linear-algebra matrix-factorization iterative-methods embedded multi human-capital

november 2019 by nhaliday


"Performance Matters" by Emery Berger - YouTube

october 2019 by nhaliday

Stabilizer is a tool that enables statistically sound performance evaluation, making it possible to understand the impact of optimizations and conclude things like the fact that the -O2 and -O3 optimization levels are indistinguishable from noise (sadly true).

Since compiler optimizations have run out of steam, we need better profiling support, especially for modern concurrent, multi-threaded applications. Coz is a new "causal profiler" that lets programmers optimize for throughput or latency, and which pinpoints and accurately predicts the impact of optimizations.

- randomize extraneous factors like code layout and stack size to avoid spurious speedups

- simulate speedup of component of concurrent system (to assess effect of optimization before attempting) by slowing down the complement (all but that component)

- latency vs. throughput, Little's law
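
For the last bullet, a minimal sketch of Little's law (mean number in the system = arrival rate x mean time in the system). The function names and the numbers are hypothetical and are not taken from the talk.

```python
def items_in_flight(throughput_per_s, latency_s):
    """Little's law: L = lambda * W (mean items in the system)."""
    return throughput_per_s * latency_s

def throughput_from_concurrency(concurrency, latency_s):
    """Rearranged form: lambda = L / W."""
    return concurrency / latency_s

# Hypothetical service: 100 requests/s at 50 ms mean latency.
in_flight = items_in_flight(100.0, 0.05)

# With 8 requests in flight on average, halving latency doubles
# the throughput the system sustains.
before = throughput_from_concurrency(8, 0.05)
after = throughput_from_concurrency(8, 0.025)
print(in_flight, before, after)
```

Read this way, latency and throughput are tied together through the number of requests in flight, which is why a profiler can target either one.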

video
presentation
programming
engineering
nitty-gritty
performance
devtools
compilers
latency-throughput
concurrency
legacy
causation
wire-guided
let-me-see
manifolds
pro-rata
tricks
endogenous-exogenous
control
random
signal-noise
comparison
marginal
llvm
systems
hashing
computer-memory
build-packaging
composition-decomposition
coupling-cohesion
local-global
dbs
direct-indirect
symmetry
research
models
metal-to-virtual
linux
measurement
simulation
magnitude
realness
hypothesis-testing
techtariat

Treadmill desk observations - Gwern.net

august 2019 by nhaliday

Notes relating to my use of a treadmill desk and 2 self-experiments showing walking treadmill use interferes with typing and memory performance.

...

While the result seems highly likely to be true for me, I don’t know how well it might generalize to other people. For example, perhaps more fit people can use a treadmill without harm and the negative effect is due to the treadmill usage tiring & distracting me; I try to walk 2 miles a day, but that’s not much compared to some people.

Given this harmful impact, I will avoid doing spaced repetition on my treadmill in the future, and given this & the typing result, will relegate any computer+treadmill usage to non-intellectually-demanding work like watching movies. This turned out to not be a niche use I cared about and I hardly ever used my treadmill afterwards, so in October 2016 I sold my treadmill for $70. I might investigate standing desks next for providing some exercise beyond sitting but without the distracting movement of walking on a treadmill.

ratty
gwern
data
analysis
quantified-self
health
fitness
get-fit
working-stiff
intervention
cost-benefit
psychology
cog-psych
retention
iq
branches
keyboard
ergo
efficiency
accuracy
null-result
increase-decrease
experiment
hypothesis-testing

An Untrollable Mathematician Illustrated

ratty lesswrong comics infographic ai-control ai thinking skeleton miri-cfar big-picture synthesis hi-order-bits interdisciplinary lens logic iteration-recursion probability decision-theory decision-making values flux-stasis formal-values bayesian axioms cs computation math truth uncertainty finiteness nibble cartoons visual-understanding machine-learning troll internet volo-avolo hypothesis-testing telos-atelos inference apollonian-dionysian

april 2018 by nhaliday


The Gelman View – spottedtoad

november 2017 by nhaliday

I have read Andrew Gelman’s blog for about five years, and gradually, I’ve decided that among his many blog posts and hundreds of academic articles, he is advancing a philosophy not just of statistics but of quantitative social science in general. Not a statistician myself, here is how I would articulate the Gelman View:

A. Purposes

1. The purpose of social statistics is to describe and understand variation in the world. The world is a complicated place, and we shouldn’t expect things to be simple.

2. The purpose of scientific publication is to allow for communication, dialogue, and critique, not to “certify” a specific finding as absolute truth.

3. The incentive structure of science needs to reward attempts to independently investigate, reproduce, and refute existing claims and observed patterns, not just to advance new hypotheses or support a particular research agenda.

B. Approach

1. Because the world is complicated, the most valuable statistical models for the world will generally be complicated. The result of statistical investigations will only rarely be to give a stamp of truth on a specific effect or causal claim, but will generally show variation in effects and outcomes.

2. Whenever possible, the data, analytic approach, and methods should be made as transparent and replicable as possible, and should be fair game for anyone to examine, critique, or amend.

3. Social scientists should look to build upon a broad shared body of knowledge, not to “own” a particular intervention, theoretic framework, or technique. Such ownership creates incentive problems when the intervention, framework, or technique fails and the scientist is left trying to support a flawed structure.

C. Components

1. Measurement. How and what we measure is the first question, well before we decide on what the effects are or what is making that measurement change.

2. Sampling. Who we talk to or collect information from always matters, because we should always expect effects to depend on context.

3. Inference. While models should usually be complex, our inferential framework should be simple enough for anyone to follow along. And no p values.

He might disagree with all of this, or how it reflects his understanding of his own work. But I think it is a valuable guide to empirical work.

ratty
unaffiliated
summary
gelman
scitariat
philosophy
lens
stats
hypothesis-testing
science
meta:science
social-science
institutions
truth
is-ought
best-practices
data-science
info-dynamics
alt-inst
academia
empirical
evidence-based
checklists
strategy
epistemic

Use and Interpretation of LD Score Regression

november 2017 by nhaliday

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies: https://sci-hub.bz/10.1038/ng.3211

- Po-Ru Loh, Nick Patterson, et al.

https://www.biorxiv.org/content/biorxiv/early/2014/02/21/002931.full.pdf

Both polygenicity (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.
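
A toy simulation of the idea, assuming the regression model as I understand it from the paper: E[chi2_j] = N * h2 * l_j / M + intercept, where l_j is the LD Score of SNP j, the slope carries polygenic signal, and an intercept above 1 reflects confounding. All parameter values are made up. Plain least squares via `np.polyfit` stands in for the weighted regression and block jackknife that a real analysis would use, so this is a sketch and not how `ldsc` itself works.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values (assumptions, not from the paper).
M = 50_000            # number of SNPs
N = 50_000            # GWAS sample size
h2_true = 0.4         # SNP heritability
intercept_true = 1.05 # values above 1 indicate confounding inflation

# Simulated LD Scores and test statistics whose mean follows the model.
ld_scores = rng.uniform(1.0, 50.0, size=M)
expected_chi2 = N * h2_true * ld_scores / M + intercept_true
chi2 = expected_chi2 * rng.chisquare(df=1, size=M)

# Regress chi2 on LD Score: slope -> heritability, intercept -> confounding.
slope, intercept_hat = np.polyfit(ld_scores, chi2, deg=1)
h2_hat = slope * M / N

print(f"h2 estimate: {h2_hat:.3f}, intercept estimate: {intercept_hat:.2f}")
```

The point of the design is that confounding inflates all test statistics roughly equally, while polygenic signal inflates them in proportion to LD Score, so the two separate into intercept and slope.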

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n3/extref/ng.3211-S1.pdf

An atlas of genetic correlations across human diseases and traits: https://sci-hub.bz/10.1038/ng.3406

https://www.biorxiv.org/content/early/2015/01/27/014498.full.pdf

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n11/extref/ng.3406-S1.pdf

https://github.com/bulik/ldsc

ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.

nibble
pdf
slides
talks
bio
biodet
genetics
genomics
GWAS
genetic-correlation
correlation
methodology
bioinformatics
concept
levers
🌞
tutorial
explanation
pop-structure
gene-drift
ideas
multi
study
org:nat
article
repo
software
tools
libraries
stats
hypothesis-testing
biases
confounding
gotchas
QTL
simulation
survey
preprint
population-genetics

Fitting a Structural Equation Model

november 2017 by nhaliday

seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...

pdf
slides
lectures
acm
stats
hypothesis-testing
graphs
graphical-models
latent-variables
model-class
optimization
nonlinearity
gotchas
nibble
ML-MAP-E
iteration-recursion
convergence

Analytic approaches to twin data using structural equation models

pdf study article explanation methodology variance-components biodet behavioral-gen twin-study genetics population-genetics models model-class graphs graphical-models latent-variables ML-MAP-E stats hypothesis-testing nibble 🌞 correlation bioinformatics acm GxE assortative-mating stat-power confidence

november 2017 by nhaliday


Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data

november 2017 by nhaliday

- Pickrell, Pritchard

treemix

nibble
study
article
methodology
bio
sapiens
genetics
genomics
gene-flow
trees
bioinformatics
hypothesis-testing
🌞
population-genetics
software
concept
levers
ideas
libraries
tools
pop-structure

Ancient Admixture in Human History

november 2017 by nhaliday

- Patterson, Reich et al., 2012

Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean “Iceman.”

nibble
pdf
study
article
methodology
bio
sapiens
genetics
genomics
population-genetics
migration
gene-flow
software
trees
concept
history
antiquity
europe
roots
gavisti
🌞
bioinformatics
metrics
hypothesis-testing
levers
ideas
libraries
tools
pop-structure

references - Mathematician wants the equivalent knowledge to a quality stats degree - Cross Validated

nibble q-n-a overflow lens acm stats hypothesis-testing limits confluence books recommendations list top-n accretion data-science roadmap p:whenever p:someday reading quixotic advanced markov monte-carlo convexity-curvature optimization topics linear-models linear-algebra machine-learning classification random rand-approx martingale regression time-series no-go

november 2017 by nhaliday


Two-Sample Hypothesis Tests for Differences in ... - Data @ Quora - Quora

techtariat quora qra project data-science engineering methodology stats hypothesis-testing distribution expectancy limits concentration-of-measure probability orders acm comparison magnitude time-complexity performance parametric nonparametric org:com

november 2017 by nhaliday


Karl Pearson and the Chi-squared Test

october 2017 by nhaliday

Pearson's paper of 1900 introduced what subsequently became known as the chi-squared test of goodness of fit. The terminology and allusions of 80 years ago create a barrier for the modern reader, who finds that the interpretation of Pearson's test procedure and the assessment of what he achieved are less than straightforward, notwithstanding the technical advances made since then. An attempt is made here to surmount these difficulties by exploring Pearson's relevant activities during the first decade of his statistical career, and by describing the work by his contemporaries and predecessors which seem to have influenced his approach to the problem. Not all the questions are answered, and others remain for further study.

original paper: http://www.economics.soton.ac.uk/staff/aldrich/1900.pdf

How did Karl Pearson come up with the chi-squared statistic?: https://stats.stackexchange.com/questions/97604/how-did-karl-pearson-come-up-with-the-chi-squared-statistic

He proceeds by working with the multivariate normal, and the chi-square arises as a sum of squared standardized normal variates.

You can see from the discussion on p160-161 he's clearly discussing applying the test to multinomial distributed data (I don't think he uses that term anywhere). He apparently understands the approximate multivariate normality of the multinomial (certainly he knows the margins are approximately normal - that's a very old result - and knows the means, variances and covariances, since they're stated in the paper); my guess is that most of that stuff is already old hat by 1900. (Note that the chi-squared distribution itself dates back to work by Helmert in the mid-1870s.)

Then by the bottom of p163 he derives a chi-square statistic as "a measure of goodness of fit" (the statistic itself appears in the exponent of the multivariate normal approximation).

He then goes on to discuss how to evaluate the p-value*, and then he correctly gives the upper tail area of a χ²₁₂ (chi-squared with 12 degrees of freedom) beyond 43.87 as 0.000016. [You should keep in mind, however, that he didn't correctly understand how to adjust degrees of freedom for parameter estimation at that stage, so some of the examples in his papers use too high a d.f.]
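
The quoted tail area can be checked by hand. For an even number of degrees of freedom 2k, the chi-squared upper tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!. The helper name below is my own; the sketch uses only the standard library.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for X ~ chi-squared with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    k = df // 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

p = chi2_upper_tail_even_df(43.87, 12)
print(f"{p:.6f}")  # agrees with the 0.000016 quoted above
```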

nibble
papers
acm
stats
hypothesis-testing
methodology
history
mostly-modern
pre-ww2
old-anglo
giants
science
the-trenches
stories
multi
q-n-a
overflow
explanation
summary
innovation
discovery
distribution
degrees-of-freedom
limits

Section 10 Chi-squared goodness-of-fit test.

october 2017 by nhaliday

- proof that the chi-squared statistic for Pearson's test (multinomial goodness-of-fit) is asymptotically chi-squared distributed

- the gotcha: the terms Z_j = (count_j - n p_j)/sqrt(n p_j) in the sum aren't independent

- solution:

- compute the covariance matrix of the terms: E[Z_i Z_j] = -sqrt(p_i p_j) for i != j, with Var(Z_i) = 1 - p_i

- note that, in the limit, an equivalent way of sampling the Z_j is to take a standard Gaussian vector and project it onto the hyperplane orthogonal to the unit vector (sqrt(p_1), sqrt(p_2), ..., sqrt(p_r))

- that is equivalent to sampling a standard Gaussian with one fewer dimension (hence df = r-1)

QED
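
The projection argument can be checked numerically. The sketch below uses an arbitrary probability vector `p` and arbitrary simulation sizes of my choosing. It confirms that the projection matrix I - vv^T with v = sqrt(p) reproduces the covariance of the Z_j, that its trace is r-1, and that simulated Pearson statistics average about r-1.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary cell probabilities
r = len(p)

# Projection onto the hyperplane orthogonal to v = sqrt(p) (a unit vector).
v = np.sqrt(p)
P = np.eye(r) - np.outer(v, v)

# Covariance of the Z_j: 1 - p_i on the diagonal, -sqrt(p_i p_j) off it.
target = -np.sqrt(np.outer(p, p))
np.fill_diagonal(target, 1.0 - p)

# A projected standard Gaussian has covariance P P^T = P.
cov_matches = np.allclose(P @ P.T, target)
rank = round(np.trace(P))            # trace of a projection = its rank

# Simulation: Pearson statistic = sum of Z_j^2 over multinomial draws.
rng = np.random.default_rng(2)
n, reps = 500, 20_000
counts = rng.multinomial(n, p, size=reps)
Z = (counts - n * p) / np.sqrt(n * p)
stat_mean = (Z ** 2).sum(axis=1).mean()

print(cov_matches, rank, round(stat_mean, 2))
```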

pdf
nibble
lecture-notes
mit
stats
hypothesis-testing
acm
probability
methodology
proofs
iidness
distribution
limits
identity
direction
lifts-projections

Timofey Pnin on Twitter: "I like this example of moderator analysis from Hunter & Schmidt's meta-analysis book. 30 small studies of corrs b/w employees' job satisfact… https://t.co/rgoqP6HzPQ"

october 2017 by nhaliday

I think I follow pretty smart people but I see these small sample studies on my timeline all the time. Remember people, the law of large numbers is a true theorem but the law of small numbers is a joke by Tversky & Kahneman.
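
To see how much small studies scatter by chance alone, here is a sketch that draws many hypothetical studies from one population with a single true correlation. The values `rho = 0.2` and `n = 30`, and the number of replications, are made up for illustration and do not come from the book or the tweet.

```python
import numpy as np

rng = np.random.default_rng(3)

rho = 0.2          # assumed true correlation (hypothetical)
n = 30             # participants per study (hypothetical)
n_studies = 2000   # number of simulated studies

cov = [[1.0, rho], [rho, 1.0]]
rs = np.empty(n_studies)
for k in range(n_studies):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    rs[k] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]

spread = rs.std()
frac_negative = (rs < 0).mean()
print(f"sd of sample r: {spread:.2f}, share with r < 0: {frac_negative:.2f}")
```

With these assumptions the sample correlations vary enough that a noticeable share of studies get the sign wrong, even though nothing differs between studies except sampling error.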

gnon
unaffiliated
twitter
social
discussion
thinking
metabuch
science
meta:science
realness
signal-noise
magnitude
scale
measurement
evidence-based
stat-power
hypothesis-testing
methodology
stats
data-science
critique
counterexample
meta-analysis
books
recommendations
confidence

Death of a Statesmen: The Effect of Leadership Visits on Exports

september 2017 by nhaliday

also serves as a good overview of issues in identification strategies+IV method and why natural experiments are so useful

Paying a Visit: The Dalai Lama Effect on International Trade: http://www.econ.cam.ac.uk/research-files/repec/cam/pdf/cwpe1103.pdf

pdf
study
economics
growth-econ
econometrics
natural-experiment
endo-exo
political-econ
politics
polisci
government
trade
nationalism-globalism
leadership
death
stylized-facts
microfoundations
world
history
mostly-modern
counterfactual
statesmen
methodology
explanation
intricacy
causation
🎩
summary
intervention
dropbox
multi
china
asia
foreign-policy
realpolitik
sinosphere
authoritarianism
antidemos
hypothesis-testing
organizing
endogenous-exogenous
preprint
cost-benefit
branches

Information Processing: Estimation of genetic architecture for complex traits using GWAS data

hsu scitariat commentary study summary bio preprint biodet behavioral-gen genetics genomics QTL scaling-up speedometer survey state-of-art iq education GWAS scale data visualization measurement 🌞 bioinformatics missing-heritability chart nibble population-genetics candidate-gene methodology stat-power bounded-cognition lens hypothesis-testing ioannidis stats meta:science

august 2017 by nhaliday


Immigrants and Everest, Bryan Caplan | EconLog | Library of Economics and Liberty

august 2017 by nhaliday

Immigrants use less welfare than natives, holding income constant. Immigrants are far less likely to be in jail than natives, holding high school graduation constant.* On the surface, these seem like striking results. But I've heard a couple of smart people [Garett Jones] demur with an old statistics joke: "Controlling for barometric pressure, Mount Everest has the same altitude as the Dead Sea." Sometimes controls conceal the truth rather than laying it bare.

https://twitter.com/GarettJones/status/897153018503852033

https://archive.is/9k2Ww

org:econlib
econotariat
cracker-econ
garett-jones
migration
meta:rhetoric
propaganda
crime
criminology
causation
endo-exo
regression
spearhead
aphorism
hypothesis-testing
twitter
social
discussion
pic
quotes
gotchas
multi
backup
endogenous-exogenous
august 2017 by nhaliday

DAGs, Horserace Regressions, and Paradigm Wars

scitariat social-science data-science causation endo-exo regression methodology graphs intricacy polisci foreign-policy empirical multi reddit social commentary ssc gwern hypothesis-testing poast garett-jones endogenous-exogenous

august 2017 by nhaliday

Analysis of variance - Wikipedia

july 2017 by nhaliday

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:

y_ij = alpha_i + beta_j + eps_ij

Var(y_ij) = Var(alpha_i) + Var(beta_j) + 2 Cov(alpha_i, beta_j) + Var(eps_ij)   (taking eps_ij uncorrelated with both effects)

and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
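A minimal sketch of the partition above, assuming a balanced table with one observation per cell and independent Gaussian effects. The function names, table size and standard deviations are all made up for illustration. On a balanced table the row, column and residual parts add up to the total variance exactly, not just in expectation.

```python
import random
import statistics


def simulate(n_rows=60, n_cols=50, sd_alpha=2.0, sd_beta=1.0, sd_eps=0.5, seed=0):
    """Draw y_ij = alpha_i + beta_j + eps_ij with independent effects (illustrative sizes)."""
    rng = random.Random(seed)
    alpha = [rng.gauss(0, sd_alpha) for _ in range(n_rows)]
    beta = [rng.gauss(0, sd_beta) for _ in range(n_cols)]
    return [[a + b + rng.gauss(0, sd_eps) for b in beta] for a in alpha]


def partition(y):
    """Split the total variance of a balanced table into row, column and residual parts."""
    n_rows, n_cols = len(y), len(y[0])
    flat = [v for row in y for v in row]
    grand = statistics.fmean(flat)
    row_means = [statistics.fmean(row) for row in y]
    col_means = [statistics.fmean([y[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    # What is left after removing the row and column means.
    resid = [y[i][j] - row_means[i] - col_means[j] + grand
             for i in range(n_rows) for j in range(n_cols)]
    return {
        "total": statistics.pvariance(flat),
        "rows": statistics.pvariance(row_means),
        "cols": statistics.pvariance(col_means),
        "resid": statistics.pvariance(resid),
    }


parts = partition(simulate())
print(parts)
print("sum of parts:", parts["rows"] + parts["cols"] + parts["resid"])
```

On the interaction question: with one observation per cell, an interaction term gamma_ij cannot be told apart from eps_ij, so in this sketch it would simply land in the residual part. Separating it would need replicates within each cell.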

data-science
stats
methodology
hypothesis-testing
variance-components
concept
conceptual-vocab
thinking
wiki
reference
nibble
multi
visualization
visual-understanding
pic
pdf
exposition
lecture-notes
gelman
scitariat
tutorial
acm
ground-up
yoga
july 2017 by nhaliday

Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders : Nature Genetics : Nature Research

july 2017 by nhaliday

Autism spectrum disorder (ASD) risk is influenced by common polygenic and de novo variation. We aimed to clarify the influence of polygenic risk for ASD and to identify subgroups of ASD cases, including those with strongly acting de novo variants, in which polygenic risk is relevant. Using a novel approach called the polygenic transmission disequilibrium test and data from 6,454 families with a child with ASD, we show that polygenic risk for ASD, schizophrenia, and greater educational attainment is over-transmitted to children with ASD. These findings hold independent of proband IQ. We find that polygenic variation contributes additively to risk in ASD cases who carry a strongly acting de novo variant. Lastly, we show that elements of polygenic risk are independent and differ in their relationship with phenotype. These results confirm that the genetic influences on ASD are additive and suggest that they create risk through at least partially distinct etiologic pathways.

https://en.wikipedia.org/wiki/Transmission_disequilibrium_test

study
biodet
behavioral-gen
genetics
population-genetics
QTL
missing-heritability
psychiatry
autism
👽
disease
org:nat
🌞
gwern
pdf
piracy
education
multi
methodology
wiki
reference
psychology
cog-psych
genetic-load
genetic-correlation
sib-study
hypothesis-testing
equilibrium
iq
correlation
intricacy
GWAS
causation
endo-exo
endogenous-exogenous
july 2017 by nhaliday

Econometric Modeling as Junk Science

june 2017 by nhaliday

The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/

In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer question about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656

This post should have been entitled “Zombies who only think of their next cool IV fix”

https://twitter.com/pseudoerasmus/status/662692917069422592

massive lust for quasi-natural experiments, regression discontinuities

barely matters if the effects are not all that big

I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.

One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat

and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.

I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history.

We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.

On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history

argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE

problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with

the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works.

I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.

In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory

Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction

are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.

Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x

larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or

discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?

PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like

Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small

changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.

The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big

natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage

economists have over political scientists when they compete in the same space.

(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?∗: https://economics.mit.edu/files/750

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
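A rough sketch of the placebo-law idea in the abstract above, not the paper's data or code. The outcome here is a simulated AR(1) series per state (rho = 0.8 is my stand-in for serially correlated wages), the "law" is assigned at random to half the states from a random year, and the DD coefficient comes from OLS with state and year fixed effects and the conventional iid standard error. All names and parameter values are assumptions.

```python
import math
import random


def ar1_panel(n_states, n_years, rho, rng):
    """Outcome with no true treatment effect: each state follows its own AR(1) path."""
    panel = []
    for _ in range(n_states):
        y = rng.gauss(0, 1 / math.sqrt(1 - rho ** 2))  # start from the stationary distribution
        path = []
        for _ in range(n_years):
            y = rho * y + rng.gauss(0, 1)
            path.append(y)
        panel.append(path)
    return panel


def double_demean(x):
    """Remove state and year means; on a balanced panel this equals two-way fixed effects."""
    n, t = len(x), len(x[0])
    row = [sum(r) / t for r in x]
    col = [sum(x[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row) / n
    return [[x[i][j] - row[i] - col[j] + grand for j in range(t)] for i in range(n)]


def dd_t_stat(y, d):
    """t-statistic on d in y ~ d + state FE + year FE, using the conventional iid SE."""
    n, t = len(y), len(y[0])
    yt, dt = double_demean(y), double_demean(d)
    sdd = sum(v * v for r in dt for v in r)
    sdy = sum(a * b for ry, rd in zip(yt, dt) for a, b in zip(ry, rd))
    beta = sdy / sdd
    rss = sum((a - beta * b) ** 2 for ry, rd in zip(yt, dt) for a, b in zip(ry, rd))
    dof = n * t - n - t  # n + t - 1 fixed-effect parameters plus one slope
    return beta / math.sqrt(rss / dof / sdd)


def placebo_rejection_rate(rho, n_states=50, n_years=20, reps=200, seed=0):
    """Share of placebo laws declared significant at the nominal 5 percent level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = ar1_panel(n_states, n_years, rho, rng)
        treated = set(rng.sample(range(n_states), n_states // 2))
        start = rng.randrange(n_years // 4, 3 * n_years // 4)
        d = [[1.0 if (i in treated and j >= start) else 0.0 for j in range(n_years)]
             for i in range(n_states)]
        if abs(dd_t_stat(y, d)) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("rho = 0.0:", placebo_rejection_rate(0.0))
    print("rho = 0.8:", placebo_rejection_rate(0.8))
```

With independent errors the rejection rate should sit near the nominal 5 percent; with rho = 0.8 it should come out several times higher, which is the qualitative point of the paper. The exact numbers depend on the made-up settings and should not be read as a replication.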

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733

As it turns out, Young finds that

1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.

2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.

3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.

4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.

5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.

6. 2SLS has considerably higher mean squared error than OLS.

7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.

8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI excludes the OLS estimate.
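Points 6 and 7 of the list above can be illustrated with a toy simulation. It is a sketch under my own assumptions, not Young's procedure: one endogenous regressor, one instrument (so 2SLS reduces to the simple IV ratio), a mild unobserved confounder, and a first-stage coefficient `pi` that controls instrument strength. Because the just-identified IV estimator has very heavy tails, the sketch compares median absolute errors rather than mean squared errors.

```python
import random
import statistics


def one_draw(n, pi, gamma, beta, rng):
    """x is confounded by c; z shifts x with strength pi and affects y only through x."""
    z = [rng.gauss(0, 1) for _ in range(n)]
    c = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [pi * zi + ci + rng.gauss(0, 1) for zi, ci in zip(z, c)]
    y = [beta * xi + gamma * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
    return z, x, y


def cov(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def median_abs_errors(pi, n=200, gamma=0.3, beta=1.0, reps=500, seed=0):
    """Return (OLS, IV) median absolute error around the true beta."""
    rng = random.Random(seed)
    ols_err, iv_err = [], []
    for _ in range(reps):
        z, x, y = one_draw(n, pi, gamma, beta, rng)
        ols = cov(x, y) / cov(x, x)
        iv = cov(z, y) / cov(z, x)  # just-identified 2SLS
        ols_err.append(abs(ols - beta))
        iv_err.append(abs(iv - beta))
    return statistics.median(ols_err), statistics.median(iv_err)


if __name__ == "__main__":
    print("weak instrument   (pi=0.1):", median_abs_errors(0.1))
    print("strong instrument (pi=2.0):", median_abs_errors(2.0))
```

In this setup the weak-instrument IV estimate is typically further from the truth than the (biased) OLS estimate, while a strong instrument reverses the ranking. How often published instruments resemble the weak case is the empirical question the post is about; the simulation says nothing on that.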

Methods Matter: P-Hacking and Causal Inference in Economics*: http://ftp.iza.org/dp11796.pdf

Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups

--

Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://twitter.com/wwwojtekk/status/1190731344336293889

https://archive.is/EZu0h

Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below

Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf

org:junk
org:edu
economics
econometrics
methodology
realness
truth
science
social-science
accuracy
generalization
essay
article
hmm
multi
study
🎩
empirical
causation
error
critique
sociology
criminology
hypothesis-testing
econotariat
broad-econ
cliometrics
endo-exo
replication
incentives
academia
measurement
wire-guided
intricacy
twitter
social
discussion
pseudoE
effect-size
reflection
field-study
stat-power
piketty
marginal-rev
commentary
data-science
expert-experience
regression
gotchas
rant
map-territory
pdf
simulation
moments
confidence
bias-variance
stats
endogenous-exogenous
control
meta:science
meta-analysis
outliers
summary
sampling
ensembles
monte-carlo
theory-practice
applicability-prereqs
chart
comparison
shift
ratty
unaffiliated
garett-jones
june 2017 by nhaliday

Power failure: why small sample size undermines the reliability of neuroscience : Article : Nature Reviews Neuroscience

june 2017 by nhaliday

winner's curse etc.

http://blogs.discovermagazine.com/neuroskeptic/2017/07/19/neuroscience-underpowered/

study
org:nat
ioannidis
science
meta:science
neuro
methodology
error
replication
stat-power
hypothesis-testing
info-dynamics
meta-analysis
critique
bounded-cognition
🔬
multi
org:sci
brain-scan
model-organism
ethics
distribution
june 2017 by nhaliday

Why we should love null results – The 100% CI

june 2017 by nhaliday

https://twitter.com/StuartJRitchie/status/870257682233659392

This is a must-read blog for many reasons, but biggest is: it REALLY matters if a hypothesis is likely to be true.

Strikes me that the areas of psychology with the most absurd hypotheses (ones least likely to be true) *AHEMSOCIALPRIMINGAHEM* are also...

...the ones with extremely small sample sizes. So this already-scary graph from the blogpost becomes all the more terrifying:

scitariat
explanation
science
hypothesis-testing
methodology
null-result
multi
albion
twitter
social
commentary
psychology
social-psych
social-science
meta:science
data
visualization
nitty-gritty
stat-power
priors-posteriors
june 2017 by nhaliday

Pearson correlation coefficient - Wikipedia

may 2017 by nhaliday

https://en.wikipedia.org/wiki/Coefficient_of_determination

what does this mean?: https://twitter.com/GarettJones/status/863546692724858880

deleted but it was about the Pearson correlation distance: 1-r

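I guess it's a metric (strictly, not quite: 1-r can violate the triangle inequality, but sqrt(1-r) is proportional to the Euclidean distance between standardized vectors, so that one is a metric on them)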
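A quick check of whether 1 - r satisfies the triangle inequality, using three hand-picked vectors (my own example, not from the deleted tweet). The vectors x and z are uncorrelated and y sits "halfway" between them, so r(x, y) = r(y, z) = 1/sqrt(2).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


def d_plain(a, b):
    """The correlation distance from the note: 1 - r."""
    return 1 - pearson(a, b)


def d_root(a, b):
    """Square-root variant; max() guards against tiny negative rounding errors."""
    return math.sqrt(max(0.0, 1 - pearson(a, b)))


x = [1.0, -1.0, 0.0]
z = [1.0, 1.0, -2.0]  # uncorrelated with x
y = [math.sqrt(3) * xi + zi for xi, zi in zip(x, z)]  # equally correlated with x and z

# Direct distance x -> z versus the detour through y.
print(d_plain(x, z), d_plain(x, y) + d_plain(y, z))  # roughly 1.0 vs 0.586
print(d_root(x, z), d_root(x, y) + d_root(y, z))     # roughly 1.0 vs 1.082
```

Here the direct 1 - r distance from x to z is larger than the detour through y, so the triangle inequality fails for 1 - r, while the square-root version passes on the same triple.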

https://en.wikipedia.org/wiki/Explained_variation

http://infoproc.blogspot.com/2014/02/correlation-and-variance.html

A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SAT.

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)
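A small simulation of the dY = R dX reading quoted above, assuming a standardized bivariate normal with R = 0.4 (the value from the SAT/GPA example; the sample size and bin width are arbitrary choices of mine). It checks the regression slope in both directions and the average Y among draws with X near +1 SD.

```python
import math
import random
import statistics


def draw_pairs(r, n, seed=0):
    """Draw n pairs from a standardized bivariate normal with correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Slope from regressing ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def mean_y_near(xs, ys, center, half_width=0.25):
    """Average y among draws whose x falls within half_width of center."""
    picked = [y for x, y in zip(xs, ys) if abs(x - center) <= half_width]
    return statistics.fmean(picked)


xs, ys = draw_pairs(r=0.4, n=100_000)
print("slope of Y on X:", ols_slope(xs, ys))
print("slope of X on Y:", ols_slope(ys, xs))
print("mean Y given X near +1 SD:", mean_y_near(xs, ys, 1.0))
```

All three numbers should land close to 0.4, which is the symmetric statement in the quote. On the breeder's equation comparison, the analogy is that both are regression slopes; the sketch does not model heritability.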

https://www.reddit.com/r/slatestarcodex/comments/631haf/on_the_commentariat_here_and_why_i_dont_think_i/dfx4e2s/

stats
science
hypothesis-testing
correlation
metrics
plots
regression
wiki
reference
nibble
methodology
multi
twitter
social
discussion
best-practices
econotariat
garett-jones
concept
conceptual-vocab
accuracy
causation
acm
matrix-factorization
todo
explanation
yoga
hsu
street-fighting
levers
🌞
2014
scitariat
variance-components
meta:prediction
biodet
s:**
mental-math
reddit
commentary
ssc
poast
gwern
data-science
metric-space
similarity
measure
dependence-independence
may 2017 by nhaliday

Proceeding From Observed Correlation to Causal Inference: The Use of Natural Experiments

pdf study article essay methodology hypothesis-testing causation endo-exo natural-experiment correlation confounding measurement social-science psychology social-psych econometrics explanation thinking endogenous-exogenous

may 2017 by nhaliday

'Capital in the Twenty-First Century' by Thomas Piketty, reviewed | New Republic

april 2017 by nhaliday

by Robert Solow (positive)

The data then exhibit a clear pattern. In France and Great Britain, national capital stood fairly steadily at about seven times national income from 1700 to 1910, then fell sharply from 1910 to 1950, presumably as a result of wars and depression, reaching a low of 2.5 in Britain and a bit less than 3 in France. The capital-income ratio then began to climb in both countries, and reached slightly more than 5 in Britain and slightly less than 6 in France by 2010. The trajectory in the United States was slightly different: it started at just above 3 in 1770, climbed to 5 in 1910, fell slightly in 1920, recovered to a high between 5 and 5.5 in 1930, fell to below 4 in 1950, and was back to 4.5 in 2010.

The wealth-income ratio in the United States has always been lower than in Europe. The main reason in the early years was that land values bulked less in the wide open spaces of North America. There was of course much more land, but it was very cheap. Into the twentieth century and onward, however, the lower capital-income ratio in the United States probably reflects the higher level of productivity: a given amount of capital could support a larger production of output than in Europe. It is no surprise that the two world wars caused much less destruction and dissipation of capital in the United States than in Britain and France. The important observation for Piketty’s argument is that, in all three countries, and elsewhere as well, the wealth-income ratio has been increasing since 1950, and is almost back to nineteenth-century levels. He projects this increase to continue into the current century, with weighty consequences that will be discussed as we go on.

...

Now if you multiply the rate of return on capital by the capital-income ratio, you get the share of capital in the national income. For example, if the rate of return is 5 percent a year and the stock of capital is six years worth of national income, income from capital will be 30 percent of national income, and so income from work will be the remaining 70 percent. At last, after all this preparation, we are beginning to talk about inequality, and in two distinct senses. First, we have arrived at the functional distribution of income—the split between income from work and income from wealth. Second, it is always the case that wealth is more highly concentrated among the rich than income from labor (although recent American history looks rather odd in this respect); and this being so, the larger the share of income from wealth, the more unequal the distribution of income among persons is likely to be. It is this inequality across persons that matters most for good or ill in a society.

...

The data are complicated and not easily comparable across time and space, but here is the flavor of Piketty’s summary picture. Capital is indeed very unequally distributed. Currently in the United States, the top 10 percent own about 70 percent of all the capital, half of that belonging to the top 1 percent; the next 40 percent—who compose the “middle class”—own about a quarter of the total (much of that in the form of housing), and the remaining half of the population owns next to nothing, about 5 percent of total wealth. Even that amount of middle-class property ownership is a new phenomenon in history. The typical European country is a little more egalitarian: the top 1 percent own 25 percent of the total capital, and the middle class 35 percent. (A century ago the European middle class owned essentially no wealth at all.) If the ownership of wealth in fact becomes even more concentrated during the rest of the twenty-first century, the outlook is pretty bleak unless you have a taste for oligarchy.

Income from wealth is probably even more concentrated than wealth itself because, as Piketty notes, large blocks of wealth tend to earn a higher return than small ones. Some of this advantage comes from economies of scale, but more may come from the fact that very big investors have access to a wider range of investment opportunities than smaller investors. Income from work is naturally less concentrated than income from wealth. In Piketty’s stylized picture of the United States today, the top 1 percent earns about 12 percent of all labor income, the next 9 percent earn 23 percent, the middle class gets about 40 percent, and the bottom half about a quarter of income from work. Europe is not very different: the top 10 percent collect somewhat less and the other two groups a little more.

You get the picture: modern capitalism is an unequal society, and the rich-get-richer dynamic strongly suggests that it will get more so. But there is one more loose end to tie up, already hinted at, and it has to do with the advent of very high wage incomes. First, here are some facts about the composition of top incomes. About 60 percent of the income of the top 1 percent in the United States today is labor income. Only when you get to the top tenth of 1 percent does income from capital start to predominate. The income of the top hundredth of 1 percent is 70 percent from capital. The story for France is not very different, though the proportion of labor income is a bit higher at every level. Evidently there are some very high wage incomes, as if you didn’t know.

This is a fairly recent development. In the 1960s, the top 1 percent of wage earners collected a little more than 5 percent of all wage incomes. This fraction has risen pretty steadily until nowadays, when the top 1 percent of wage earners receive 10–12 percent of all wages. This time the story is rather different in France. There the share of total wages going to the top percentile was steady at 6 percent until very recently, when it climbed to 7 percent. The recent surge of extreme inequality at the top of the wage distribution may be primarily an American development. Piketty, who with Emmanuel Saez has made a careful study of high-income tax returns in the United States, attributes this to the rise of what he calls “supermanagers.” The very highest income class consists to a substantial extent of top executives of large corporations, with very rich compensation packages. (A disproportionate number of these, but by no means all of them, come from the financial services industry.) With or without stock options, these large pay packages get converted to wealth and future income from wealth. But the fact remains that much of the increased income (and wealth) inequality in the United States is driven by the rise of these supermanagers.

and Deirdre McCloskey (p critical): https://ejpe.org/journal/article/view/170

nice discussion of empirical economics, economic history, market failures and statism, etc., with several bon mots

Piketty’s great splash will undoubtedly bring many young economically interested scholars to devote their lives to the study of the past. That is good, because economic history is one of the few scientifically quantitative branches of economics. In economic history, as in experimental economics and a few other fields, the economists confront the evidence (as they do not for example in most macroeconomics or industrial organization or international trade theory nowadays).

...

Piketty gives a fine example of how to do it. He does not get entangled as so many economists do in the sole empirical tool they are taught, namely, regression analysis on someone else’s “data” (one of the problems is the word data, meaning “things given”: scientists should deal in capta, “things seized”). Therefore he does not commit one of the two sins of modern economics, the use of meaningless “tests” of statistical significance (he occasionally refers to “statistically insignificant” relations between, say, tax rates and growth rates, but I am hoping he does not suppose that a large coefficient is “insignificant” because R. A. Fisher in 1925 said it was). Piketty constructs or uses statistics of aggregate capital and of inequality and then plots them out for inspection, which is what physicists, for example, also do in dealing with their experiments and observations. Nor does he commit the other sin, which is to waste scientific time on existence theorems. Physicists, again, don’t. If we economists are going to persist in physics envy let us at least learn what physicists actually do. Piketty stays close to the facts, and does not, for example, wander into the pointless worlds of non-cooperative game theory, long demolished by experimental economics. He also does not have recourse to non-computable general equilibrium, which never was of use for quantitative economic science, being a branch of philosophy, and a futile one at that. On both points, bravissimo.

...

Since those founding geniuses of classical economics, a market-tested betterment (a locution to be preferred to “capitalism”, with its erroneous implication that capital accumulation, not innovation, is what made us better off) has enormously enriched large parts of a humanity now seven times larger in population than in 1800, and bids fair in the next fifty years or so to enrich everyone on the planet. [Not SSA or MENA...]

...

Then economists, many on the left but some on the right, in quick succession from 1880 to the present—at the same time that market-tested betterment was driving real wages up and up and up—commenced worrying about, to name a few of the pessimisms concerning “capitalism” they discerned: greed, alienation, racial impurity, workers’ lack of bargaining strength, workers’ bad taste in consumption, immigration of lesser breeds, monopoly, unemployment, business cycles, increasing returns, externalities, under-consumption, monopolistic competition, separation of ownership from control, lack of planning, post-War stagnation, investment spillovers, unbalanced growth, dual labor markets, capital insufficiency (William Easterly calls it “capital fundamentalism”), peasant irrationality, capital-market imperfections, public … [more]

news
org:mag
big-peeps
econotariat
economics
books
review
capital
capitalism
inequality
winner-take-all
piketty
wealth
class
labor
mobility
redistribution
growth-econ
rent-seeking
history
mostly-modern
trends
compensation
article
malaise
🎩
the-bones
whiggish-hegelian
cjones-like
multi
mokyr-allen-mccloskey
expert
market-failure
government
broad-econ
cliometrics
aphorism
lens
gallic
clarity
europe
critique
rant
optimism
regularizer
pessimism
ideology
behavioral-econ
authoritarianism
intervention
polanyi-marx
politics
left-wing
absolute-relative
regression-to-mean
legacy
empirical
data-science
econometrics
methodology
hypothesis-testing
physics
iron-age
mediterranean
the-classics
quotes
krugman
world
entrepreneurialism
human-capital
education
supply-demand
plots
manifolds
intersection
markets
evolution
darwinian
giants
old-anglo
egalitarianism-hierarchy
optimate
morality
ethics
envy
stagnation
nl-and-so-can-you
expert-experience
courage
stats
randy-ayndy
reason
intersection-connectedness
detail-architect
april 2017 by nhaliday

Meta-assessment of bias in science

march 2017 by nhaliday

Science is said to be suffering a reproducibility crisis caused by many biases. How common are these problems, across the wide diversity of research fields? We probed for multiple bias-related patterns in a large random sample of meta-analyses taken from all disciplines. The magnitude of these biases varied widely across fields and was on average relatively small. However, we consistently observed that small, early, highly cited studies published in peer-reviewed journals were likely to overestimate effects. We found little evidence that these biases were related to scientific productivity, and we found no difference between biases in male and female researchers. However, a scientist’s early-career status, isolation, and lack of scientific integrity might be significant risk factors for producing unreliable results.

study
academia
science
meta:science
metabuch
stylized-facts
ioannidis
replication
error
incentives
integrity
trends
social-science
meta-analysis
🔬
hypothesis-testing
effect-size
usa
biases
org:nat
info-dynamics
march 2017 by nhaliday

Understanding statistics through interactive visualizations

explanation list visualization gotchas paradox stats methodology hypothesis-testing visual-understanding better-explained links regression-to-mean metabuch examples data-science street-fighting intuition ground-up nitty-gritty

march 2017 by nhaliday

Correlation and Causation in the Study of Personality

study essay explanation methodology bio biodet genetics genomics graphical-models graphs causation roots twin-study sib-study pdf gwern 🌞 personality variance-components QTL big-picture behavioral-gen philosophy hypothesis-testing volo-avolo endo-exo article endogenous-exogenous bioinformatics

march 2017 by nhaliday

Information Processing: The joy of Turkheimer

february 2017 by nhaliday

In the talk Turkheimer gives the following definition of social science, which emphasizes why it is hard:

Social science is the attempt to explain the causes of complex human behavior when:

- There are a large number of potential causes.

- The potential causes are non-independent.

- Randomized experimentation is not possible.

hsu
scitariat
genetics
genomics
causation
hypothesis-testing
social-science
nonlinearity
iidness
correlation
links
slides
presentation
audio
things
lens
metabuch
thinking
GxE
commentary
february 2017 by nhaliday

probability - Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean? - Cross Validated

february 2017 by nhaliday

The confidence interval is the answer to the request: "Give me an interval that will bracket the true value of the parameter in 100p% of the instances of an experiment that is repeated a large number of times." The credible interval is an answer to the request: "Give me an interval that brackets the true value with probability p given the particular sample I've actually observed." To be able to answer the latter request, we must first adopt either (a) a new concept of the data generating process or (b) a different concept of the definition of probability itself.
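The repeated-experiment reading of a confidence interval can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not in the quoted answer: Normal data with a known standard deviation, so a z-interval applies, and made-up values for the true mean, sample size, and number of trials.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000, level=0.95, seed=0):
    """Fraction of repeated experiments whose z-interval contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        # The interval is sample mean +/- half_width; check whether it brackets the truth.
        if abs(fmean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The returned fraction should sit near 0.95. It describes the procedure over many repetitions; it says nothing about any single computed interval, which either contains the true mean or does not.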

http://stats.stackexchange.com/questions/139290/a-psychology-journal-banned-p-values-and-confidence-intervals-is-it-indeed-wise

PS. Note that my question is not about the ban itself; it is about the suggested approach. I am not asking about frequentist vs. Bayesian inference either. The Editorial is pretty negative about Bayesian methods too; so it is essentially about using statistics vs. not using statistics at all.

wut

http://stats.stackexchange.com/questions/6966/why-continue-to-teach-and-use-hypothesis-testing-when-confidence-intervals-are

http://stats.stackexchange.com/questions/2356/are-there-any-examples-where-bayesian-credible-intervals-are-obviously-inferior

http://stats.stackexchange.com/questions/2272/whats-the-difference-between-a-confidence-interval-and-a-credible-interval

http://stats.stackexchange.com/questions/6652/what-precisely-is-a-confidence-interval

http://stats.stackexchange.com/questions/1164/why-havent-robust-and-resistant-statistics-replaced-classical-techniques/

http://stats.stackexchange.com/questions/16312/what-is-the-difference-between-confidence-intervals-and-hypothesis-testing

http://stats.stackexchange.com/questions/31679/what-is-the-connection-between-credible-regions-and-bayesian-hypothesis-tests

http://stats.stackexchange.com/questions/11609/clarification-on-interpreting-confidence-intervals

http://stats.stackexchange.com/questions/16493/difference-between-confidence-intervals-and-prediction-intervals

q-n-a
overflow
nibble
stats
data-science
science
methodology
concept
confidence
conceptual-vocab
confusion
explanation
thinking
hypothesis-testing
jargon
multi
meta:science
best-practices
error
discussion
bayesian
frequentist
hmm
publishing
intricacy
wut
comparison
motivation
clarity
examples
robust
metabuch
🔬
info-dynamics
reference
grokkability-clarity
february 2017 by nhaliday

Measurement error and the replication crisis | Science

february 2017 by nhaliday

In a low-noise setting, the theoretical results of Hausman and others correctly show that measurement error will attenuate coefficient estimates. But we can demonstrate with a simple exercise that the opposite occurs in the presence of high noise and selection on statistical significance.
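A rough simulation of the effect described above, not the authors' own exercise: all parameter values (true slope, sample size, noise level, the |t| > 2 cutoff) and function names are assumptions chosen for illustration. Measurement error in x attenuates the slope on average, yet the estimates that survive a significance filter come out larger than the true slope.

```python
import math
import random


def ols_slope_and_t(x, y):
    """Slope and t-statistic for a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


def simulate(true_slope=0.15, n=50, noise_sd=1.0, trials=2000, seed=0):
    """Return (mean of all slope estimates, mean |slope| among significant ones)."""
    rng = random.Random(seed)
    all_est, sig_est = [], []
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
        x_obs = [xi + rng.gauss(0, noise_sd) for xi in x]  # x measured with error
        slope, t = ols_slope_and_t(x_obs, y)
        all_est.append(slope)
        if abs(t) > 2.0:  # crude stand-in for "statistically significant"
            sig_est.append(abs(slope))
    mean_sig = sum(sig_est) / len(sig_est) if sig_est else float("nan")
    return sum(all_est) / trials, mean_sig


mean_all, mean_sig = simulate()
print(f"mean of all estimates: {mean_all:.3f}")
print(f"mean |estimate| among significant results: {mean_sig:.3f}")
```

In a small, noisy sample the standard error is large enough that only estimates well above the true slope can clear the threshold, so selecting on significance turns attenuation into exaggeration.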

study
org:nat
science
meta:science
stats
signal-noise
gelman
methodology
hypothesis-testing
replication
social-science
error
metabuch
unit
nibble
bounded-cognition
measurement
🔬
info-dynamics
february 2017 by nhaliday

Simultaneous confidence intervals for multinomial parameters, for small samples, many classes? - Cross Validated

february 2017 by nhaliday

- "Bonferroni approach" is just union bound

- so Pr(|hat p_i - p_i| > ε for any i) <= 2k e^{-ε^2 n} = δ

- ε = sqrt(ln(2k/δ)/n)

- Bonferroni approach should work for case of any dependent Bernoulli r.v.s
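The half-width in the notes above can be computed directly. A minimal sketch with hypothetical function names: `halfwidth_note` inverts the bound exactly as written in the note, while `halfwidth_hoeffding` uses the usual Hoeffding exponent exp(-2 ε² n), which is an addition here and gives a somewhat narrower interval.

```python
import math


def halfwidth_note(n: int, k: int, delta: float) -> float:
    """Solve 2k * exp(-eps^2 * n) = delta for eps, as in the note."""
    return math.sqrt(math.log(2 * k / delta) / n)


def halfwidth_hoeffding(n: int, k: int, delta: float) -> float:
    """Same union bound, using Hoeffding's exponent: 2k * exp(-2 * eps^2 * n) = delta."""
    return math.sqrt(math.log(2 * k / delta) / (2 * n))


# Example values (assumed): 1000 observations, 10 classes, 95% simultaneous coverage.
n, k, delta = 1000, 10, 0.05
print(halfwidth_note(n, k, delta), halfwidth_hoeffding(n, k, delta))
```

Because the union bound makes no independence assumption, the same half-width applies to the dependent cell counts of a multinomial.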

q-n-a
overflow
stats
moments
distribution
acm
hypothesis-testing
nibble
confidence
concentration-of-measure
bonferroni
parametric
synchrony
february 2017 by nhaliday

The "What does not kill my statistical significance makes it stronger" fallacy - Statistical Modeling, Causal Inference, and Social Science

scitariat gelman science social-science hypothesis-testing error metabuch thinking replication bounded-cognition meta:science 🔬 info-dynamics

february 2017 by nhaliday

Odds ratio - Wikipedia

february 2017 by nhaliday

- (P(y=1|x=1) / P(y=0|x=1)) / (P(y=1|x=0) / P(y=0|x=0))

- when P(y=1|x=0) and P(y=1|x=1) are both small, approximately the relative risk = P(y=1|x=1)/P(y=1|x=0)

The two other major ways of quantifying association are the risk ratio ("RR") and the absolute risk reduction ("ARR"). In clinical studies and many other settings, the parameter of greatest interest is often actually the RR, which is determined in a way that is similar to the one just described for the OR, except using probabilities instead of odds. Frequently, however, the available data only allows the computation of the OR; notably, this is so in the case of case-control studies, as explained below. On the other hand, if one of the properties (say, A) is sufficiently rare (the "rare disease assumption"), then the OR of having A given that the individual has B is a good approximation to the corresponding RR (the specification "A given B" is needed because, while the OR treats the two properties symmetrically, the RR and other measures do not).
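A short numeric check of the rare-outcome approximation, using the definitions in the notes above. The function names and the example probabilities are illustrative assumptions.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """(P(y=1|x=1) / P(y=0|x=1)) / (P(y=1|x=0) / P(y=0|x=0))."""
    return odds(p_exposed) / odds(p_unexposed)


def relative_risk(p_exposed: float, p_unexposed: float) -> float:
    """P(y=1|x=1) / P(y=1|x=0)."""
    return p_exposed / p_unexposed


# Rare outcome: the odds ratio is close to the relative risk.
print(odds_ratio(0.02, 0.01), relative_risk(0.02, 0.01))
# Common outcome: the two diverge substantially.
print(odds_ratio(0.6, 0.3), relative_risk(0.6, 0.3))
```

In both cases the relative risk is 2, but the odds ratio stays near 2 only when the outcome is rare.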

concept
metrics
methodology
science
hypothesis-testing
wiki
reference
stats
effect-size
february 2017 by nhaliday

What is the relationship between statistical power and the p-value? - Quora

february 2017 by nhaliday

β(x) = prob. of rejecting null hypothesis (generally, confidence interval not including 0) for true value x
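For a concrete case, here is a sketch of β(x) for a two-sided z-test of the null value 0. It assumes a Normally distributed estimate with known standard error `se`; the function name and the default `alpha` are illustrative and not taken from the linked answer.

```python
from statistics import NormalDist


def power(true_value: float, se: float, alpha: float = 0.05) -> float:
    """Probability that a two-sided z-test rejects 0 when the truth is true_value."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    shift = true_value / se
    # Reject when the estimate lands above +z*se or below -z*se.
    return std.cdf(-z + shift) + std.cdf(-z - shift)


for x in (0.0, 1.0, 2.0, 2.8):
    print(x, round(power(x, se=1.0), 3))
```

At x = 0 the function returns the significance level alpha, and it rises toward 1 as the true value moves away from 0. A true effect of about 2.8 standard errors gives roughly 80 percent power.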

q-n-a
qra
stats
science
replication
methodology
explanation
comparison
confusion
jargon
hypothesis-testing
metrics
nibble
conceptual-vocab
metabuch
stat-power
🔬
february 2017 by nhaliday

interpretation - How to understand degrees of freedom? - Cross Validated

january 2017 by nhaliday

From Wikipedia, there are three interpretations of the degrees of freedom of a statistic:

In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (df). In general, the degrees of freedom of an estimate of a parameter is equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (which, in sample variance, is one, since the sample mean is the only intermediate step).

Mathematically, degrees of freedom is the dimension of the domain of a random vector, or essentially the number of 'free' components: how many components need to be known before the vector is fully determined.

...

This is a subtle question. It takes a thoughtful person not to understand those quotations! Although they are suggestive, it turns out that none of them is exactly or generally correct. I haven't the time (and there isn't the space here) to give a full exposition, but I would like to share one approach and an insight that it suggests.

Where does the concept of degrees of freedom (DF) arise? The contexts in which it's found in elementary treatments are:

- The Student t-test and its variants such as the Welch or Satterthwaite solutions to the Behrens-Fisher problem (where two populations have different variances).

- The Chi-squared distribution (defined as a sum of squares of independent standard Normals), which is implicated in the sampling distribution of the variance.

- The F-test (of ratios of estimated variances).

- The Chi-squared test, comprising its uses in (a) testing for independence in contingency tables and (b) testing for goodness of fit of distributional estimates.

In spirit, these tests run a gamut from being exact (the Student t-test and F-test for Normal variates) to being good approximations (the Student t-test and the Welch/Satterthwaite tests for not-too-badly-skewed data) to being based on asymptotic approximations (the Chi-squared test). An interesting aspect of some of these is the appearance of non-integral "degrees of freedom" (the Welch/Satterthwaite tests and, as we will see, the Chi-squared test). This is of especial interest because it is the first hint that DF is not any of the things claimed of it.
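The non-integral degrees of freedom mentioned for the Welch/Satterthwaite tests can be seen with the standard Welch-Satterthwaite approximation. The formula is not part of the quoted answer, and the sample variances and sizes below are made up for illustration.

```python
def welch_satterthwaite_df(var1: float, n1: int, var2: float, n2: int) -> float:
    """Approximate degrees of freedom for Welch's unequal-variance t-test."""
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Assumed inputs: sample variances 4 and 1, sample sizes 10 and 20.
df = welch_satterthwaite_df(var1=4.0, n1=10, var2=1.0, n2=20)
print(df)
```

The result is generally not a whole number and lies between the smaller of n1 − 1 and n2 − 1 and the pooled value n1 + n2 − 2, which is hard to square with a literal count of free values.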

...

Having been alerted by these potential ambiguities, let's hold up the Chi-squared goodness of fit test for examination, because (a) it's simple, (b) it's one of the common situations where people really do need to know about DF to get the p-value right and (c) it's often used incorrectly. Here's a brief synopsis of the least controversial application of this test:

...

This, many authorities tell us, should have (to a very close approximation) a Chi-squared distribution. But there's a whole family of such distributions. They are differentiated by a parameter ν often referred to as the "degrees of freedom." The standard reasoning about how to determine ν goes like this:

I have k counts. That's k pieces of data. But there are (functional) relationships among them. To start with, I know in advance that the sum of the counts must equal n. That's one relationship. I estimated two (or p, generally) parameters from the data. That's two (or p) additional relationships, giving p+1 total relationships. Presuming they (the parameters) are all (functionally) independent, that leaves only k−p−1 (functionally) independent "degrees of freedom": that's the value to use for ν.

The problem with this reasoning (which is the sort of calculation the quotations in the question are hinting at) is that it's wrong except when some special additional conditions hold. Moreover, those conditions have nothing to do with independence (functional or statistical), with numbers of "components" of the data, with the numbers of parameters, nor with anything else referred to in the original question.

...

Things went wrong because I violated two requirements of the Chi-squared test:

1. You must use the Maximum Likelihood estimate of the parameters. (This requirement can, in practice, be slightly violated.)

2. You must base that estimate on the counts, not on the actual data! (This is crucial.)

...

The point of this comparison--which I hope you have seen coming--is that the correct DF to use for computing the p-values depends on many things other than dimensions of manifolds, counts of functional relationships, or the geometry of Normal variates. There is a subtle, delicate interaction between certain functional dependencies, as found in mathematical relationships among quantities, and distributions of the data, their statistics, and the estimators formed from them. Accordingly, it cannot be the case that DF is adequately explainable in terms of the geometry of multivariate normal distributions, or in terms of functional independence, or as counts of parameters, or anything else of this nature.

We are led to see, then, that "degrees of freedom" is merely a heuristic that suggests what the sampling distribution of a (t, Chi-squared, or F) statistic ought to be, but it is not dispositive. Belief that it is dispositive leads to egregious errors. (For instance, the top hit on Google when searching "chi squared goodness of fit" is a Web page from an Ivy League university that gets most of this completely wrong! In particular, a simulation based on its instructions shows that the chi-squared value it recommends as having 7 DF actually has 9 DF.)

q-n-a
overflow
stats
data-science
concept
jargon
explanation
methodology
things
nibble
degrees-of-freedom
clarity
curiosity
manifolds
dimensionality
ground-up
intricacy
hypothesis-testing
examples
list
ML-MAP-E
gotchas
grokkability-clarity
january 2017 by nhaliday

terminology - What are the major philosophical, methodological, and terminological differences between econometrics and other statistical fields? - Cross Validated

q-n-a overflow stats methodology jargon comparison lens culture economics econometrics bio hypothesis-testing nibble

january 2017 by nhaliday

D-separation

january 2017 by nhaliday

collider C = A->C<-B

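A, B d-connected iff some path between A and B (ignoring edge directions) has no colliders; d-connected conditioned on Z iff some path has no non-collider in Z and every collider on it is in Z or has a descendant in Z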

A,B d-separated conditioned on Z iff not d-connected conditioned on Z

http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html
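A minimal sketch of a d-separation check, for illustration only. It uses the moralized ancestral graph criterion (a standard equivalent of the path-based definition, and not necessarily how the linked page presents it). The function names and the `parents` dictionary format are assumptions made for this example.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, a, b, z=()):
    """True iff a and b are d-separated given the set z.

    `parents` maps each node to a list of its parents in the DAG.
    Assumes a and b are distinct and neither is in z.
    """
    z = set(z)
    keep = ancestors(parents, {a, b} | z)
    # Moralize the ancestral subgraph: link each node to its parents and
    # "marry" every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # a and b are d-connected iff an undirected path avoiding z joins them.
    queue, seen = deque([a]), {a}
    while queue:
        v = queue.popleft()
        if v == b:
            return False
        for w in adj[v]:
            if w not in seen and w not in z:
                seen.add(w)
                queue.append(w)
    return True


collider = {"C": ["A", "B"]}           # A -> C <- B
chain = {"M": ["A"], "B": ["M"]}       # A -> M -> B
print(d_separated(collider, "A", "B"))          # collider blocks the path
print(d_separated(collider, "A", "B", {"C"}))   # conditioning on it opens the path
print(d_separated(chain, "A", "B", {"M"}))      # conditioning on a mediator blocks
```

On the collider A->C<-B, A and B are d-separated unconditionally but become d-connected once C (or a descendant of C) is conditioned on; on the chain the behaviour is reversed.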

concept
explanation
causation
bayesian
graphical-models
cmu
org:edu
stats
methodology
tutorial
jargon
graphs
hypothesis-testing
confounding
🔬
direct-indirect
philosophy
definition
volo-avolo
multi
org:junk
january 2017 by nhaliday

Improving Economic Research | askblog

january 2017 by nhaliday

To make a long story short:

1. Economic phenomena are rife with causal density. Theories make predictions assuming “other things equal,” but other things are never equal.

2. When I was a student, the solution was thought to be multiple regression analysis. You entered a bunch of variables into an estimated equation, and in doing so you “controlled for” those variables and thereby created conditions of “other things equal.” However, in 1978, Edward Leamer pointed out that actual practice diverges from theory. The researcher typically undertakes a lot of exploratory data analysis before reporting a final result. This process of exploratory analysis creates a bias toward finding the result desired by the researcher, rather than achieving a scientific ideal of objectivity.

3. In recent decades, the approach has shifted toward “natural experiments” and laboratory experiments. These suffer from other problems. The experimental population may not be representative. Even if this problem is not present, studies that offer definitive results are more likely to be published but consequently less likely to be replicated.

econotariat
cracker-econ
study
summary
methodology
economics
causation
social-science
best-practices
academia
hypothesis-testing
thick-thin
density
replication
complex-systems
roots
noise-structure
endo-exo
info-dynamics
natural-experiment
endogenous-exogenous
january 2017 by nhaliday

Resetting the bar: Statistical significance in whole-genome sequencing-based association studies of global populations - Pulit - 2016 - Genetic Epidemiology - Wiley Online Library

study scaling-up GWAS genetics genomics methodology simulation objektbuch biodet hypothesis-testing

december 2016 by nhaliday

The 20% Statistician: Why Type 1 errors are more important than Type 2 errors (if you care about evidence)

december 2016 by nhaliday

as reminder: type I = false positive, type II = false negative
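To put numbers on the two error types, here is a small sketch for a one-sided z-test. The critical value 1.645 and the true effect of 2.5 standard errors are hypothetical choices for illustration, not figures from the linked post.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rates(z_crit, effect_in_se):
    """Type I and type II error rates for a one-sided z-test.

    z_crit: reject the null when the z-statistic exceeds this value.
    effect_in_se: the true effect, in standard-error units, under the alternative.
    """
    type_1 = 1.0 - normal_cdf(z_crit)           # false positive: null true, rejected
    type_2 = normal_cdf(z_crit - effect_in_se)  # false negative: effect real, not rejected
    return type_1, type_2


alpha, beta = error_rates(1.645, 2.5)
print(round(alpha, 3), round(beta, 3))
```

Raising `z_crit` lowers the type I rate and raises the type II rate for a fixed effect, which is the trade-off the post's title refers to.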

science
replication
methodology
advice
error
stats
best-practices
scitariat
meta:science
hypothesis-testing
nibble
conceptual-vocab
info-dynamics
december 2016 by nhaliday

Adaptive data analysis

acmtariat acm machine-learning stats research research-program exposition science methodology mrtz meta:science differential-privacy liner-notes hypothesis-testing org:bleg nibble metameta 🔬 info-dynamics generalization iteration-recursion data-science online-learning bayesian gelman scitariat frequentist human-ml robust perturbation sensitivity learning-theory information-theory bits lower-bounds no-go volo-avolo adversarial gradient-descent bonferroni

december 2016 by nhaliday

Important z-scores

november 2016 by nhaliday

hmm:

https://twitter.com/davidshor/status/888441370247090176

https://archive.is/BiPxf

Friends don't let friends plot 95% confidence intervals

There is a 98.3% chance that b>a, even though their 95% confidence intervals overlap
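One way to arrive at a figure like this, as a sketch: the numbers behind the tweeted plot are not given here, so the means and standard errors below are hypothetical, and a and b are assumed to be independent and normally distributed.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_b_greater(mean_a, se_a, mean_b, se_b):
    """P(b > a) for independent normal estimates a and b."""
    # The difference b - a is normal with variance se_a^2 + se_b^2.
    return normal_cdf((mean_b - mean_a) / math.hypot(se_a, se_b))


def ci95(mean, se):
    """Two-sided 95% interval under a normal approximation."""
    return (mean - 1.96 * se, mean + 1.96 * se)


# Hypothetical estimates: equal standard errors, means 3 standard errors apart.
a_ci = ci95(0.0, 1.0)
b_ci = ci95(3.0, 1.0)
intervals_overlap = a_ci[1] > b_ci[0]
p = prob_b_greater(0.0, 1.0, 3.0, 1.0)
print(intervals_overlap, round(p, 3))
```

With these inputs the intervals overlap (1.96 > 1.04), yet P(b > a) comes out at roughly 0.983, because the standard error of the difference is only √2 times the individual standard error rather than twice it.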

stats
objektbuch
street-fighting
reference
concentration-of-measure
mental-math
nitty-gritty
multi
twitter
social
discussion
journos-pundits
dataviz
counterexample
data-science
moments
left-wing
backup
pic
confidence
intersection
hypothesis-testing
intersection-connectedness
november 2016 by nhaliday
