Thread by @docmilanfar: (1/5) One of the most surprising and little-known results in classical statistics is the relationship between the mean, median, and standard…

2 days ago by nhaliday

(1/5) One of the most surprising and little-known results in classical statistics is the relationship between the mean, median, and standard deviation. If the distribution has finite variance, then the distance between the median and the mean is bounded by one standard deviation.
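
A short sketch of why the bound holds. This is the standard textbook argument, not something quoted from the thread; the symbols $\mu$ (mean), $m$ (median), and $\sigma$ (standard deviation) are notation assumed here.

```latex
\begin{align*}
|\mu - m| = \bigl|\mathbb{E}[X - m]\bigr|
  &\le \mathbb{E}|X - m|
  && \text{(Jensen's inequality for } |\cdot|\text{)} \\
  &\le \mathbb{E}|X - \mu|
  && \text{(the median minimizes } c \mapsto \mathbb{E}|X - c|\text{)} \\
  &\le \sqrt{\mathbb{E}(X - \mu)^2} = \sigma
  && \text{(Cauchy--Schwarz)}
\end{align*}
```

For example, for an exponential distribution with rate 1 the mean is 1, the median is $\ln 2 \approx 0.69$, and the standard deviation is 1, so the gap of about 0.31 sits well inside the bound.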

twitter
social
discussion
levers
tidbits
math
probability
stats
mental-math
calculation
applications
meta-analysis
hypothesis-testing
science
moments
expectancy
proofs
convexity-curvature
estimate

Ask HN: What's your speciality, and what's your "FizzBuzz" equivalent? | Hacker News

hn discussion q-n-a tech programming recruiting checking short-circuit analogy lens init ground-up interdisciplinary cs IEEE electromag math probability finance ORFE marketing dbs audio writing data-science stats hypothesis-testing devops debugging security networking web frontend javascript chemistry gedanken examples fourier acm linear-algebra matrix-factorization iterative-methods embedded multi human-capital

november 2019 by nhaliday

Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora

october 2019 by nhaliday

In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged

There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your Markov chain has a discrete state space, then take the first time that a state repeats in your chain: if you randomly sample an element between the repeating states (but only including one of the endpoints) you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm.

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound for the convergence rates, but these are quite difficult to obtain, and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm for sampling from a two-dimensional uniform distribution, the best convergence rate upper bound achieved, by Persi Diaconis, was something with an astronomical constant factor like 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
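
One example of the kind of "simple diagnostic" meant here is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance. The answer does not name a specific diagnostic, so the choice of R-hat is an assumption, and the "chains" below are synthetic i.i.d. draws standing in for real sampler output. A minimal sketch:

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Potential scale reduction factor for an array of shape (m chains, n draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    between = n * chain_means.var(ddof=1)          # B: between-chain variance
    within = chains.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return float(np.sqrt(var_hat / within))

rng = np.random.default_rng(0)
# Synthetic stand-ins for sampler output (assumption): four chains on one target.
mixed = rng.normal(size=(4, 2000))
# Same draws, but two chains are shifted as if stuck in a different mode.
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(rhat_mixed, rhat_stuck)
```

A value far above 1 flags an obvious failure such as the stuck chains. A value near 1 does not certify convergence, which is consistent with the caution in the answer.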

nibble
q-n-a
qra
acm
stats
probability
limits
convergence
distribution
sampling
markov
monte-carlo
ML-MAP-E
checking
equilibrium
stylized-facts
gelman
levers
mixing
empirical
plots
manifolds
multi
fixed-point
iteration-recursion
heuristic
expert-experience
theory-practice
project

Evidence-based Software Engineering: based on the publicly available data

unit books software programming engineering data analysis evidence-based empirical data-science r-lang summary survey meta-analysis metrics best-practices business tech google stats methodology regression time-series code-organizing grokkability grokkability-clarity project-management

july 2019 by nhaliday

Links - Gwern.net

june 2019 by nhaliday

“‘I don’t speak’, Bijaz said. ‘I operate a machine called language. It creaks and groans, but is mine own.’”

- Frank Herbert, Dune Messiah

I love this quote

ratty
gwern
links
list
summary
people
profile
virginia-DC
quotes
aphorism
lesswrong
social
media
reddit
hn
books
aggregator
prediction
priors-posteriors
vulgar
tv
wiki
internet
haskell
workflow
exocortex
linux
editors
browser
retention
software
hardware
notetaking
desktop
terminal
duplication
backup
sleep
tools
privacy
advertising
keyboard
ergo
deep-learning
stats
bayesian
reinforcement
consumerism
money
review
yak-shaving
computer-memory
mooc
personality
iq
psych-architecture
creative
open-closed
discipline
extra-introversion
stress
quiz
philosophy
morality
ethics
formal-values
sanctity-degradation
politics
coalitions
things
psychometrics
education
programming
oss
culture
rationality
heuristic
biases
collaboration
config

A list of open source C++ libraries - cppreference.com

documentation list links libraries c(pp) top-n recommendations graphics systems concurrency facebook networking computer-vision stats numerics data-science machine-learning deep-learning linear-models model-class checking dbs crypto oss pls programming sci-comp interface ecosystem protocol-metadata interface-compatibility

april 2019 by nhaliday

8 PCA – A Powerful Method for Analyze Ecological Niches

august 2018 by nhaliday

Influences of ecology and biogeography on shaping the distributions of cryptic species: three bat tales in Iberia: https://academic.oup.com/biolinnean/article/112/1/150/2415750

Combining Historical Biogeography with Niche Modeling in the Caprifolium Clade of Lonicera (Caprifoliaceae, Dipsacales): https://watermark.silverchair.com/syq011.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAagwggGkBgkqhkiG9w0BBwagggGVMIIBkQIBADCCAYoGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMnQcew1QnnjkjJSlVAgEQgIIBW-Nu-4L3xpOdRIb27NdbMbhPjaeByMM3g6H1bpeMMK4OJ9gBOH7V5WfuKGlHlsgsStQQLC_s2YGVu5KDOtwhudWOPFqrXmYlAXjhFNi5hFNpCxjNT-4tTJlRJHU5plgPE2BWZht5okuM2sngjX3t5dDScmz0oTBvu7xnUXo3sbGkad6gw-za6Rpyl5_3-nnnbOpz6WeqfxcR7NDGwPd741QVJKjjp-FHPf8JdWN3mcsLMVJ6p11FoeMeQdA7gsyXhKDPfE8sJ2Xamjxk5uSaGkfi1bi71OB1Ag0UvV2xlON1UwWD9V8tE7e3JJQanv_aKgKyppuXQikoMhH05x_nCFsiVif-_-26Yyx0CMIHv4so81sOpwN5YM_BISyUp_RoT2yfjiEhZpcJlyWX4z6ZeKAUEICloT8evsOX8Ll4FUocBHARhnqZgRlc8w33b_J3wslXv-PVBvvXNs0h

pdf
article
study
methodology
bio
ecology
data
analysis
stats
exploratory
matrix-factorization
geography
environment
time
crosstab
history
letters
correlation
evolution
distribution
examples
high-dimension
multi
chart
howto
objektbuch
metabuch
nibble
data-science
things

Stein's example - Wikipedia

february 2018 by nhaliday

Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.[1]

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.

...

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
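
The shrinkage described above can be sketched as follows. The setup is an assumption for illustration: a single observation X ~ N(theta, I) in p = 10 dimensions, a hypothetical true theta of all ones, and shrinkage toward the origin. The simulation compares average squared error of the ordinary estimator (X itself) with the James–Stein estimator.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Shrink one observation x ~ N(theta, sigma2 * I) toward the origin."""
    x = np.asarray(x, dtype=float)
    p = x.size
    # Shift has length (p - 2) * sigma2 / ||x||: inversely proportional to distance.
    shrink = 1.0 - (p - 2) * sigma2 / np.dot(x, x)
    return shrink * x

rng = np.random.default_rng(0)
p, trials = 10, 5000
theta = np.ones(p)            # hypothetical true parameter vector
err_mle = 0.0
err_js = 0.0
for _ in range(trials):
    x = theta + rng.normal(size=p)
    err_mle += np.sum((x - theta) ** 2)
    err_js += np.sum((james_stein(x) - theta) ** 2)

mse_mle = err_mle / trials    # close to p for the ordinary estimator
mse_js = err_js / trials      # lower total squared error
print(mse_mle, mse_js)
```

The gain is in the combined error summed over all coordinates, which matches the caveat above that the combined estimator is not the right tool for any single parameter on its own.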

nibble
concept
levers
wiki
reference
acm
stats
probability
decision-theory
estimate
distribution
atoms

The Gelman View – spottedtoad

november 2017 by nhaliday

I have read Andrew Gelman’s blog for about five years, and gradually, I’ve decided that among his many blog posts and hundreds of academic articles, he is advancing a philosophy not just of statistics but of quantitative social science in general. Not a statistician myself, here is how I would articulate the Gelman View:

A. Purposes

1. The purpose of social statistics is to describe and understand variation in the world. The world is a complicated place, and we shouldn’t expect things to be simple.

2. The purpose of scientific publication is to allow for communication, dialogue, and critique, not to “certify” a specific finding as absolute truth.

3. The incentive structure of science needs to reward attempts to independently investigate, reproduce, and refute existing claims and observed patterns, not just to advance new hypotheses or support a particular research agenda.

B. Approach

1. Because the world is complicated, the most valuable statistical models for the world will generally be complicated. The result of statistical investigations will only rarely be to give a stamp of truth on a specific effect or causal claim, but will generally show variation in effects and outcomes.

2. Whenever possible, the data, analytic approach, and methods should be made as transparent and replicable as possible, and should be fair game for anyone to examine, critique, or amend.

3. Social scientists should look to build upon a broad shared body of knowledge, not to “own” a particular intervention, theoretic framework, or technique. Such ownership creates incentive problems when the intervention, framework, or technique fails and the scientist is left trying to support a flawed structure.

C. Components

1. Measurement. How and what we measure is the first question, well before we decide on what the effects are or what is making that measurement change.

2. Sampling. Who we talk to or collect information from always matters, because we should always expect effects to depend on context.

3. Inference. While models should usually be complex, our inferential framework should be simple enough for anyone to follow along. And no p values.

He might disagree with all of this, or how it reflects his understanding of his own work. But I think it is a valuable guide to empirical work.

ratty
unaffiliated
summary
gelman
scitariat
philosophy
lens
stats
hypothesis-testing
science
meta:science
social-science
institutions
truth
is-ought
best-practices
data-science
info-dynamics
alt-inst
academia
empirical
evidence-based
checklists
strategy
epistemic

Use and Interpretation of LD Score Regression

november 2017 by nhaliday

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies: https://sci-hub.bz/10.1038/ng.3211

- Po-Ru Loh, Nick Patterson, et al.

https://www.biorxiv.org/content/biorxiv/early/2014/02/21/002931.full.pdf

Both polygenicity (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.
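
A toy illustration of the idea in the abstract, using the regression equation from the LD Score regression paper, E[chi2_j] = N*h2*l_j/M + N*a + 1, where l_j is the LD Score of SNP j. Everything numeric below is synthetic and hypothetical (SNP count, sample size, heritability, confounding term, LD Score distribution, noise model), and the plain least-squares fit is a simplification: the real ldsc tool uses a weighted regression.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 1_000_000, 50_000     # hypothetical number of SNPs and GWAS sample size
h2, a = 0.4, 2e-6            # hypothetical heritability and confounding term
ld = rng.gamma(shape=2.0, scale=50.0, size=M)   # synthetic LD Scores

# Expected chi-squared statistic per SNP under the model.
expected = N * h2 * ld / M + N * a + 1.0
# Crude noise model (assumption): scaled 1-df chi-squared draws.
chi2 = expected * rng.chisquare(df=1, size=M)

# Regress test statistics on LD Scores: the slope carries the polygenic signal,
# the intercept (minus 1) carries the confounding contribution.
slope, intercept = np.polyfit(ld, chi2, deg=1)
h2_hat = slope * M / N
print(h2_hat, intercept)
```

In this synthetic setup the slope recovers the heritability used to generate the data and the intercept sits near 1 + N*a, which is the separation of polygenicity from confounding that the abstract describes.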

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n3/extref/ng.3211-S1.pdf

An atlas of genetic correlations across human diseases and traits: https://sci-hub.bz/10.1038/ng.3406

https://www.biorxiv.org/content/early/2015/01/27/014498.full.pdf

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n11/extref/ng.3406-S1.pdf

https://github.com/bulik/ldsc

ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.

nibble
pdf
slides
talks
bio
biodet
genetics
genomics
GWAS
genetic-correlation
correlation
methodology
bioinformatics
concept
levers
🌞
tutorial
explanation
pop-structure
gene-drift
ideas
multi
study
org:nat
article
repo
software
tools
libraries
stats
hypothesis-testing
biases
confounding
gotchas
QTL
simulation
survey
preprint
population-genetics

Fitting a Structural Equation Model

november 2017 by nhaliday

seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...

pdf
slides
lectures
acm
stats
hypothesis-testing
graphs
graphical-models
latent-variables
model-class
optimization
nonlinearity
gotchas
nibble
ML-MAP-E
iteration-recursion
convergence

Analytic approaches to twin data using structural equation models

pdf study article explanation methodology variance-components biodet behavioral-gen twin-study genetics population-genetics models model-class graphs graphical-models latent-variables ML-MAP-E stats hypothesis-testing nibble 🌞 correlation bioinformatics acm GxE assortative-mating stat-power confidence

november 2017 by nhaliday

GCTA: a tool for genome-wide complex trait analysis. - PubMed - NCBI

study nibble bio biodet genetics genomics bioinformatics methodology variance-components missing-heritability classic 🌞 population-genetics QTL scaling-up article GCTA spearhead pdf piracy stats ML-MAP-E concept levers ideas

november 2017 by nhaliday

references - Mathematician wants the equivalent knowledge to a quality stats degree - Cross Validated

nibble q-n-a overflow lens acm stats hypothesis-testing limits confluence books recommendations list top-n accretion data-science roadmap p:whenever p:someday reading quixotic advanced markov monte-carlo convexity-curvature optimization topics linear-models linear-algebra machine-learning classification random rand-approx martingale regression time-series no-go

november 2017 by nhaliday

Two-Sample Hypothesis Tests for Differences in ... - Data @ Quora - Quora

techtariat quora qra project data-science engineering methodology stats hypothesis-testing distribution expectancy limits concentration-of-measure probability orders acm comparison magnitude time-complexity performance parametric nonparametric org:com

november 2017 by nhaliday

multivariate analysis - Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian? - Cross Validated

october 2017 by nhaliday

The bivariate normal distribution is the exception, not the rule!

It is important to recognize that "almost all" joint distributions with normal marginals are not the bivariate normal distribution. That is, the common viewpoint that joint distributions with normal marginals that are not the bivariate normal are somehow "pathological", is a bit misguided.

Certainly, the multivariate normal is extremely important due to its stability under linear transformations, and so receives the bulk of attention in applications.

note: there is a multivariate central limit theorem, so such applications have no problem
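
A classic counterexample makes the point concrete. It is a standard construction, not necessarily the one used in the linked answer: take X standard normal and Y = S*X with S an independent fair random sign. Both marginals are standard normal, but X + Y is exactly zero half the time, which no bivariate normal pair allows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
sign = rng.choice([-1.0, 1.0], size=n)   # fair coin, independent of x
y = sign * x                              # y is N(0, 1) by symmetry

# For a bivariate normal pair, x + y would be normal (or constant).
# Here it equals 0 exactly whenever sign == -1, so it is neither.
frac_zero = float(np.mean(x + y == 0.0))
corr = float(np.corrcoef(x, y)[0, 1])     # uncorrelated, yet clearly dependent
print(frac_zero, corr)
```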

nibble
q-n-a
overflow
stats
math
acm
probability
distribution
gotchas
intricacy
characterization
structure
composition-decomposition
counterexample
limits
concentration-of-measure

Karl Pearson and the Chi-squared Test

october 2017 by nhaliday

Pearson's paper of 1900 introduced what subsequently became known as the chi-squared test of goodness of fit. The terminology and allusions of 80 years ago create a barrier for the modern reader, who finds that the interpretation of Pearson's test procedure and the assessment of what he achieved are less than straightforward, notwithstanding the technical advances made since then. An attempt is made here to surmount these difficulties by exploring Pearson's relevant activities during the first decade of his statistical career, and by describing the work by his contemporaries and predecessors which seem to have influenced his approach to the problem. Not all the questions are answered, and others remain for further study.

original paper: http://www.economics.soton.ac.uk/staff/aldrich/1900.pdf

How did Karl Pearson come up with the chi-squared statistic?: https://stats.stackexchange.com/questions/97604/how-did-karl-pearson-come-up-with-the-chi-squared-statistic

He proceeds by working with the multivariate normal, and the chi-square arises as a sum of squared standardized normal variates.

You can see from the discussion on p160-161 he's clearly discussing applying the test to multinomial distributed data (I don't think he uses that term anywhere). He apparently understands the approximate multivariate normality of the multinomial (certainly he knows the margins are approximately normal - that's a very old result - and knows the means, variances and covariances, since they're stated in the paper); my guess is that most of that stuff is already old hat by 1900. (Note that the chi-squared distribution itself dates back to work by Helmert in the mid-1870s.)

Then by the bottom of p163 he derives a chi-square statistic as "a measure of goodness of fit" (the statistic itself appears in the exponent of the multivariate normal approximation).

He then goes on to discuss how to evaluate the p-value*, and then he correctly gives the upper tail area of a $\chi^2_{12}$ beyond 43.87 as 0.000016. [You should keep in mind, however, that he didn't correctly understand how to adjust degrees of freedom for parameter estimation at that stage, so some of the examples in his papers use too high a d.f.]
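
The quoted tail area can be checked by hand. For an even number of degrees of freedom the chi-squared upper tail has a finite closed form (a Poisson sum), so no statistics library is needed. The function name below is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    # exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chi2_upper_tail_even_df(43.87, 12)
print(f"{p_value:.6f}")   # prints 0.000016
```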

nibble
papers
acm
stats
hypothesis-testing
methodology
history
mostly-modern
pre-ww2
old-anglo
giants
science
the-trenches
stories
multi
q-n-a
overflow
explanation
summary
innovation
discovery
distribution
degrees-of-freedom
limits

Section 10 Chi-squared goodness-of-fit test.

october 2017 by nhaliday

- proof that the chi-squared statistic for Pearson's test (multinomial goodness-of-fit) is asymptotically chi-squared distributed

- the gotcha: the terms Z_j in the sum aren't independent

- solution:

  - compute the covariance matrix of the terms: E[Z_i Z_j] = -sqrt(p_i p_j) for i != j (and var[Z_j] = 1 - p_j)

  - note that an equivalent way of sampling the Z_j is to take a random standard Gaussian and project it onto the hyperplane orthogonal to (sqrt(p_1), sqrt(p_2), ..., sqrt(p_r))

  - that is equivalent to just sampling a standard Gaussian with one less dimension (hence df = r-1)

QED
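
The key step of the proof can be checked numerically: the covariance matrix of the standardized multinomial terms equals the orthogonal projector onto the hyperplane perpendicular to (sqrt(p_1), ..., sqrt(p_r)), and that projector has rank r-1. The cell probabilities below are hypothetical.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])      # hypothetical cell probabilities
u = np.sqrt(p)                           # unit vector, because sum(p) == 1
proj = np.eye(len(p)) - np.outer(u, u)   # projector orthogonal to u

# Covariance of Z_j = (count_j - n p_j) / sqrt(n p_j) under the multinomial.
n = 1
cov_counts = n * (np.diag(p) - np.outer(p, p))
scale = 1.0 / np.sqrt(n * p)
cov_z = cov_counts * np.outer(scale, scale)

# A projected standard Gaussian P g has covariance P P^T = P, matching cov_z.
rank = int(np.linalg.matrix_rank(proj))
print(np.allclose(cov_z, proj), rank)
```

The off-diagonal entries of `cov_z` are -sqrt(p_i p_j), and the rank of 3 for r = 4 cells is the df = r-1 in the notes.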

pdf
nibble
lecture-notes
mit
stats
hypothesis-testing
acm
probability
methodology
proofs
iidness
distribution
limits
identity
direction
lifts-projections

self study - Looking for a good and complete probability and statistics book - Cross Validated

october 2017 by nhaliday

I never had the opportunity to take a stats course from a math faculty. I am looking for a probability theory and statistics book that is complete and self-sufficient. By complete I mean that it contains all the proofs and does not just state results.

nibble
q-n-a
overflow
data-science
stats
methodology
books
recommendations
list
top-n
confluence
proofs
rigor
reference
accretion

Timofey Pnin on Twitter: "I like this example of moderator analysis from Hunter & Schmidt's meta-analysis book. 30 small studies of corrs b/w employees' job satisfact… https://t.co/rgoqP6HzPQ"

october 2017 by nhaliday

I think I follow pretty smart people but I see these small sample studies on my timeline all the time. Remember people, the law of large numbers is a true theorem but the law of small numbers is a joke by Tversky & Kahneman:

gnon
unaffiliated
twitter
social
discussion
thinking
metabuch
science
meta:science
realness
signal-noise
magnitude
scale
measurement
evidence-based
stat-power
hypothesis-testing
methodology
stats
data-science
critique
counterexample
meta-analysis
books
recommendations
confidence

correlation - Variance of product of dependent variables - Cross Validated

october 2017 by nhaliday

var[XY] = cov[X^2,Y^2] + (var[X]+(E[X])^2)(var[Y]+(E[Y])^2) − (cov[X,Y]+E[X]E[Y])^2
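
The expression above is the variance of the product XY and follows from var[XY] = E[X^2 Y^2] - (E[XY])^2. Because it is purely algebraic, it holds exactly for the empirical moments of any sample when population (ddof=0) moments are used. A quick check on synthetic, deliberately dependent data (the data-generating choices are arbitrary):

```python
import numpy as np

def cov(a, b):
    """Population covariance of two equal-length samples."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)

def var_product(x, y):
    """var[XY] via the identity, using population (ddof=0) moments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    second_x = np.var(x) + np.mean(x) ** 2    # E[X^2]
    second_y = np.var(y) + np.mean(y) ** 2    # E[Y^2]
    cross = cov(x, y) + np.mean(x) * np.mean(y)   # E[XY]
    return cov(x ** 2, y ** 2) + second_x * second_y - cross ** 2

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1000)
y = 0.5 * x + rng.normal(-1.0, 1.0, size=1000)   # depends on x
direct = float(np.var(x * y))
via_identity = float(var_product(x, y))
print(direct, via_identity)
```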

nibble
q-n-a
overflow
math
stats
probability
identity
arrows
multiplicative
iidness
moments
dependence-independence

Variance of product of multiple random variables - Cross Validated

october 2017 by nhaliday

for independent X_i: var[prod_i X_i] = prod_i (var[X_i] + (E[X_i])^2) - prod_i (E[X_i])^2

two variable case: var[X] var[Y] + var[X] (E[Y])^2 + (E[X])^2 var[Y]
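A small exact check of the product formula for independent variables. This is a sketch only: the three finite distributions below are hypothetical, and independence is imposed by building the joint distribution as a product of the marginals.

```python
from itertools import product
from math import prod

def var_product_independent(means, variances):
    """prod_i (var_i + mean_i^2) - prod_i mean_i^2, valid for independent variables."""
    second_moments = [v + m * m for m, v in zip(means, variances)]
    return prod(second_moments) - prod(m * m for m in means)

def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

# Hypothetical example distributions, one per independent variable.
dists = [{0: 0.5, 2: 0.5}, {1: 0.25, 3: 0.75}, {-1: 0.5, 4: 0.5}]
means, variances = zip(*(moments(d) for d in dists))

# Brute force: enumerate the joint distribution of the product.
joint = {}
for combo in product(*(d.items() for d in dists)):
    value = prod(x for x, _ in combo)
    prob = prod(p for _, p in combo)
    joint[value] = joint.get(value, 0.0) + prob

_, brute = moments(joint)
formula = var_product_independent(means, variances)
```

Here the marginal second moments are 2, 7 and 8.5 and the means are 1, 2.5 and 1.5, so both routes should give 119 - 14.0625.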

nibble
q-n-a
overflow
stats
probability
math
identity
moments
arrows
multiplicative
iidness
dependence-independence
october 2017 by nhaliday

Accurate Genomic Prediction Of Human Height | bioRxiv

september 2017 by nhaliday

Stephen Hsu's compressed sensing application paper

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction.

https://infoproc.blogspot.com/2017/09/accurate-genomic-prediction-of-human.html

http://infoproc.blogspot.com/2017/11/23andme.html

I'm in Mountain View to give a talk at 23andMe. Their latest funding round was $250M on a (reported) valuation of $1.5B. If I just add up the Crunchbase numbers it looks like almost half a billion invested at this point...

Slides: Genomic Prediction of Complex Traits

Here's how people + robots handle your spit sample to produce a SNP genotype:

https://drive.google.com/file/d/1e_zuIPJr1hgQupYAxkcbgEVxmrDHAYRj/view

study
bio
preprint
GWAS
state-of-art
embodied
genetics
genomics
compressed-sensing
high-dimension
machine-learning
missing-heritability
hsu
scitariat
education
🌞
frontier
britain
regression
data
visualization
correlation
phase-transition
multi
commentary
summary
pdf
slides
brands
skunkworks
hard-tech
presentation
talks
methodology
intricacy
bioinformatics
scaling-up
stat-power
sparsity
norms
nibble
speedometer
stats
linear-models
2017
biodet
september 2017 by nhaliday

Lecture 14: When's that meteor arriving

september 2017 by nhaliday

- Meteors as a random process

- Limiting approximations

- Derivation of the Exponential distribution

- Derivation of the Poisson distribution

- A "Poisson process"
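The limiting approximations in the topic list can be illustrated numerically. A rough sketch, not taken from the lecture itself: it assumes the usual construction in which a unit of time is cut into n small slots, each holding a meteor independently with probability lam / n. The rate `lam` and the slot counts are arbitrary.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k arrivals in n independent slots, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k arrivals with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # hypothetical mean number of meteors per unit time

# Largest gap between the binomial and Poisson pmfs over k = 0..9,
# for a coarse grid (10 slots) and a fine grid (10,000 slots).
gap_coarse = max(abs(binomial_pmf(k, 10, lam / 10) - poisson_pmf(k, lam))
                 for k in range(10))
gap_fine = max(abs(binomial_pmf(k, 10_000, lam / 10_000) - poisson_pmf(k, lam))
               for k in range(10))

# Waiting time: "no meteor in the first t*n slots" has probability
# (1 - lam/n)^(t*n), which tends to exp(-lam * t), the exponential tail.
t, n = 0.5, 10_000
no_arrival = (1 - lam / n) ** int(t * n)
exponential_tail = exp(-lam * t)
```

As the grid gets finer, the count of arrivals approaches a Poisson distribution and the waiting time to the first arrival approaches an exponential one.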

nibble
org:junk
org:edu
exposition
lecture-notes
physics
mechanics
space
earth
probability
stats
distribution
stochastic-processes
closure
additive
limits
approximation
tidbits
acm
binomial
multiplicative
september 2017 by nhaliday

Quantitative Empirical Methods Reading List | Department of Political Science

org:edu books recommendations list top-n confluence social-science methodology stats data-science causation endo-exo empirical experiment regression sociology polisci reading measurement endogenous-exogenous quixotic

september 2017 by nhaliday

Atrocity statistics from the Roman Era

september 2017 by nhaliday

Christian Martyrs [make link]

Gibbon, Decline & Fall v.2 ch.XVI: < 2,000 k. under Roman persecution.

Ludwig Hertling ("Die Zahl der Märtyrer bis 313", 1944) estimated 100,000 Christians killed between 30 and 313 CE. (cited -- unfavorably -- by David Henige, Numbers From Nowhere, 1998)

Catholic Encyclopedia, "Martyr": number of Christian martyrs under the Romans unknown, unknowable. Origen says not many. Eusebius says thousands.

...

General population decline during The Fall of Rome: 7,000,000 [make link]

- Colin McEvedy, The New Penguin Atlas of Medieval History (1992)

- From 2nd Century CE to 4th Century CE: Empire's population declined from 45M to 36M [i.e. 9M]

- From 400 CE to 600 CE: Empire's population declined by 20% [i.e. 7.2M]

- Paul Bairoch, Cities and economic development: from the dawn of history to the present, p.111

- "The population of Europe except Russia, then, having apparently reached a high point of some 40-55 million people by the start of the third century [ca.200 C.E.], seems to have fallen by the year 500 to about 30-40 million, bottoming out at about 20-35 million around 600." [i.e. ca.20M]

- Francois Crouzet, A History of the European Economy, 1000-2000 (University Press of Virginia: 2001) p.1.

- "The population of Europe (west of the Urals) in c. AD 200 has been estimated at 36 million; by 600, it had fallen to 26 million; another estimate (excluding ‘Russia’) gives a more drastic fall, from 44 to 22 million." [i.e. 10M or 22M]

also:

The geometric mean of these two extremes would come to 4½ per day, which is a credible daily rate for the really bad years.

why geometric mean? can you get it as the MLE given min{X1, ..., Xn} and max{X1, ..., Xn} for {X_i} iid Poissons? some kinda limit? think it might just be a rule of thumb.

yeah, it's a rule of thumb. found it in his book (epub).

org:junk
data
let-me-see
scale
history
iron-age
mediterranean
the-classics
death
nihil
conquest-empire
war
peace-violence
gibbon
trivia
multi
todo
AMT
expectancy
heuristic
stats
ML-MAP-E
data-science
estimate
magnitude
population
demographics
database
list
religion
christianity
leviathan
september 2017 by nhaliday

All models are wrong - Wikipedia

august 2017 by nhaliday

Box repeated the aphorism in a paper that was published in the proceedings of a 1978 statistics workshop.[2] The paper contains a section entitled "All models are wrong but some are useful". The section is copied below.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.

For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".

thinking
metabuch
metameta
map-territory
models
accuracy
wire-guided
truth
philosophy
stats
data-science
methodology
lens
wiki
reference
complex-systems
occam
parsimony
science
nibble
hi-order-bits
info-dynamics
the-trenches
meta:science
physics
fluid
thermo
stat-mech
applicability-prereqs
theory-practice
elegance
simplification-normalization
august 2017 by nhaliday

Information Processing: Estimation of genetic architecture for complex traits using GWAS data

hsu scitariat commentary study summary bio preprint biodet behavioral-gen genetics genomics QTL scaling-up speedometer survey state-of-art iq education GWAS scale data visualization measurement 🌞 bioinformatics missing-heritability chart nibble population-genetics candidate-gene methodology stat-power bounded-cognition lens hypothesis-testing ioannidis stats meta:science

august 2017 by nhaliday

trees are harlequins, words are harlequins — bayes: a kinda-sorta masterpost

august 2017 by nhaliday

lol, gwern: https://www.reddit.com/r/slatestarcodex/comments/6ghsxf/biweekly_rational_feed/diqr0rq/

> What sort of person thinks “oh yeah, my beliefs about these coefficients correspond to a Gaussian with variance 2.5”? And what if I do cross-validation, like I always do, and find that variance 200 works better for the problem? Was the other person wrong? But how could they have known?

> ...Even ignoring the mode vs. mean issue, I have never met anyone who could tell whether their beliefs were normally distributed vs. Laplace distributed. Have you?

I must have spent too much time in Bayesland because both those strike me as very easy and I often think them! My beliefs usually are Laplace distributed when it comes to things like genetics (it makes me very sad to see GWASes with flat priors), and my Gaussian coefficients are actually a variance of 0.70 (assuming standardized variables w.l.o.g.) as is consistent with field-wide meta-analyses indicating that d>1 is pretty rare.

ratty
ssc
core-rats
tumblr
social
explanation
init
philosophy
bayesian
thinking
probability
stats
frequentist
big-yud
lesswrong
synchrony
similarity
critique
intricacy
shalizi
scitariat
selection
mutation
evolution
priors-posteriors
regularization
bias-variance
gwern
reddit
commentary
GWAS
genetics
regression
spock
nitty-gritty
generalization
epistemic
🤖
rationality
poast
multi
best-practices
methodology
data-science
august 2017 by nhaliday

Analysis of variance - Wikipedia

july 2017 by nhaliday

Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:

y_ij = alpha_i + beta_j + eps_ij

Var(y_ij) = Var(alpha_i) + Var(beta_j) + 2 Cov(alpha_i, beta_j) + Var(eps_ij)

and alpha_i, beta_j are independent (with eps_ij independent of both), so Cov(alpha_i, beta_j) = 0
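A simulated version of this variance partition, as a sketch under assumptions: effects and noise are drawn as independent Gaussians with made-up variances (4, 1 and 0.25), and the design is fully crossed with one observation per (i, j) cell.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J = 400, 500
alpha = rng.normal(0.0, 2.0, size=I)       # row effects, variance 4
beta = rng.normal(0.0, 1.0, size=J)        # column effects, variance 1
eps = rng.normal(0.0, 0.5, size=(I, J))    # noise, variance 0.25

# y_ij = alpha_i + beta_j + eps_ij, built by broadcasting over the grid
y = alpha[:, None] + beta[None, :] + eps

total = y.var()                                # variance over all cells
parts = alpha.var() + beta.var() + eps.var()   # sum of the components
```

In a fully crossed grid the sample cross term between alpha_i and beta_j vanishes exactly, so `total` and `parts` differ only through the small sample covariances with the noise.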

can you make this work w/ interaction effects?

data-science
stats
methodology
hypothesis-testing
variance-components
concept
conceptual-vocab
thinking
wiki
reference
nibble
multi
visualization
visual-understanding
pic
pdf
exposition
lecture-notes
gelman
scitariat
tutorial
acm
ground-up
yoga
july 2017 by nhaliday

Stat 260/CS 294: Bayesian Modeling and Inference

july 2017 by nhaliday

Topics

- Priors (conjugate, noninformative, reference)

- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models

- Testing

- Model choice

- Inference (importance sampling, MCMC, sequential Monte Carlo)

- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)

- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)

- Experimental design

unit
course
berkeley
expert
michael-jordan
machine-learning
acm
bayesian
probability
stats
lecture-notes
priors-posteriors
markov
monte-carlo
frequentist
latent-variables
decision-theory
expert-experience
confidence
sampling
july 2017 by nhaliday

Econometric Modeling as Junk Science

june 2017 by nhaliday

The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/

In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer question about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656

This post should have been entitled “Zombies who only think of their next cool IV fix”

https://twitter.com/pseudoerasmus/status/662692917069422592

massive lust for quasi-natural experiments, regression discontinuities

barely matters if the effects are not all that big

I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.

One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat

and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.

I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history

We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.

On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history

argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE

problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with

the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works

I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.

In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory

Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction

are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.

Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x

larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or

discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?

PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like

Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small

changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.

The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big

natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage

economists have over political scientists when they compete in the same space.

(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?∗: https://economics.mit.edu/files/750

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
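The placebo-law exercise can be imitated on synthetic data. This is a rough sketch, not the paper's procedure: it replaces the CPS wage data with AR(1) outcomes, assigns the placebo law to a random half of the states from a random start year, and uses conventional OLS standard errors with state and year fixed effects. The function name, panel dimensions, and the value rho = 0.8 are all assumptions.

```python
import numpy as np

def placebo_rejection_rate(rho, n_states=50, n_years=20, n_sims=300, seed=0):
    """Share of placebo laws found 'significant' at 5% with conventional OLS SEs."""
    rng = np.random.default_rng(seed)

    def demean(a):
        # Two-way within transformation: equivalent to state + year fixed
        # effects in a balanced panel.
        return (a - a.mean(axis=0, keepdims=True)
                  - a.mean(axis=1, keepdims=True) + a.mean())

    rejections = 0
    for _ in range(n_sims):
        # AR(1) outcome for each state, started from its stationary distribution.
        shocks = rng.normal(size=(n_states, n_years))
        y = np.empty_like(shocks)
        y[:, 0] = shocks[:, 0] / np.sqrt(1 - rho**2)
        for t in range(1, n_years):
            y[:, t] = rho * y[:, t - 1] + shocks[:, t]

        # Placebo law: a random half of the states, in force from a random year.
        treated = rng.permutation(n_states) < n_states // 2
        start = rng.integers(5, 15)
        d = (treated[:, None] & (np.arange(n_years) >= start)[None, :]).astype(float)

        # DD estimate and its conventional (iid-error) standard error.
        yt, dt = demean(y), demean(d)
        b = (dt * yt).sum() / (dt**2).sum()
        resid = yt - b * dt
        dof = n_states * n_years - n_states - n_years
        se = np.sqrt((resid**2).sum() / dof / (dt**2).sum())
        rejections += int(abs(b / se) > 1.96)
    return rejections / n_sims

rate_iid = placebo_rejection_rate(rho=0.0)   # no serial correlation
rate_ar = placebo_rejection_rate(rho=0.8)    # strongly serially correlated outcomes
```

With independent errors the rejection rate should sit near the nominal 5 percent, while serial correlation should push it well above that, which is the qualitative pattern the abstract describes.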

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733

As it turns out, Young finds that

1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.

2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.

3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.

4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.

5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.

6. 2SLS has considerably higher mean squared error than OLS.

7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.

8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI excludes the OLS estimate.

Methods Matter: P-Hacking and Causal Inference in Economics*: http://ftp.iza.org/dp11796.pdf

Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups

--

Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://twitter.com/wwwojtekk/status/1190731344336293889

https://archive.is/EZu0h

Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below

Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf

org:junk
org:edu
economics
econometrics
methodology
realness
truth
science
social-science
accuracy
generalization
essay
article
hmm
multi
study
🎩
empirical
causation
error
critique
sociology
criminology
hypothesis-testing
econotariat
broad-econ
cliometrics
endo-exo
replication
incentives
academia
measurement
wire-guided
intricacy
twitter
social
discussion
pseudoE
effect-size
reflection
field-study
stat-power
piketty
marginal-rev
commentary
data-science
expert-experience
regression
gotchas
rant
map-territory
pdf
simulation
moments
confidence
bias-variance
stats
endogenous-exogenous
control
meta:science
meta-analysis
outliers
summary
sampling
ensembles
monte-carlo
theory-practice
applicability-prereqs
chart
comparison
shift
ratty
unaffiliated
garett-jones
On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/

In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer question about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656

This post should have been entitled “Zombies who only think of their next cool IV fix”

https://twitter.com/pseudoerasmus/status/662692917069422592

massive lust for quasi-natural experiments, regression discontinuities

barely matters if the effects are not all that big

I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.

One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat

and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.

I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history

We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.

On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history

argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE

problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with

the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works

I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.

In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory

Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction

are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.

Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x

larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or

discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?

PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like

Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small

changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.

The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big

natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage

economists have over political scientists when they compete in the same space.

(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?: https://economics.mit.edu/files/750

Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
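
The placebo-law exercise in this abstract can be sketched in a few lines. This is a toy version under loud assumptions, not the paper's design: the data are simulated AR(1) series rather than CPS wages, the regression is a plain 2x2 (treated x post) comparison without state or year fixed effects, and all names (`simulate_panel`, `placebo_t_stat`, `rejection_rate`) and parameter values are made up for illustration.

```python
import math
import random


def simulate_panel(rng, n_states, n_years, rho):
    """One AR(1) series per state, started from its stationary distribution."""
    panel = []
    sd0 = 1.0 / math.sqrt(1.0 - rho * rho)
    for _ in range(n_states):
        y = [rng.gauss(0.0, sd0)]
        for _ in range(n_years - 1):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def placebo_t_stat(rng, panel):
    """Conventional (iid) OLS t-statistic for the DD effect of a random placebo law."""
    n_states, n_years = len(panel), len(panel[0])
    # Half the states get a fake law that starts in a random mid-sample year.
    treated = set(rng.sample(range(n_states), n_states // 2))
    start = rng.randrange(n_years // 4, 3 * n_years // 4)
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for s, series in enumerate(panel):
        for t, y in enumerate(series):
            cells[(int(s in treated), int(t >= start))].append(y)
    means = {k: sum(v) / len(v) for k, v in cells.items()}
    dd = (means[1, 1] - means[1, 0]) - (means[0, 1] - means[0, 0])
    # Residual variance of the saturated 2x2 regression (4 parameters),
    # then the textbook standard error that ignores serial correlation.
    n_obs = n_states * n_years
    rss = sum((y - means[k]) ** 2 for k, v in cells.items() for y in v)
    s2 = rss / (n_obs - 4)
    se = math.sqrt(s2 * sum(1.0 / len(v) for v in cells.values()))
    return dd / se


def rejection_rate(rho, n_sims=400, n_states=20, n_years=20, seed=0):
    """Share of placebo laws declared significant at the nominal 5 percent level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        panel = simulate_panel(rng, n_states, n_years, rho)
        if abs(placebo_t_stat(rng, panel)) > 1.96:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("rho = 0.0:", rejection_rate(0.0))
    print("rho = 0.8:", rejection_rate(0.8))
```

With independent errors the rejection rate should sit near the nominal 5 percent; with strong serial correlation it should come out well above it, which is the qualitative point of the abstract. The exact figures depend on the made-up parameters and say nothing about the paper's 45 percent.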

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733

As it turns out, Young finds that

1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.

2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.

3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.

4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.

5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.

6. 2SLS has considerably higher mean squared error than OLS.

7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.

8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse (fewer than 5 percent) if you add in the requirement that the 2SLS CI excludes the OLS estimate.
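
For orientation, here is a minimal sketch of what OLS and 2SLS each compute in the simplest case (one endogenous regressor, one instrument), where 2SLS reduces to the ratio cov(z, y) / cov(z, x). The data-generating process, the `strength` parameter, and the function names are assumptions for illustration; none of this reproduces Young's bootstrap analysis.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def simulate(n=5000, beta=1.0, strength=1.0, seed=1):
    """x is confounded by u; z shifts x but affects y only through x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument
        ui = rng.gauss(0.0, 1.0)  # unobserved confounder
        xi = strength * zi + ui + rng.gauss(0.0, 1.0)
        yi = beta * xi + 2.0 * ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """OLS slope of y on x; biased here because u drives both x and y."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument 2SLS (Wald ratio)."""
    return cov(z, y) / cov(z, x)


z, x, y = simulate()
b_ols = ols_slope(x, y)
b_iv = iv_slope(z, x, y)
print("true beta = 1.0, OLS =", round(b_ols, 3), ", IV =", round(b_iv, 3))
```

The denominator cov(z, x) is the first stage. Lowering `strength` toward zero shrinks it, and the ratio becomes erratic, which is one way to see why weak or irrelevant instruments (point 7) make 2SLS fragile.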

Methods Matter: P-Hacking and Causal Inference in Economics: http://ftp.iza.org/dp11796.pdf

Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups

--

Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://twitter.com/wwwojtekk/status/1190731344336293889

https://archive.is/EZu0h

Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below

Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf

june 2017 by nhaliday

Tables of regression coefficients - Statistical Modeling, Causal Inference, and Social Science

june 2017 by nhaliday

http://svmiller.com/blog/2014/08/reading-a-regression-table-a-guide-for-students/

http://egap.org/methods-guides/10-things-know-about-reading-regression-table

https://www.youtube.com/watch?v=o86xvmUYo-Q

http://psych.unl.edu/psycrs/statpage/full_eg.pdf

typical table in econometrics: http://imgur.com/a/RTqfJ

- different columns summarize robustness checks w/ different sets of controls

- parens = standard errors

- variance explained+number observations at bottom

- dependent variable at absolute top, rows = explanatory variables

gelman
scitariat
stats
methodology
data-science
econometrics
info-foraging
howto
multi
regression
dataviz
explanation
article
tutorial
video
pdf
pic
objektbuch
cheatsheet
checklists

9 Multivariate linear models for GWAS

pdf nibble article lecture-notes exposition bio biodet genetics genomics bioinformatics GWAS methodology explanation regression regularization machine-learning acm stats stanford 🌞 spearhead GCTA sparsity compressed-sensing linear-models concept levers ideas population-genetics

may 2017 by nhaliday

Pearson correlation coefficient - Wikipedia

may 2017 by nhaliday

https://en.wikipedia.org/wiki/Coefficient_of_determination

what does this mean?: https://twitter.com/GarettJones/status/863546692724858880

deleted but it was about the Pearson correlation distance: 1-r

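I guess it's a metric (strictly, 1-r can violate the triangle inequality; sqrt(2(1-r)) is the version that behaves like a Euclidean distance)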
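
Whether 1-r is a metric in the strict sense can be checked numerically. The sketch below builds three zero-mean vectors whose pairwise correlations are 0, cos 45° and cos 45°, then tests the triangle inequality for 1-r and for sqrt(2(1-r)). The vectors and helper names are chosen for illustration only.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den


def d_plain(a, b):
    """Correlation distance 1 - r."""
    return 1.0 - pearson(a, b)


def d_root(a, b):
    """sqrt(2 (1 - r)): Euclidean distance between standardized unit vectors."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(a, b))))


# x and z are uncorrelated; y sits "halfway" between them (r = cos 45 degrees with each).
x = [1.0, -1.0, 0.0]
z = [1.0, 1.0, -2.0]
y = [xi / math.sqrt(2) + zi / math.sqrt(6) for xi, zi in zip(x, z)]

for name, d in (("1 - r", d_plain), ("sqrt(2(1 - r))", d_root)):
    direct = d(x, z)
    via_y = d(x, y) + d(y, z)
    print(name, "direct:", round(direct, 4), "via y:", round(via_y, 4),
          "triangle inequality holds:", direct <= via_y)
```

For these vectors the direct 1-r distance exceeds the detour through y, so 1-r fails the triangle inequality, while the square-root version passes.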

https://en.wikipedia.org/wiki/Explained_variation

http://infoproc.blogspot.com/2014/02/correlation-and-variance.html

A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SAT.
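
The dY = R dX reading can be checked with a small simulation: for standardized variables, the least-squares slope of Y on X equals the sample correlation, and so does the slope of X on Y. The sample size, seed, and helper names below are arbitrary choices for illustration; 0.4 is the figure from the quote.

```python
import math
import random


def standardize(v):
    """Rescale to mean 0 and standard deviation 1."""
    m = sum(v) / len(v)
    s = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / s for a in v]


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
r = 0.4
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# y has unit variance and correlation r with x by construction.
y = [r * xi + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0) for xi in x]

xs, ys = standardize(x), standardize(y)
slope_y_on_x = slope(xs, ys)  # expected SD change in y per +1 SD in x
slope_x_on_y = slope(ys, xs)  # expected SD change in x per +1 SD in y
print(round(slope_y_on_x, 3), round(slope_x_on_y, 3))
```

Both slopes land near 0.4 and agree with each other up to floating-point error, which is the symmetry the quote points out: +1 SD in either variable predicts about +0.4 SD in the other.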

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)

https://www.reddit.com/r/slatestarcodex/comments/631haf/on_the_commentariat_here_and_why_i_dont_think_i/dfx4e2s/

stats
science
hypothesis-testing
correlation
metrics
plots
regression
wiki
reference
nibble
methodology
multi
twitter
social
discussion
best-practices
econotariat
garett-jones
concept
conceptual-vocab
accuracy
causation
acm
matrix-factorization
todo
explanation
yoga
hsu
street-fighting
levers
🌞
2014
scitariat
variance-components
meta:prediction
biodet
s:**
mental-math
reddit
commentary
ssc
poast
gwern
data-science
metric-space
similarity
measure
dependence-independence

Deming regression - Wikipedia

may 2017 by nhaliday

Deming regression. The red lines show the error in both x and y. This is different from the traditional least squares method which measures error parallel to the y axis. The case shown, with deviations measured perpendicularly, arises when errors in x and y have equal variances.
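
A minimal sketch of the Deming fit, using the standard closed-form slope. Here `delta` is the assumed ratio of the error variance in y to the error variance in x; `delta = 1` is the equal-variance (perpendicular) case described in the caption. The function name and the toy data are illustrative only.

```python
import math


def deming(x, y, delta=1.0):
    """Deming regression line.

    delta = var(errors in y) / var(errors in x); delta = 1 gives orthogonal
    regression. Assumes the sample covariance of x and y is nonzero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff * diff + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept


# Points lying exactly on y = 2x + 1: any sensible fit should recover the line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope_hat, intercept_hat = deming(xs, ys)
print(slope_hat, intercept_hat)
```

As `delta` grows large (all error attributed to y), the slope approaches the ordinary least-squares slope of y on x.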

https://en.wikipedia.org/wiki/Errors-in-variables_models

stats
data-science
regression
methodology
direction
noise-structure
wiki
reference
nibble
multi

Outline of academic disciplines - Wikipedia

may 2017 by nhaliday

Outline of philosophy: https://en.wikipedia.org/wiki/Outline_of_philosophy

Figurative system of human knowledge: https://en.wikipedia.org/wiki/Figurative_system_of_human_knowledge

Branches of science: https://en.wikipedia.org/wiki/Branches_of_science

Outline of mathematics: https://en.wikipedia.org/wiki/Outline_of_mathematics

Outline of physics: https://en.wikipedia.org/wiki/Outline_of_physics

Branches of physics: https://en.wikipedia.org/wiki/Branches_of_physics

Outline of biology: https://en.wikipedia.org/wiki/Outline_of_biology

nibble
skeleton
accretion
links
wiki
reference
physics
mechanics
electromag
relativity
quantum
trees
synthesis
hi-order-bits
conceptual-vocab
summary
big-picture
lens
🔬
encyclopedic
chart
multi
knowledge
philosophy
theos
ideology
science
academia
religion
christianity
reason
epistemic
bio
nature
engineering
dirty-hands
art
poetry
math
ethics
morality
metameta
objektbuch
law
retention
logic
inference
thinking
technology
social-science
cs
theory-practice
detail-architecture
stats
apollonian-dionysian
letters
quixotic

[1502.05274] How predictable is technological progress?

april 2017 by nhaliday

Recently it has become clear that many technologies follow a generalized version of Moore's law, i.e. costs tend to drop exponentially, at different rates that depend on the technology. Here we formulate Moore's law as a correlated geometric random walk with drift, and apply it to historical data on 53 technologies. We derive a closed form expression approximating the distribution of forecast errors as a function of time. Based on hind-casting experiments we show that this works well, making it possible to collapse the forecast errors for many different technologies at different time horizons onto the same universal distribution. This is valuable because it allows us to make forecasts for any given technology with a clear understanding of the quality of the forecasts. As a practical demonstration we make distributional forecasts at different time horizons for solar photovoltaic modules, and show how our method can be used to estimate the probability that a given technology will outperform another technology at a given point in the future.

model:

- p_t = unit price of tech

- log(p_t) = y_0 - μt + ∑_{i <= t} n_i

- n_t iid noise process
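
The model in these notes can be simulated directly. The sketch below uses iid Gaussian noise, as in the notes (the abstract describes a correlated version, which is not reproduced here), and recovers the drift μ from the average one-step change in log price. Parameter values and function names are assumptions for illustration.

```python
import math
import random


def simulate_log_price(y0, mu, sigma, n_steps, rng):
    """log(p_t) = y0 - mu * t + sum of iid N(0, sigma^2) shocks up to time t."""
    log_p = [y0]
    for _ in range(n_steps):
        log_p.append(log_p[-1] - mu + rng.gauss(0.0, sigma))
    return log_p


def estimate_drift(log_p):
    """Average one-step decline in log price (the first differences telescope)."""
    return -(log_p[-1] - log_p[0]) / (len(log_p) - 1)


rng = random.Random(42)
path = simulate_log_price(y0=math.log(100.0), mu=0.1, sigma=0.05, n_steps=2000, rng=rng)
mu_hat = estimate_drift(path)
print("estimated drift:", round(mu_hat, 4))
```

In this iid setting the accumulated noise over a horizon of τ steps has standard deviation sigma * sqrt(τ), so uncertainty about the log price widens with the square root of the forecast horizon even when μ is known.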

preprint
study
economics
growth-econ
innovation
discovery
technology
frontier
tetlock
meta:prediction
models
time
definite-planning
stylized-facts
regression
econometrics
magnitude
energy-resources
phys-energy
money
cost-benefit
stats
data-science
🔬
ideas
speedometer
multiplicative
methodology
stochastic-processes
time-series
stock-flow
iteration-recursion
org:mat
street-fighting
the-bones
whiggish-hegelian
pessimism
eden-heaven

'Capital in the Twenty-First Century' by Thomas Piketty, reviewed | New Republic

april 2017 by nhaliday

by Robert Solow (positive)

The data then exhibit a clear pattern. In France and Great Britain, national capital stood fairly steadily at about seven times national income from 1700 to 1910, then fell sharply from 1910 to 1950, presumably as a result of wars and depression, reaching a low of 2.5 in Britain and a bit less than 3 in France. The capital-income ratio then began to climb in both countries, and reached slightly more than 5 in Britain and slightly less than 6 in France by 2010. The trajectory in the United States was slightly different: it started at just above 3 in 1770, climbed to 5 in 1910, fell slightly in 1920, recovered to a high between 5 and 5.5 in 1930, fell to below 4 in 1950, and was back to 4.5 in 2010.

The wealth-income ratio in the United States has always been lower than in Europe. The main reason in the early years was that land values bulked less in the wide open spaces of North America. There was of course much more land, but it was very cheap. Into the twentieth century and onward, however, the lower capital-income ratio in the United States probably reflects the higher level of productivity: a given amount of capital could support a larger production of output than in Europe. It is no surprise that the two world wars caused much less destruction and dissipation of capital in the United States than in Britain and France. The important observation for Piketty’s argument is that, in all three countries, and elsewhere as well, the wealth-income ratio has been increasing since 1950, and is almost back to nineteenth-century levels. He projects this increase to continue into the current century, with weighty consequences that will be discussed as we go on.

...

Now if you multiply the rate of return on capital by the capital-income ratio, you get the share of capital in the national income. For example, if the rate of return is 5 percent a year and the stock of capital is six years worth of national income, income from capital will be 30 percent of national income, and so income from work will be the remaining 70 percent. At last, after all this preparation, we are beginning to talk about inequality, and in two distinct senses. First, we have arrived at the functional distribution of income—the split between income from work and income from wealth. Second, it is always the case that wealth is more highly concentrated among the rich than income from labor (although recent American history looks rather odd in this respect); and this being so, the larger the share of income from wealth, the more unequal the distribution of income among persons is likely to be. It is this inequality across persons that matters most for good or ill in a society.
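
The arithmetic in this paragraph is a one-line identity: capital's share of income equals the rate of return times the capital-income ratio. A tiny sketch using the review's own example numbers (the function name is made up):

```python
def capital_share(rate_of_return, capital_income_ratio):
    """Share of national income going to capital: return x (capital / income)."""
    return rate_of_return * capital_income_ratio


# Solow's example: 5 percent return, capital worth six years of national income.
share = capital_share(0.05, 6.0)
labor_share = 1.0 - share
print(f"capital share: {share:.0%}, labor share: {labor_share:.0%}")
```

Holding the rate of return fixed, a rising capital-income ratio mechanically raises capital's share, which is why the ratio matters for the argument.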

...

The data are complicated and not easily comparable across time and space, but here is the flavor of Piketty’s summary picture. Capital is indeed very unequally distributed. Currently in the United States, the top 10 percent own about 70 percent of all the capital, half of that belonging to the top 1 percent; the next 40 percent—who compose the “middle class”—own about a quarter of the total (much of that in the form of housing), and the remaining half of the population owns next to nothing, about 5 percent of total wealth. Even that amount of middle-class property ownership is a new phenomenon in history. The typical European country is a little more egalitarian: the top 1 percent own 25 percent of the total capital, and the middle class 35 percent. (A century ago the European middle class owned essentially no wealth at all.) If the ownership of wealth in fact becomes even more concentrated during the rest of the twenty-first century, the outlook is pretty bleak unless you have a taste for oligarchy.

Income from wealth is probably even more concentrated than wealth itself because, as Piketty notes, large blocks of wealth tend to earn a higher return than small ones. Some of this advantage comes from economies of scale, but more may come from the fact that very big investors have access to a wider range of investment opportunities than smaller investors. Income from work is naturally less concentrated than income from wealth. In Piketty’s stylized picture of the United States today, the top 1 percent earns about 12 percent of all labor income, the next 9 percent earn 23 percent, the middle class gets about 40 percent, and the bottom half about a quarter of income from work. Europe is not very different: the top 10 percent collect somewhat less and the other two groups a little more.

You get the picture: modern capitalism is an unequal society, and the rich-get-richer dynamic strongly suggest that it will get more so. But there is one more loose end to tie up, already hinted at, and it has to do with the advent of very high wage incomes. First, here are some facts about the composition of top incomes. About 60 percent of the income of the top 1 percent in the United States today is labor income. Only when you get to the top tenth of 1 percent does income from capital start to predominate. The income of the top hundredth of 1 percent is 70 percent from capital. The story for France is not very different, though the proportion of labor income is a bit higher at every level. Evidently there are some very high wage incomes, as if you didn’t know.

This is a fairly recent development. In the 1960s, the top 1 percent of wage earners collected a little more than 5 percent of all wage incomes. This fraction has risen pretty steadily until nowadays, when the top 1 percent of wage earners receive 10–12 percent of all wages. This time the story is rather different in France. There the share of total wages going to the top percentile was steady at 6 percent until very recently, when it climbed to 7 percent. The recent surge of extreme inequality at the top of the wage distribution may be primarily an American development. Piketty, who with Emmanuel Saez has made a careful study of high-income tax returns in the United States, attributes this to the rise of what he calls “supermanagers.” The very highest income class consists to a substantial extent of top executives of large corporations, with very rich compensation packages. (A disproportionate number of these, but by no means all of them, come from the financial services industry.) With or without stock options, these large pay packages get converted to wealth and future income from wealth. But the fact remains that much of the increased income (and wealth) inequality in the United States is driven by the rise of these supermanagers.

and Deirdre McCloskey (p critical): https://ejpe.org/journal/article/view/170

nice discussion of empirical economics, economic history, market failures and statism, etc., with several bon mots

Piketty’s great splash will undoubtedly bring many young economically interested scholars to devote their lives to the study of the past. That is good, because economic history is one of the few scientifically quantitative branches of economics. In economic history, as in experimental economics and a few other fields, the economists confront the evidence (as they do not for example in most macroeconomics or industrial organization or international trade theory nowadays).

...

Piketty gives a fine example of how to do it. He does not get entangled as so many economists do in the sole empirical tool they are taught, namely, regression analysis on someone else’s “data” (one of the problems is the word data, meaning “things given”: scientists should deal in capta, “things seized”). Therefore he does not commit one of the two sins of modern economics, the use of meaningless “tests” of statistical significance (he occasionally refers to “statistically insignificant” relations between, say, tax rates and growth rates, but I am hoping he does not suppose that a large coefficient is “insignificant” because R. A. Fisher in 1925 said it was). Piketty constructs or uses statistics of aggregate capital and of inequality and then plots them out for inspection, which is what physicists, for example, also do in dealing with their experiments and observations. Nor does he commit the other sin, which is to waste scientific time on existence theorems. Physicists, again, don’t. If we economists are going to persist in physics envy let us at least learn what physicists actually do. Piketty stays close to the facts, and does not, for example, wander into the pointless worlds of non-cooperative game theory, long demolished by experimental economics. He also does not have recourse to non-computable general equilibrium, which never was of use for quantitative economic science, being a branch of philosophy, and a futile one at that. On both points, bravissimo.

...

Since those founding geniuses of classical economics, a market-tested betterment (a locution to be preferred to “capitalism”, with its erroneous implication that capital accumulation, not innovation, is what made us better off) has enormously enriched large parts of a humanity now seven times larger in population than in 1800, and bids fair in the next fifty years or so to enrich everyone on the planet. [Not SSA or MENA...]

...

Then economists, many on the left but some on the right, in quick succession from 1880 to the present—at the same time that market-tested betterment was driving real wages up and up and up—commenced worrying about, to name a few of the pessimisms concerning “capitalism” they discerned: greed, alienation, racial impurity, workers’ lack of bargaining strength, workers’ bad taste in consumption, immigration of lesser breeds, monopoly, unemployment, business cycles, increasing returns, externalities, under-consumption, monopolistic competition, separation of ownership from control, lack of planning, post-War stagnation, investment spillovers, unbalanced growth, dual labor markets, capital insufficiency (William Easterly calls it “capital fundamentalism”), peasant irrationality, capital-market imperfections, public … [more]

news
org:mag
big-peeps
econotariat
economics
books
review
capital
capitalism
inequality
winner-take-all
piketty
wealth
class
labor
mobility
redistribution
growth-econ
rent-seeking
history
mostly-modern
trends
compensation
article
malaise
🎩
the-bones
whiggish-hegelian
cjones-like
multi
mokyr-allen-mccloskey
expert
market-failure
government
broad-econ
cliometrics
aphorism
lens
gallic
clarity
europe
critique
rant
optimism
regularizer
pessimism
ideology
behavioral-econ
authoritarianism
intervention
polanyi-marx
politics
left-wing
absolute-relative
regression-to-mean
legacy
empirical
data-science
econometrics
methodology
hypothesis-testing
physics
iron-age
mediterranean
the-classics
quotes
krugman
world
entrepreneurialism
human-capital
education
supply-demand
plots
manifolds
intersection
markets
evolution
darwinian
giants
old-anglo
egalitarianism-hierarchy
optimate
morality
ethics
envy
stagnation
nl-and-so-can-you
expert-experience
courage
stats
randy-ayndy
reason
intersection-connectedness
detail-architect

april 2017 by nhaliday

Information Processing: Why does GCTA work?

hsu scitariat commentary links study bio preprint summary methodology biodet genetics genomics bioinformatics variance-components 🌞 population-genetics QTL missing-heritability scaling-up article GCTA spearhead nibble stats concept levers ideas

april 2017 by nhaliday

Educational Romanticism & Economic Development | pseudoerasmus

april 2017 by nhaliday

https://twitter.com/GarettJones/status/852339296358940672

deleted

https://twitter.com/GarettJones/status/943238170312929280

https://archive.is/p5hRA

Did Nations that Boosted Education Grow Faster?: http://econlog.econlib.org/archives/2012/10/did_nations_tha.html

On average, no relationship. The trendline points down slightly, but for the time being let's just call it a draw. It's a well-known fact that countries that started the 1960's with high education levels grew faster (example), but this graph is about something different. This graph shows that countries that increased their education levels did not grow faster.

Where has all the education gone?: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1016.2704&rep=rep1&type=pdf

https://twitter.com/GarettJones/status/948052794681966593

https://archive.is/kjxqp

https://twitter.com/GarettJones/status/950952412503822337

https://archive.is/3YPic

https://twitter.com/pseudoerasmus/status/862961420065001472

http://hanushek.stanford.edu/publications/schooling-educational-achievement-and-latin-american-growth-puzzle

The Case Against Education: What's Taking So Long, Bryan Caplan: http://econlog.econlib.org/archives/2015/03/the_case_agains_9.html

The World Might Be Better Off Without College for Everyone: https://www.theatlantic.com/magazine/archive/2018/01/whats-college-good-for/546590/

Students don't seem to be getting much out of higher education.

- Bryan Caplan

College: Capital or Signal?: http://www.economicmanblog.com/2017/02/25/college-capital-or-signal/

After his review of the literature, Caplan concludes that roughly 80% of the earnings effect from college comes from signalling, with only 20% the result of skill building. Put this together with his earlier observations about the private returns to college education, along with its exploding cost, and Caplan thinks that the social returns are negative. The policy implications of this will come as very bitter medicine for friends of Bernie Sanders.

Doubting the Null Hypothesis: http://www.arnoldkling.com/blog/doubting-the-null-hypothesis/

Is higher education/college in the US more about skill-building or about signaling?: https://www.quora.com/Is-higher-education-college-in-the-US-more-about-skill-building-or-about-signaling

ballpark: 50% signaling, 30% selection, 20% addition to human capital

more signaling in art history, more human capital in engineering, more selection in philosophy

Econ Duel! Is Education Signaling or Skill Building?: http://marginalrevolution.com/marginalrevolution/2016/03/econ-duel-is-education-signaling-or-skill-building.html

Marginal Revolution University has a brand new feature, Econ Duel! Our first Econ Duel features Tyler and me debating the question, Is education more about signaling or skill building?

Against Tulip Subsidies: https://slatestarcodex.com/2015/06/06/against-tulip-subsidies/

https://www.overcomingbias.com/2018/01/read-the-case-against-education.html

https://nintil.com/2018/02/05/notes-on-the-case-against-education/

https://www.nationalreview.com/magazine/2018-02-19-0000/bryan-caplan-case-against-education-review

https://spottedtoad.wordpress.com/2018/02/12/the-case-against-education/

Most American public school kids are low-income; about half are non-white; most are fairly low skilled academically. For most American kids, the majority of the waking hours they spend not engaged with electronic media are at school; the majority of their in-person relationships are at school; the most important relationship they have with an adult who is not their parent is with their teacher. For their parents, the most important in-person source of community is also their kids’ school. Young people need adult mirrors, models, mentors, and in an earlier era these might have been provided by extended families, but in our own era this all falls upon schools.

Caplan gestures towards work and earlier labor force participation as alternatives to school for many if not all kids. And I empathize: the years that I would point to as making me who I am were ones where I was working, not studying. But they were years spent working in schools, as a teacher or assistant. If schools did not exist, is there an alternative that we genuinely believe would arise to draw young people into the life of their community?

...

It is not an accident that the state that spends the least on education is Utah, where the LDS church can take up some of the slack for schools, while next door Wyoming spends almost the most of any state at $16,000 per student. Education is now the one surviving binding principle of the society as a whole, the one black box everyone will agree to, and so while you can press for less subsidization of education by government, and for privatization of costs, as Caplan does, there’s really nothing people can substitute for it. This is partially about signaling, sure, but it’s also because outside of schools and a few religious enclaves our society is but a darkling plain beset by winds.

This doesn’t mean that we should leave Caplan’s critique on the shelf. Much of education is focused on an insane, zero-sum race for finite rewards. Much of schooling does push kids, parents, schools, and school systems towards a solution ad absurdum, where anything less than 100 percent of kids headed to a doctorate and the big coding job in the sky is a sign of failure of everyone concerned.

But let’s approach this with an eye towards the limits of the possible and the reality of diminishing returns.

https://westhunt.wordpress.com/2018/01/27/poison-ivy-halls/

https://westhunt.wordpress.com/2018/01/27/poison-ivy-halls/#comment-101293

The real reason the left would support Moander: the usual reason. because he’s an enemy.

https://westhunt.wordpress.com/2018/02/01/bright-college-days-part-i/

I have a problem in thinking about education, since my preferences and personal educational experience are atypical, so I can’t just gut it out. On the other hand, knowing that puts me ahead of a lot of people that seem convinced that all real people, including all Arab cabdrivers, think and feel just as they do.

One important fact, relevant to this review. I don’t like Caplan. I think he doesn’t understand – can’t understand – human nature, and although that sometimes confers a different and interesting perspective, it’s not a royal road to truth. Nor would I want to share a foxhole with him: I don’t trust him. So if I say that I agree with some parts of this book, you should believe me.

...

Caplan doesn’t talk about possible ways of improving knowledge acquisition and retention. Maybe he thinks that’s impossible, and he may be right, at least within a conventional universe of possibilities. That’s a bit outside of his thesis, anyhow. Me it interests.

He dismisses objections from educational psychologists who claim that studying a subject improves you in subtle ways even after you forget all of it. I too find that hard to believe. On the other hand, it looks to me as if poorly-digested fragments of information picked up in college have some effect on public policy later in life: it is no coincidence that most prominent people in public life (at a given moment) share a lot of the same ideas. People are vaguely remembering the same crap from the same sources, or related sources. It’s correlated crap, which has a much stronger effect than random crap.

These widespread new ideas are usually wrong. They come from somewhere – in part, from higher education. Along this line, Caplan thinks that college has only a weak ideological effect on students. I don’t believe he is correct. In part, this is because most people use a shifting standard: what’s liberal or conservative gets redefined over time. At any given time a population is roughly half left and half right – but the content of those labels changes a lot. There’s a shift.

https://westhunt.wordpress.com/2018/02/01/bright-college-days-part-i/#comment-101492

I put it this way, a while ago: “When you think about it, falsehoods, stupid crap, make the best group identifiers, because anyone might agree with you when you’re obviously right. Signing up to clear nonsense is a better test of group loyalty. A true friend is with you when you’re wrong. Ideally, not just wrong, but barking mad, rolling around in your own vomit wrong.”

--

You just explained the Credo quia absurdum doctrine. I always wondered if it was nonsense. It is not.

--

Someone on twitter caught it first – got all the way to “sliding down the razor blade of life”. Which I explained is now called “transitioning”

What Catholics believe: https://theweek.com/articles/781925/what-catholics-believe

We believe all of these things, fantastical as they may sound, and we believe them for what we consider good reasons, well attested by history, consistent with the most exacting standards of logic. We will profess them in this place of wrath and tears until the extraordinary event referenced above, for which men and women have hoped and prayed for nearly 2,000 years, comes to pass.

https://westhunt.wordpress.com/2018/02/05/bright-college-days-part-ii/

According to Caplan, employers are looking for conformity, conscientiousness, and intelligence. They use completion of high school, or completion of college as a sign of conformity and conscientiousness. College certainly looks as if it’s mostly signaling, and it’s hugely expensive signaling, in terms of college costs and foregone earnings.

But inserting conformity into the merit function is tricky: things become important signals… because they’re important signals. Otherwise useful actions are contraindicated because they’re “not done”. For example, test scores convey useful information. They could help show that an applicant is smart even though he attended a mediocre school – the same role they play in college admissions. But employers seldom request test scores, and although applicants may provide them, few do. Caplan says ” The word on the street… [more]

econotariat
pseudoE
broad-econ
economics
econometrics
growth-econ
education
human-capital
labor
correlation
null-result
world
developing-world
commentary
spearhead
garett-jones
twitter
social
pic
discussion
econ-metrics
rindermann-thompson
causation
endo-exo
biodet
data
chart
knowledge
article
wealth-of-nations
latin-america
study
path-dependence
divergence
🎩
curvature
microfoundations
multi
convexity-curvature
nonlinearity
hanushek
volo-avolo
endogenous-exogenous
backup
pdf
people
policy
monetary-fiscal
wonkish
cracker-econ
news
org:mag
local-global
higher-ed
impetus
signaling
rhetoric
contrarianism
domestication
propaganda
ratty
hanson
books
review
recommendations
distribution
externalities
cost-benefit
summary
natural-experiment
critique
rent-seeking
mobility
supply-demand
intervention
shift
social-choice
government
incentives
interests
q-n-a
street-fighting
objektbuch
X-not-about-Y
marginal-rev
c:***
qra
info-econ
info-dynamics
org:econlib
yvain
ssc
politics
medicine
stories
april 2017 by nhaliday

distributions - In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values? - Cross Validated

q-n-a overflow nibble data-science stats methodology multiplicative regression linear-models best-practices checklists econometrics positivity outliers street-fighting nitty-gritty reference signum

april 2017 by nhaliday

Malthus in the Bedroom: Birth Spacing as Birth Control in Pre-Transition England | SpringerLink

april 2017 by nhaliday

Randomness in the Bedroom: There Is No Evidence for Fertility Control in Pre-Industrial England: https://link.springer.com/article/10.1007/s13524-019-00786-2

- Gregory Clark et al.

https://twitter.com/Schmidt_Erwin/status/1142740263569448961

https://archive.is/HUYPf

both cause and effect of England not being France, which lowered fertility significantly already in the 18th century, I believe largely through anal sex and coitus interruptus

- Spotted Toad

--

Is there a source I can check on that? That's almost too French to be true. Lol.

study
anthropology
sociology
britain
history
early-modern
demographics
fertility
demographic-transition
sex
class
s-factor
spearhead
gregory-clark
malthus
broad-econ
multi
critique
methodology
gotchas
intricacy
stats
estimate
ratty
unaffiliated
twitter
social
commentary
backup
europe
gallic
idk
sexuality
april 2017 by nhaliday

Statistician Proves Gaussian Correlation Inequality | Quanta Magazine

news org:mag org:sci popsci math probability stats geometry math.MG research multi pdf papers preprint nibble profile stories AMT intersection estimate measure curvature convexity-curvature org:inst org:mat intersection-connectedness

march 2017 by nhaliday

The Myth of "Mind-Altering Parasite" Toxoplasma Gondii? - Neuroskeptic

march 2017 by nhaliday

Gwern explains why the study is underpowered (so its conclusion is unreliable) and also gives a Bayesian analysis suggesting a real effect

more: https://www.theatlantic.com/magazine/archive/2012/03/how-your-cat-is-making-you-crazy/308873/

some other effects listed like Influenza, AIDS, syphilis, herpes

Rage Disorder Linked with Parasite Found in Cat Feces: https://www.scientificamerican.com/article/rage-disorder-linked-with-parasite-found-in-cat-feces/

scitariat
neuro
parasites-microbiome
disease
psychiatry
hmm
nature
toxo-gondii
regularizer
gwern
stat-power
methodology
stats
error
replication
study
summary
commentary
public-health
bayesian
analysis
critique
idk
news
org:mag
being-right
the-trenches
eastern-europe
communism
emotion
extra-introversion
gender
gender-diff
trust
endocrine
multi
brain-scan
org:sci
popsci
immune
peace-violence
march 2017 by nhaliday

Econometrics and the Log-Linear Model

march 2017 by nhaliday

THE LINEAR-LOG MODEL IN ECONOMETRICS: http://www.dummies.com/education/economics/econometrics/the-linear-log-model-in-econometrics/

better-explained
economics
econometrics
methodology
stats
model-class
models
linear-models
explanation
labor
human-capital
org:lite
multiplicative
multi
atoms
march 2017 by nhaliday

Understanding statistics through interactive visualizations

explanation list visualization gotchas paradox stats methodology hypothesis-testing visual-understanding better-explained links regression-to-mean metabuch examples data-science street-fighting intuition ground-up nitty-gritty

march 2017 by nhaliday

Law of total variance - Wikipedia

march 2017 by nhaliday

Var(Y) = E[Var(Y|X)] + Var(E[Y|X])
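
A quick way to convince yourself of the identity is to check it exactly on a small discrete example. The sketch below is only an illustration: the joint distribution `joint` and the helper names (`mean`, `var`, `law_of_total_variance`) are made up for this note and are not taken from the linked article. It uses exact rational arithmetic, so the two sides can be compared with `==` rather than a tolerance.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p(x, y); the numbers are illustrative only.
joint = {
    (0, 1): F(1, 8), (0, 3): F(1, 8),
    (1, 2): F(1, 4), (1, 6): F(1, 4),
    (2, 5): F(1, 4),
}

def mean(pmf):
    """Mean of a pmf given as {value: probability}."""
    return sum(p * v for v, p in pmf.items())

def var(pmf):
    """Variance of a pmf given as {value: probability}."""
    m = mean(pmf)
    return sum(p * (v - m) ** 2 for v, p in pmf.items())

def law_of_total_variance(joint):
    """Return (Var Y, E[Var(Y|X)], Var E[Y|X]) for a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p   # marginal of X
        py[y] = py.get(y, 0) + p   # marginal of Y

    cond_mean, cond_var = {}, {}
    for x in px:
        # Conditional pmf of Y given X = x.
        cond = {y: p / px[x] for (xx, y), p in joint.items() if xx == x}
        cond_mean[x], cond_var[x] = mean(cond), var(cond)

    # E[Var(Y|X)]: average the within-group variances over X.
    within = sum(px[x] * cond_var[x] for x in px)
    # Var E[Y|X]: variance of the group means, weighted by p(x).
    grand = sum(px[x] * cond_mean[x] for x in px)
    between = sum(px[x] * (cond_mean[x] - grand) ** 2 for x in px)
    return var(py), within, between

total, within, between = law_of_total_variance(joint)
print(total, within, between)  # total equals within + between
```

For this particular table the within-group part is 9/4 and the between-group part is 19/16, which add up to the total variance 55/16.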

math
acm
stats
probability
identity
levers
wiki
reference
marginal
moments
bias-variance
nibble
march 2017 by nhaliday
