nhaliday : monte-carlo   33

Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora
In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged
There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your markov chain consists of a discrete state space, then take the first time that a state repeats in your chain: if you randomly sample an element between the repeating states (but only including one of the endpoints) you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm (Coupling from the past).

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound for the convergence rates: but these are quite difficult to obtain, and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm for sampling from a two-dimensional uniform distribution, the best convergence rate upper bound achieved, by Persi Diaconis, was something with an astronomical constant factor like 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
nibble  q-n-a  qra  acm  stats  probability  limits  convergence  distribution  sampling  markov  monte-carlo  ML-MAP-E  checking  equilibrium  stylized-facts  gelman  levers  mixing  empirical  plots  manifolds  multi  fixed-point  iteration-recursion  heuristic  expert-experience  theory-practice  project
october 2019 by nhaliday
Stat 260/CS 294: Bayesian Modeling and Inference
Topics
- Priors (conjugate, noninformative, reference)
- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models
- Testing
- Model choice
- Inference (importance sampling, MCMC, sequential Monte Carlo)
- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)
- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)
- Experimental design
unit  course  berkeley  expert  michael-jordan  machine-learning  acm  bayesian  probability  stats  lecture-notes  priors-posteriors  markov  monte-carlo  frequentist  latent-variables  decision-theory  expert-experience  confidence  sampling
july 2017 by nhaliday
Econometric Modeling as Junk Science
The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/
In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer question about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

This post should have been entitled “Zombies who only think of their next cool IV fix”
massive lust for quasi-natural experiments, regression discontinuities
barely matters if the effects are not all that big
I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.
One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat
and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.
I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history
We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.
On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history
argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE
problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with
the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works
I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.
In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory
Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction
are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.
Most of the other IVs don't plausibly meet the e clue ion restriction. I mean, we should be concerned when the IV estimate is always 10x
larger than the OLS coefficient. This I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or
discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?
PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like
Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small
changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.
The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big
natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage
economists have over political scientists when they compete in the same space.
(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?∗: https://economics.mit.edu/files/750
Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733
As it turns out, Young finds that
1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.
2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.
3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 of the time. They include the full OLS 99-percent CI over 75 percent of the time.
4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.
5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.
6. 2SLS has considerably higher mean squared error than OLS.
7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.
8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse–fewer than 5 percent–if you add in the requirement that the 2SLS CI that excludes the OLS estimate.

Methods Matter: P-Hacking and Causal Inference in Economics*: http://ftp.iza.org/dp11796.pdf
Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups
--
Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://archive.is/EZu0h
Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below
Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf
org:junk  org:edu  economics  econometrics  methodology  realness  truth  science  social-science  accuracy  generalization  essay  article  hmm  multi  study  🎩  empirical  causation  error  critique  sociology  criminology  hypothesis-testing  econotariat  broad-econ  cliometrics  endo-exo  replication  incentives  academia  measurement  wire-guided  intricacy  twitter  social  discussion  pseudoE  effect-size  reflection  field-study  stat-power  piketty  marginal-rev  commentary  data-science  expert-experience  regression  gotchas  rant  map-territory  pdf  simulation  moments  confidence  bias-variance  stats  endogenous-exogenous  control  meta:science  meta-analysis  outliers  summary  sampling  ensembles  monte-carlo  theory-practice  applicability-prereqs  chart  comparison  shift  ratty  unaffiliated  garett-jones
june 2017 by nhaliday
CS 731 Advanced Artificial Intelligence - Spring 2011
- statistical machine learning
- sparsity in regression
- graphical models
- exponential families
- variational methods
- MCMC
- dimensionality reduction, eg, PCA
- Bayesian nonparametrics
- compressive sensing, matrix completion, and Johnson-Lindenstrauss
course  lecture-notes  yoga  acm  stats  machine-learning  graphical-models  graphs  model-class  bayesian  learning-theory  sparsity  embeddings  markov  monte-carlo  norms  unit  nonparametric  compressed-sensing  matrix-factorization  features
january 2017 by nhaliday
Stan
andrew gelman's language
can this do graphical models (I remember some issues w/ that)?
programming  bayesian  stats  python  ppl  libraries  oss  pls  monte-carlo  r-lang  DSL
april 2016 by nhaliday

Copy this bookmark: