nhaliday : levers   104

 « earlier
Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora
In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged
There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your markov chain consists of a discrete state space, then take the first time that a state repeats in your chain: if you randomly sample an element between the repeating states (but only including one of the endpoints) you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm (Coupling from the past).

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound for the convergence rates: but these are quite difficult to obtain, and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm for sampling from a two-dimensional uniform distribution, the best convergence rate upper bound achieved, by Persi Diaconis, was something with an astronomical constant factor like 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
nibble  q-n-a  qra  acm  stats  probability  limits  convergence  distribution  sampling  markov  monte-carlo  ML-MAP-E  checking  equilibrium  stylized-facts  gelman  levers  mixing  empirical  plots  manifolds  multi  fixed-point  iteration-recursion  heuristic  expert-experience  theory-practice  project
october 2019 by nhaliday
Factorization of polynomials over finite fields - Wikipedia
In mathematics and computer algebra the factorization of a polynomial consists of decomposing it into a product of irreducible factors. This decomposition is theoretically possible and is unique for polynomials with coefficients in any field, but rather strong restrictions on the field of the coefficients are needed to allow the computation of the factorization by means of an algorithm. In practice, algorithms have been designed only for polynomials with coefficients in a finite field, in the field of rationals or in a finitely generated field extension of one of them.

All factorization algorithms, including the case of multivariate polynomials over the rational numbers, reduce the problem to this case; see polynomial factorization. It is also used for various applications of finite fields, such as coding theory (cyclic redundancy codes and BCH codes), cryptography (public key cryptography by the means of elliptic curves), and computational number theory.

As the reduction of the factorization of multivariate polynomials to that of univariate polynomials does not have any specificity in the case of coefficients in a finite field, only polynomials with one variable are considered in this article.

...

In the algorithms that follow, the complexities are expressed in terms of number of arithmetic operations in Fq, using classical algorithms for the arithmetic of polynomials.

[ed.: Interesting choice...]

...

Factoring algorithms
Many algorithms for factoring polynomials over finite fields include the following three stages:

Square-free factorization
Distinct-degree factorization
Equal-degree factorization
An important exception is Berlekamp's algorithm, which combines stages 2 and 3.

Berlekamp's algorithm
Main article: Berlekamp's algorithm
The Berlekamp's algorithm is historically important as being the first factorization algorithm, which works well in practice. However, it contains a loop on the elements of the ground field, which implies that it is practicable only over small finite fields. For a fixed ground field, its time complexity is polynomial, but, for general ground fields, the complexity is exponential in the size of the ground field.

[ed.: This actually looks fairly implementable.]
wiki  reference  concept  algorithms  calculation  nibble  numerics  math  algebra  math.CA  fields  polynomials  levers  multiplicative  math.NT
july 2019 by nhaliday
Measuring fitness heritability: Life history traits versus morphological traits in humans - Gavrus‐Ion - 2017 - American Journal of Physical Anthropology - Wiley Online Library
Traditional interpretation of Fisher's Fundamental Theorem of Natural Selection is that life history traits (LHT), which are closely related with fitness, show lower heritabilities, whereas morphological traits (MT) are less related with fitness and they are expected to show higher heritabilities.

...

LHT heritabilities ranged from 2.3 to 34% for the whole sample, with men showing higher heritabilities (4–45%) than women (0‐23.7%). Overall, MT presented higher heritability values than most of LHT, ranging from 0 to 40.5% in craniofacial indices, and from 13.8 to 32.4% in craniofacial angles. LHT showed considerable additive genetic variance values, similar to MT, but also high environmental variance values, and most of them presenting a higher evolutionary potential than MT.
study  biodet  behavioral-gen  population-genetics  hmm  contrarianism  levers  inference  variance-components  fertility  life-history  demographics  embodied  prediction  contradiction  empirical  sib-study
may 2019 by nhaliday
ON THE GEOMETRY OF NASH EQUILIBRIA AND CORRELATED EQUILIBRIA
Abstract: It is well known that the set of correlated equilibrium distributions of an n-player noncooperative game is a convex polytope that includes all the Nash equilibrium distributions. We demonstrate an elementary yet surprising result: the Nash equilibria all lie on the boundary of the polytope.
pdf  nibble  papers  ORFE  game-theory  optimization  geometry  dimensionality  linear-algebra  equilibrium  structure  differential  correlation  iidness  acm  linear-programming  spatial  characterization  levers
may 2019 by nhaliday
Stein's example - Wikipedia
Stein's example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately. It is named after Charles Stein of Stanford University, who discovered the phenomenon in 1955.

An intuitive explanation is that optimizing for the mean-squared error of a combined estimator is not the same as optimizing for the errors of separate estimators of the individual parameters. In practical terms, if the combined error is in fact of interest, then a combined estimator should be used, even if the underlying parameters are independent; this occurs in channel estimation in telecommunications, for instance (different factors affect overall channel performance). On the other hand, if one is instead interested in estimating an individual parameter, then using a combined estimator does not help and is in fact worse.

...

Many simple, practical estimators achieve better performance than the ordinary estimator. The best-known example is the James–Stein estimator, which works by starting at X and moving towards a particular point (such as the origin) by an amount inversely proportional to the distance of X from that point.
nibble  concept  levers  wiki  reference  acm  stats  probability  decision-theory  estimate  distribution  atoms
february 2018 by nhaliday
Use and Interpretation of LD Score Regression
LD Score regression distinguishes confounding from polygenicity in genome-wide association studies: https://sci-hub.bz/10.1038/ng.3211
- Po-Ru Loh, Nick Patterson, et al.

https://www.biorxiv.org/content/biorxiv/early/2014/02/21/002931.full.pdf

Both polygenicity (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n3/extref/ng.3211-S1.pdf

An atlas of genetic correlations across human diseases
and traits: https://sci-hub.bz/10.1038/ng.3406

https://www.biorxiv.org/content/early/2015/01/27/014498.full.pdf

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n11/extref/ng.3406-S1.pdf

https://github.com/bulik/ldsc
ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
nibble  pdf  slides  talks  bio  biodet  genetics  genomics  GWAS  genetic-correlation  correlation  methodology  bioinformatics  concept  levers  🌞  tutorial  explanation  pop-structure  gene-drift  ideas  multi  study  org:nat  article  repo  software  tools  libraries  stats  hypothesis-testing  biases  confounding  gotchas  QTL  simulation  survey  preprint  population-genetics
november 2017 by nhaliday
- Patterson, Reich et al., 2012
Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean “Iceman.”
nibble  pdf  study  article  methodology  bio  sapiens  genetics  genomics  population-genetics  migration  gene-flow  software  trees  concept  history  antiquity  europe  roots  gavisti  🌞  bioinformatics  metrics  hypothesis-testing  levers  ideas  libraries  tools  pop-structure
november 2017 by nhaliday
Introduction to Scaling Laws
http://galileo.phys.virginia.edu/classes/304/scaling.pdf

Galileo’s Discovery of Scaling Laws: https://www.mtholyoke.edu/~mpeterso/classes/galileo/scaling8.pdf
Days 1 and 2 of Two New Sciences

An example of such an insight is “the surface of a small solid is comparatively greater than that of a large one” because the surface goes like the square of a linear dimension, but the volume goes like the cube.5 Thus as one scales down macroscopic objects, forces on their surfaces like viscous drag become relatively more important, and bulk forces like weight become relatively less important. Galileo uses this idea on the First Day in the context of resistance in free fall, as an explanation for why similar objects of different size do not fall exactly together, but the smaller one lags behind.
nibble  org:junk  exposition  lecture-notes  physics  mechanics  street-fighting  problem-solving  scale  magnitude  estimate  fermi  mental-math  calculation  nitty-gritty  multi  scitariat  org:bleg  lens  tutorial  guide  ground-up  tricki  skeleton  list  cheatsheet  identity  levers  hi-order-bits  yoga  metabuch  pdf  article  essay  history  early-modern  europe  the-great-west-whale  science  the-trenches  discovery  fluid  architecture  oceans  giants  tidbits  elegance
august 2017 by nhaliday
Inscribed angle - Wikipedia
pf:
- for triangle w/ one side = a diameter, draw isosceles triangle and use supplementary angle identities
- otherwise draw second triangle w/ side = a diameter, and use above result twice
nibble  math  geometry  spatial  ground-up  wiki  reference  proofs  identity  levers  yoga
august 2017 by nhaliday
Diophantine approximation - Wikipedia
- rationals perfectly approximated by themselves, badly approximated (eps>1/bq) by other rationals
- irrationals well-approximated (eps~1/q^2) by rationals:
https://en.wikipedia.org/wiki/Dirichlet%27s_approximation_theorem
nibble  wiki  reference  math  math.NT  approximation  accuracy  levers  pigeonhole-markov  multi  tidbits  discrete  rounding  estimate  tightness  algebra
august 2017 by nhaliday
Kelly criterion - Wikipedia
In probability theory and intertemporal portfolio choice, the Kelly criterion, Kelly strategy, Kelly formula, or Kelly bet, is a formula used to determine the optimal size of a series of bets. In most gambling scenarios, and some investing scenarios under some simplifying assumptions, the Kelly strategy will do better than any essentially different strategy in the long run (that is, over a span of time in which the observed fraction of bets that are successful equals the probability that any given bet will be successful). It was described by J. L. Kelly, Jr, a researcher at Bell Labs, in 1956. The practical use of the formula has been demonstrated.

The Kelly Criterion is to bet a predetermined fraction of assets and can be counterintuitive. In one study, each participant was given \$25 and asked to bet on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at \$250. Behavior was far from optimal. "Remarkably, 28% of the participants went bust, and the average payout was just \$91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment." Using the Kelly criterion and based on the odds in the experiment, the right approach would be to bet 20% of the pot on each throw (see first example in Statement below). If losing, the size of the bet gets cut; if winning, the stake increases.
nibble  betting  investing  ORFE  acm  checklists  levers  probability  algorithms  wiki  reference  atoms  extrema  parsimony  tidbits  decision-theory  decision-making  street-fighting  mental-math  calculation
august 2017 by nhaliday
Pearson correlation coefficient - Wikipedia
https://en.wikipedia.org/wiki/Coefficient_of_determination
deleted but it was about the Pearson correlation distance: 1-r
I guess it's a metric

https://en.wikipedia.org/wiki/Explained_variation

http://infoproc.blogspot.com/2014/02/correlation-and-variance.html
A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SAT.

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)

stats  science  hypothesis-testing  correlation  metrics  plots  regression  wiki  reference  nibble  methodology  multi  twitter  social  discussion  best-practices  econotariat  garett-jones  concept  conceptual-vocab  accuracy  causation  acm  matrix-factorization  todo  explanation  yoga  hsu  street-fighting  levers  🌞  2014  scitariat  variance-components  meta:prediction  biodet  s:**  mental-math  reddit  commentary  ssc  poast  gwern  data-science  metric-space  similarity  measure  dependence-independence
may 2017 by nhaliday
Strings, periods, and borders
A border of x is any proper prefix of x that equals a suffix of x.

...overlapping borders of a string imply that the string is periodic...

In the border array ß[1..n] of x, entry ß[i] is the length
of the longest border of x[1..i].
pdf  nibble  slides  lectures  algorithms  strings  exposition  yoga  atoms  levers  tidbits  sequential  backup
may 2017 by nhaliday
Kin selection - Wikipedia
Formally, genes should increase in frequency when

{\displaystyle rB>C}
where

r=the genetic relatedness of the recipient to the actor, often defined as the probability that a gene picked randomly from each at the same locus is identical by descent.
B=the additional reproductive benefit gained by the recipient of the altruistic act,
C=the reproductive cost to the individual performing the act.
This inequality is known as Hamilton's rule after W. D. Hamilton who in 1964 published the first formal quantitative treatment of kin selection.

The relatedness parameter (r) in Hamilton's rule was introduced in 1922 by Sewall Wright as a coefficient of relationship that gives the probability that at a random locus, the alleles there will be identical by descent. Subsequent authors, including Hamilton, sometimes reformulate this with a regression, which, unlike probabilities, can be negative. A regression analysis producing statistically significant negative relationships indicates that two individuals are less genetically alike than two random ones (Hamilton 1970, Nature & Grafen 1985 Oxford Surveys in Evolutionary Biology). This has been invoked to explain the evolution of spiteful behaviour consisting of acts that result in harm, or loss of fitness, to both the actor and the recipient.

Several scientific studies have found that the kin selection model can be applied to nature. For example, in 2010 researchers used a wild population of red squirrels in Yukon, Canada to study kin selection in nature. The researchers found that surrogate mothers would adopt related orphaned squirrel pups but not unrelated orphans. The researchers calculated the cost of adoption by measuring a decrease in the survival probability of the entire litter after increasing the litter by one pup, while benefit was measured as the increased chance of survival of the orphan. The degree of relatedness of the orphan and surrogate mother for adoption to occur depended on the number of pups the surrogate mother already had in her nest, as this affected the cost of adoption. The study showed that females always adopted orphans when rB > C, but never adopted when rB < C, providing strong support for Hamilton's rule.
bio  nature  evolution  selection  group-selection  kinship  altruism  levers  methodology  population-genetics  genetics  wiki  reference  nibble  stylized-facts  biodet  🌞  concept  metrics  EGT  selfish-gene  cooperate-defect  similarity  interests  ecology
march 2017 by nhaliday
Fundamental Theorems of Evolution: The American Naturalist: Vol 0, No 0
I suggest that the most fundamental theorem of evolution is the Price equation, both because of its simplicity and broad scope and because it can be used to derive four other familiar results that are similarly fundamental: Fisher’s average-excess equation, Robertson’s secondary theorem of natural selection, the breeder’s equation, and Fisher’s fundamental theorem. These derivations clarify both the relationships behind these results and their assumptions. Slightly less fundamental results include those for multivariate evolution and social selection. A key feature of fundamental theorems is that they have great simplicity and scope, which are often achieved by sacrificing perfect accuracy. Quantitative genetics has been more productive of fundamental theorems than population genetics, probably because its empirical focus on unknown genotypes freed it from the tyranny of detail and allowed it to focus on general issues.
study  essay  bio  evolution  population-genetics  fisher  selection  EGT  dynamical  exposition  methodology  🌞  big-picture  levers  list  nibble  article  chart  explanation  clarity  trees  ground-up  ideas  grokkability-clarity
march 2017 by nhaliday
More on Multivariate Gaussians
Fact #1: mean and covariance uniquely determine distribution
Fact #3: closure under sum, marginalizing, and conditioning
covariance of conditional distribution is given by a Schur complement (independent of x_B. is that obvious?)
pdf  exposition  lecture-notes  stanford  nibble  distribution  acm  machine-learning  probability  levers  calculation  ground-up  characterization  rigidity  closure  nitty-gritty  linear-algebra  properties
february 2017 by nhaliday
Structure theorem for finitely generated modules over a principal ideal domain - Wikipedia
- finitely generative modules over PID isomorphic to sum of quotients by decreasing sequences of proper ideals
- never really understood the proof of this in Ma5b
math  algebra  characterization  levers  math.AC  wiki  reference  nibble  proofs  additive  arrows
february 2017 by nhaliday
Relationships among probability distributions - Wikipedia
- One distribution is a special case of another with a broader parameter space
- Transforms (function of a random variable);
- Combinations (function of several variables);
- Approximation (limit) relationships;
- Compound relationships (useful for Bayesian inference);
- Duality;
- Conjugate priors.
stats  probability  characterization  list  levers  wiki  reference  objektbuch  calculation  distribution  nibble  cheatsheet  closure  composition-decomposition  properties
february 2017 by nhaliday
bounds - What is the variance of the maximum of a sample? - Cross Validated
- sum of variances is always a bound
- can't do better even for iid Bernoulli
- looks like nice argument from well-known probabilist (using E[(X-Y)^2] = 2Var X), but not clear to me how he gets to sum_i instead of sum_{i,j} in the union bound?
edit: argument is that, for j = argmax_k Y_k, we have r < X_i - Y_j <= X_i - Y_i for all i, including i = argmax_k X_k
- different proof here (later pages): http://www.ism.ac.jp/editsec/aism/pdf/047_1_0185.pdf
Var(X_n:n) <= sum Var(X_k:n) + 2 sum_{i < j} Cov(X_i:n, X_j:n) = Var(sum X_k:n) = Var(sum X_k) = nσ^2
why are the covariances nonnegative? (are they?). intuitively seems true.
- for that, see https://pinboard.in/u:nhaliday/b:ed4466204bb1
- note that this proof shows more generally that sum Var(X_k:n) <= sum Var(X_k)
- apparently that holds for dependent X_k too? http://mathoverflow.net/a/96943/20644
q-n-a  overflow  stats  acm  distribution  tails  bias-variance  moments  estimate  magnitude  probability  iidness  tidbits  concentration-of-measure  multi  orders  levers  extrema  nibble  bonferroni  coarse-fine  expert  symmetry  s:*  expert-experience  proofs
february 2017 by nhaliday
Kolmogorov's zero–one law - Wikipedia
In probability theory, Kolmogorov's zero–one law, named in honor of Andrey Nikolaevich Kolmogorov, specifies that a certain type of event, called a tail event, will either almost surely happen or almost surely not happen; that is, the probability of such an event occurring is zero or one.

tail events include limsup E_i
math  probability  levers  limits  discrete  wiki  reference  nibble
february 2017 by nhaliday
Dvoretzky's theorem - Wikipedia
In mathematics, Dvoretzky's theorem is an important structural theorem about normed vector spaces proved by Aryeh Dvoretzky in the early 1960s, answering a question of Alexander Grothendieck. In essence, it says that every sufficiently high-dimensional normed vector space will have low-dimensional subspaces that are approximately Euclidean. Equivalently, every high-dimensional bounded symmetric convex set has low-dimensional sections that are approximately ellipsoids.

http://mathoverflow.net/questions/143527/intuitive-explanation-of-dvoretzkys-theorem
http://mathoverflow.net/questions/46278/unexpected-applications-of-dvoretzkys-theorem
math  math.FA  inner-product  levers  characterization  geometry  math.MG  concentration-of-measure  multi  q-n-a  overflow  intuition  examples  proofs  dimensionality  gowers  mathtariat  tcstariat  quantum  quantum-info  norms  nibble  high-dimension  wiki  reference  curvature  convexity-curvature  tcs
january 2017 by nhaliday
Wald's equation - Wikipedia
important identity that simplifies the calculation of the expected value of the sum of a random number of random quantities
math  levers  probability  wiki  reference  nibble  expectancy  identity
january 2017 by nhaliday
probability - How to prove Bonferroni inequalities? - Mathematics Stack Exchange
- integrated version of inequalities for alternating sums of (N choose j), where r.v. N = # of events occuring
- inequalities for alternating binomial coefficients follow from general property of unimodal (increasing then decreasing) sequences, which can be gotten w/ two cases for increasing and decreasing resp.
- the final alternating zero sum property follows for binomial coefficients from expanding (1 - 1)^N = 0
- The idea of proving inequality by integrating simpler inequality of r.v.s is nice. Proof from CS 150 was more brute force from what I remember.
q-n-a  overflow  math  probability  tcs  probabilistic-method  estimate  proofs  levers  yoga  multi  tidbits  metabuch  monotonicity  calculation  nibble  bonferroni  tricki  binomial  s:null  elegance
january 2017 by nhaliday
Computational Complexity: Favorite Theorems: The Yao Principle
The Yao Principle applies when we don't consider the algorithmic complexity of the players. For example in communication complexity we have two players who each have a separate half of an input string and they want to compute some function of the input with the minimum amount of communication between them. The Yao principle states that the best probabilistic strategies for the players will achieve exactly the communication bounds as the best deterministic strategy over a worst-case distribution of inputs.

The Yao Principle plays a smaller role where we measure the running time of an algorithm since applying the Principle would require solving an extremely large linear program. But since so many of our bounds are in information-based models like communication and decision-tree complexity, the Yao Principle, though not particularly complicated, plays an important role in lower bounds in a large number of results in our field.
tcstariat  tcs  complexity  adversarial  rand-approx  algorithms  game-theory  yoga  levers  communication-complexity  random  lower-bounds  average-case  nibble  org:bleg
january 2017 by nhaliday
Carathéodory's theorem (convex hull) - Wikipedia
- any convex combination in R^d can be pared down to at most d+1 points
- eg, in R^2 you can always fit a point in convex hull in a triangle
tcs  acm  math.MG  geometry  levers  wiki  reference  optimization  linear-programming  math  linear-algebra  nibble  spatial  curvature  convexity-curvature
january 2017 by nhaliday
per page:    204080120160

Copy this bookmark: