
nhaliday : lecture-notes   171

Anti-hash test. - Codeforces
- Thue-Morse sequence
- nice paper: http://www.mii.lt/olympiads_in_informatics/pdf/INFOL119.pdf
In general, polynomial string hashing is a useful technique in construction of efficient string algorithms. One simply needs to remember to carefully select the modulus M and the variable of the polynomial p depending on the application. A good rule of thumb is to pick both values as prime numbers with M as large as possible so that no integer overflow occurs and p being at least the size of the alphabet.
2.2. Upper Bound on M
[stuff about 32- and 64-bit integers]
2.3. Lower Bound on M
On the other hand, M is bounded due to the well-known birthday paradox: if we consider a collection of m keys with m ≥ 1.2√M, then the chance of a collision occurring within this collection is at least 50% (assuming that the distribution of fingerprints is close to uniform on the set of all strings). Thus if the birthday paradox applies, one needs to choose M = ω(m^2) to have a fair chance of avoiding a collision. However, one should note that the birthday paradox does not always apply. As a benchmark consider the following two problems.

I generally prefer to use Schwartz-Zippel to reason about collision probabilities w/ this kind of thing, eg, https://people.eecs.berkeley.edu/~sinclair/cs271/n3.pdf.

A good way to get more accurate results: just use multiple primes and the Chinese remainder theorem to get as large an M as you need w/o going beyond 64-bit arithmetic.

more on this: https://codeforces.com/blog/entry/60442
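
A minimal sketch of the multiple-primes idea (Python for concreteness; the primes and bases below are illustrative choices, not prescribed by the post): hashing mod two primes and treating the pair as one fingerprint gives an effective modulus of M1*M2 ≈ 10^18 without ever leaving 64-bit-sized values.

# polynomial hash of s modulo two primes; by CRT the pair behaves like one
# fingerprint modulo M1*M2, while each individual value stays comfortably small
M1, M2 = 1_000_000_007, 998_244_353    # illustrative large primes
P1, P2 = 131, 137                      # bases >= alphabet size; against adversarial
                                       # ("anti-hash") inputs, pick these at random

def poly_hash(s):
    h1 = h2 = 0
    for ch in s:
        c = ord(ch)
        h1 = (h1 * P1 + c) % M1
        h2 = (h2 * P2 + c) % M2
    return (h1, h2)

# with randomly chosen bases, Schwartz-Zippel bounds the collision probability
# for two fixed distinct strings of length n by roughly (n/M1) * (n/M2)
print(poly_hash("abracadabra"))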
oly  oly-programming  gotchas  howto  hashing  algorithms  strings  random  best-practices  counterexample  multi  pdf  papers  nibble  examples  fields  polynomials  lecture-notes  yoga  probability  estimate  magnitude  hacker  adversarial  CAS  lattice  discrete 
august 2019 by nhaliday
Section 10 Chi-squared goodness-of-fit test.
- pf that chi-squared statistic for Pearson's test (multinomial goodness-of-fit) actually has chi-squared distribution asymptotically
- the gotcha: terms Z_j in sum aren't independent
- solution:
- compute the covariance matrix of the terms: E[Z_i Z_j] = -sqrt(p_i p_j) for i ≠ j (the diagonal entries are 1 - p_j)
- note that an equivalent way of sampling the Z_j is to take a standard Gaussian vector and project it onto the hyperplane orthogonal to (sqrt(p_1), sqrt(p_2), ..., sqrt(p_r))
- that is equivalent to just sampling a Gaussian w/ 1 less dimension (hence df = r-1)
QED
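
A quick simulation sketch of the statement (numpy/scipy assumed available; the cell probabilities are arbitrary illustrative values): the Pearson statistic for r cells should line up with a chi-squared distribution on r-1 degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])    # illustrative multinomial cell probabilities
n, r, trials = 2000, len(p), 20000

counts = rng.multinomial(n, p, size=trials)          # trials x r table of cell counts
T = ((counts - n * p) ** 2 / (n * p)).sum(axis=1)    # Pearson statistic per trial

# simulated quantiles of T vs. quantiles of chi^2 with r-1 degrees of freedom
qs = [0.5, 0.9, 0.99]
print(np.quantile(T, qs))
print(stats.chi2.ppf(qs, df=r - 1))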
pdf  nibble  lecture-notes  mit  stats  hypothesis-testing  acm  probability  methodology  proofs  iidness  distribution  limits  identity  direction  lifts-projections 
october 2017 by nhaliday
Early History of Electricity and Magnetism
The ancient Greeks also knew about magnets. They noted that on rare occasions "lodestones" were found in nature, chunks of iron-rich ore with the puzzling ability to attract iron. Some were discovered near the city of Magnesia (now in Turkey), and from there the words "magnetism" and "magnet" entered the language. The ancient Chinese discovered lodestones independently, and in addition found that after a piece of steel was "touched to a lodestone" it became a magnet itself.

...

One signpost of the new era was the book "De Magnete" (Latin for "On the Magnet") published in London in 1600 by William Gilbert, a prominent medical doctor and (after 1601) personal physician to Queen Elizabeth I. Gilbert's great interest was in magnets and the strange directional properties of the compass needle, always pointing close to north-south. He correctly traced the reason to the globe of the Earth being itself a giant magnet, and demonstrated his explanation by moving a small compass over the surface of a lodestone trimmed to a sphere (or supplemented to spherical shape by iron attachments?)--a scale model he named "terrella" or "little Earth," on which he was able to duplicate all the directional properties of the compass.
nibble  org:edu  org:junk  lecture-notes  history  iron-age  mediterranean  the-classics  physics  electromag  science  the-trenches  the-great-west-whale  discovery  medieval  earth 
september 2017 by nhaliday
Lecture 14: When's that meteor arriving
- Meteors as a random process
- Limiting approximations
- Derivation of the Exponential distribution
- Derivation of the Poisson distribution
- A "Poisson process"
nibble  org:junk  org:edu  exposition  lecture-notes  physics  mechanics  space  earth  probability  stats  distribution  stochastic-processes  closure  additive  limits  approximation  tidbits  acm  binomial  multiplicative 
september 2017 by nhaliday
Recitation 25: Data locality and B-trees
The same idea can be applied to trees. Binary trees are not good for locality because a given node of the binary tree probably occupies only a fraction of a cache line. B-trees are a way to get better locality. As in the hash table trick above, we store several elements in a single node -- as many as will fit in a cache line.

B-trees were originally invented for storing data structures on disk, where locality is even more crucial than with memory. Accessing a disk location takes about 5ms = 5,000,000ns. Therefore if you are storing a tree on disk you want to make sure that a given disk read is as effective as possible. B-trees, with their high branching factor, ensure that few disk reads are needed to navigate to the place where data is stored. B-trees are also useful for in-memory data structures because these days main memory is almost as slow relative to the processor as disk drives were when B-trees were introduced!
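
A back-of-the-envelope sketch of the locality argument (the block and key sizes are illustrative assumptions, not from the recitation): the high branching factor collapses the tree height, and hence the number of block reads.

import math

n = 10**9        # keys stored
block = 4096     # assumed disk block / page size in bytes
key = 16         # assumed bytes per key plus child pointer

branching = block // key                  # keys per B-tree node, ~256 here
print(math.ceil(math.log(n, 2)))          # ~30 node visits for a binary tree
print(math.ceil(math.log(n, branching)))  # ~4 block reads for the B-tree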
nibble  org:junk  org:edu  cornell  lecture-notes  exposition  programming  engineering  systems  dbs  caching  performance  memory-management  os  computer-memory  metal-to-virtual  trees  data-structures  local-global 
september 2017 by nhaliday
Introduction to Scaling Laws
https://betadecay.wordpress.com/2009/10/02/the-physics-of-scaling-laws-and-dimensional-analysis/
http://galileo.phys.virginia.edu/classes/304/scaling.pdf

Galileo’s Discovery of Scaling Laws: https://www.mtholyoke.edu/~mpeterso/classes/galileo/scaling8.pdf
Days 1 and 2 of Two New Sciences

An example of such an insight is “the surface of a small solid is comparatively greater than that of a large one” because the surface goes like the square of a linear dimension, but the volume goes like the cube. Thus as one scales down macroscopic objects, forces on their surfaces like viscous drag become relatively more important, and bulk forces like weight become relatively less important. Galileo uses this idea on the First Day in the context of resistance in free fall, as an explanation for why similar objects of different size do not fall exactly together, but the smaller one lags behind.
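
A tiny numeric sketch of the square-cube point (cube side lengths chosen arbitrarily): surface/volume goes like 1/L, so surface forces dominate as objects shrink.

for L in [100.0, 10.0, 1.0, 0.1]:       # side length of a cube
    surface, volume = 6 * L**2, L**3
    print(L, surface / volume)           # ratio = 6/L: grows as the object shrinks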
nibble  org:junk  exposition  lecture-notes  physics  mechanics  street-fighting  problem-solving  scale  magnitude  estimate  fermi  mental-math  calculation  nitty-gritty  multi  scitariat  org:bleg  lens  tutorial  guide  ground-up  tricki  skeleton  list  cheatsheet  identity  levers  hi-order-bits  yoga  metabuch  pdf  article  essay  history  early-modern  europe  the-great-west-whale  science  the-trenches  discovery  fluid  architecture  oceans  giants  tidbits  elegance 
august 2017 by nhaliday
Subgradients - S. Boyd and L. Vandenberghe
If f is convex and x ∈ int dom f, then ∂f(x) is nonempty and bounded. To establish that ∂f(x) ≠ ∅, we apply the supporting hyperplane theorem to the convex set epi f at the boundary point (x, f(x)), ...
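
A standard concrete example (not from the notes): for f(x) = |x|, ∂f(0) = [-1, 1], since |y| ≥ 0 + g·y holds for all y exactly when |g| ≤ 1; for x ≠ 0 the subdifferential is the singleton {sign(x)}, i.e. the ordinary gradient.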
pdf  nibble  lecture-notes  acm  optimization  curvature  math.CA  estimate  linearity  differential  existence  proofs  exposition  atoms  math  marginal  convexity-curvature 
august 2017 by nhaliday
Analysis of variance - Wikipedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:
y_ij = alpha_i + beta_j + eps_ij
Var(y_ij) = Var(alpha_i) + Var(beta_j) + 2 Cov(alpha_i, beta_j) + Var(eps_ij)
and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
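
A quick simulation sketch of the additive decomposition above (numpy assumed; the group counts and component variances are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
I, J, reps = 30, 40, 200                       # illustrative row/column effects, replicates

alpha = rng.normal(0, 2.0, size=I)             # row effects
beta  = rng.normal(0, 3.0, size=J)             # column effects
eps   = rng.normal(0, 1.0, size=(reps, I, J))  # noise

y = alpha[None, :, None] + beta[None, None, :] + eps
print(y.var())                                 # total variance across all cells
print(alpha.var() + beta.var() + eps.var())    # components add up; the Cov term is ≈ 0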
data-science  stats  methodology  hypothesis-testing  variance-components  concept  conceptual-vocab  thinking  wiki  reference  nibble  multi  visualization  visual-understanding  pic  pdf  exposition  lecture-notes  gelman  scitariat  tutorial  acm  ground-up  yoga 
july 2017 by nhaliday
Stat 260/CS 294: Bayesian Modeling and Inference
Topics
- Priors (conjugate, noninformative, reference)
- Hierarchical models, spatial models, longitudinal models, dynamic models, survival models
- Testing
- Model choice
- Inference (importance sampling, MCMC, sequential Monte Carlo)
- Nonparametric models (Dirichlet processes, Gaussian processes, neutral-to-the-right processes, completely random measures)
- Decision theory and frequentist perspectives (complete class theorems, consistency, empirical Bayes)
- Experimental design
unit  course  berkeley  expert  michael-jordan  machine-learning  acm  bayesian  probability  stats  lecture-notes  priors-posteriors  markov  monte-carlo  frequentist  latent-variables  decision-theory  expert-experience  confidence  sampling 
july 2017 by nhaliday
Lecture 6: Nash Equilibrium Existence
pf:
- For mixed strategy profile p ∈ Δ(A), let g_ij(p) = gain for player i to switch to pure strategy j.
- Define y: Δ(A) -> Δ(A) by y_ij(p) ∝ p_ij + g_ij(p) (normalizing constant = 1 + ∑_k g_ik(p)).
- Look at fixed point of y.
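
A tiny numeric sketch of the map y for a 2x2 game (matching pennies, numpy assumed; an illustration of the construction, not part of the notes): at a Nash equilibrium all gains g_ij vanish, so the equilibrium is a fixed point of y.

import numpy as np

# matching pennies payoffs (player 1 rows, player 2 columns); U2 = -U1
U1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
U2 = -U1

def gains(p1, p2):
    # g_ij = positive part of (payoff of switching to pure strategy j) - (current payoff)
    g1 = np.maximum(U1 @ p2 - p1 @ U1 @ p2, 0)
    g2 = np.maximum(U2.T @ p1 - p1 @ U2 @ p2, 0)
    return g1, g2

def y(p1, p2):
    g1, g2 = gains(p1, p2)
    return (p1 + g1) / (1 + g1.sum()), (p2 + g2) / (1 + g2.sum())

u = np.array([0.5, 0.5])
h = np.array([1.0, 0.0])
print(y(u, u))   # uniform mixing (the Nash equilibrium) comes back unchanged
print(y(h, h))   # both playing "heads" is not an equilibrium: player 2 is shifted toward "tails"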
pdf  nibble  lecture-notes  exposition  acm  game-theory  proofs  math  topology  existence  fixed-point  simplex  equilibrium  ground-up 
june 2017 by nhaliday
More on Multivariate Gaussians
Fact #1: mean and covariance uniquely determine distribution
Fact #3: closure under sum, marginalizing, and conditioning
covariance of conditional distribution is given by a Schur complement (independent of x_B; is that obvious?)
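
A small numeric check of that fact (numpy assumed; the covariance matrix is randomly generated for illustration): the conditional covariance Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA contains no x_B at all, which is why conditioning on any value of x_B gives the same covariance.

import numpy as np

rng = np.random.default_rng(0)

# build a random positive-definite 3x3 covariance and split into blocks (A = first 2 coords, B = last)
M = rng.normal(size=(3, 3))
S = M @ M.T + 3 * np.eye(3)
S_AA, S_AB = S[:2, :2], S[:2, 2:]
S_BA, S_BB = S[2:, :2], S[2:, 2:]

# covariance of x_A | x_B via the Schur complement of S_BB
cond_cov = S_AA - S_AB @ np.linalg.inv(S_BB) @ S_BA
print(cond_cov)

# empirical check: condition a large sample on x_B near a fixed value
x = rng.multivariate_normal(np.zeros(3), S, size=1_000_000)
mask = np.abs(x[:, 2] - 1.0) < 0.02     # x_B ≈ 1.0 (any value gives the same answer)
print(np.cov(x[mask, :2].T))            # close to cond_cov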
pdf  exposition  lecture-notes  stanford  nibble  distribution  acm  machine-learning  probability  levers  calculation  ground-up  characterization  rigidity  closure  nitty-gritty  linear-algebra  properties 
february 2017 by nhaliday
Hoeffding’s Inequality
basic idea of standard pf: bound e^{tX} by line segment (convexity) then use Taylor expansion (in p = b/(b-a), the fraction of range to right of 0) of logarithm
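
For reference, the standard statements being proved (textbook form, not copied from these notes):
Hoeffding's lemma: if X ∈ [a, b] and E[X] = 0, then E[e^{tX}] ≤ exp(t^2 (b-a)^2 / 8).
Hoeffding's inequality: for independent X_i ∈ [a_i, b_i] and S = ∑_i X_i, Pr[S - E[S] ≥ s] ≤ exp(-2s^2 / ∑_i (b_i - a_i)^2).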
pdf  lecture-notes  exposition  nibble  concentration-of-measure  estimate  proofs  ground-up  acm  probability  series  s:null 
february 2017 by nhaliday
Equivalence between counting and sampling
also: every counting problem either has FPTRAS or no approx. w/i polynomial factor
pdf  exposition  lecture-notes  berkeley  nibble  tcs  counting  sampling  characterization  complexity  approximation  rand-approx  proofs 
february 2017 by nhaliday
st.statistics - Lower bound for sum of binomial coefficients? - MathOverflow
- basically approximate w/ geometric sum (which scales as final term) and you can get it up to O(1) factor
- not good enough for many applications (want 1+o(1) approx.)
- Stirling can also give bound to constant factor precision w/ more calculation I believe
- tighter bound at Section 7.3 here: http://webbuild.knu.ac.kr/~trj/Combin/matousek-vondrak-prob-ln.pdf
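
A quick numeric sketch of the geometric-sum bound (n and k chosen arbitrarily): consecutive terms satisfy C(n, i-1)/C(n, i) = i/(n-i+1) ≤ k/(n-k+1) =: r for i ≤ k < n/2, so the sum is at most C(n, k)/(1-r).

from math import comb

n, k = 100, 30                        # illustrative values with k < n/2
total = sum(comb(n, i) for i in range(k + 1))

r = k / (n - k + 1)                    # common-ratio bound for consecutive terms
geom_bound = comb(n, k) / (1 - r)

print(total / comb(n, k))              # sum divided by the final term: the O(1) factor
print(geom_bound / comb(n, k))         # geometric bound on that factor, = 1/(1-r)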
q-n-a  overflow  nibble  math  math.CO  estimate  tidbits  magnitude  concentration-of-measure  stirling  binomial  metabuch  tricki  multi  tightness  pdf  lecture-notes  exposition  probability  probabilistic-method  yoga 
february 2017 by nhaliday
254A, Supplement 4: Probabilistic models and heuristics for the primes (optional) | What's new
among others, the Cramér model for the primes (basically kinda looks like primality is independently distributed w/ Pr[n is prime] = 1/log n)
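
A small simulation sketch of the Cramér model (cutoff N chosen arbitrarily; standard library only): pseudo-primes drawn independently with probability 1/log n track the true prime count reasonably well.

import math, random

random.seed(0)
N = 200_000                               # illustrative cutoff

# Cramér model: include each n >= 3 independently with probability 1/log n
cramer = sum(random.random() < 1 / math.log(n) for n in range(3, N))

# actual primes below N via a simple sieve
sieve = bytearray([1]) * N
sieve[0:2] = b"\x00\x00"
for i in range(2, int(N ** 0.5) + 1):
    if sieve[i]:
        sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
actual = sum(sieve)

print(cramer, actual, N / math.log(N))    # Cramér count, true prime count, first-order estimate N/log N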
gowers  mathtariat  lecture-notes  exposition  math  math.NT  probability  heuristic  models  cartoons  nibble  org:bleg  pseudorandomness  borel-cantelli  concentration-of-measure  multiplicative  truth  guessing 
february 2017 by nhaliday
Count–min sketch - Wikipedia
- estimates frequency vector (f_i)
- idea:
d = O(log 1/δ) hash functions h_j: [n] -> [w] (w = O(1/ε))
d*w counters a[r, c]
for each event i, increment counters a[1, h_1(i)], a[2, h_2(i)], ..., a[d, h_d(i)]
estimate for f_i is min_j a[j, h_j(i)]
- never underestimates but upward-biased
- pf: Markov to get constant probability of success, then exponential decrease with repetition
lecture notes: http://theory.stanford.edu/~tim/s15/l/l2.pdf
- note this can work w/ negative updates. just use median instead of min. pf still uses markov on the absolute value of error.
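
A minimal Python sketch of the structure described above (parameters and the hash construction are illustrative; hash((seed, x)) stands in for a proper pairwise-independent family):

import math, random

class CountMinSketch:
    def __init__(self, eps, delta):
        self.w = math.ceil(math.e / eps)          # counters per row, w = O(1/eps)
        self.d = math.ceil(math.log(1 / delta))   # rows / hash functions, d = O(log 1/delta)
        self.table = [[0] * self.w for _ in range(self.d)]
        self.seeds = [random.randrange(1 << 61) for _ in range(self.d)]

    def _h(self, j, x):
        # stand-in for a pairwise-independent hash h_j: [n] -> [w]
        return hash((self.seeds[j], x)) % self.w

    def add(self, x, count=1):
        for j in range(self.d):
            self.table[j][self._h(j, x)] += count

    def estimate(self, x):
        # never underestimates; overshoots by more than eps * total with prob <= delta
        return min(self.table[j][self._h(j, x)] for j in range(self.d))

cms = CountMinSketch(eps=0.01, delta=1e-6)
for x in "abracadabra":
    cms.add(x)
print(cms.estimate("a"), cms.estimate("z"))   # 5 (or slightly more), ~0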
algorithms  data-structures  sublinear  hashing  wiki  reference  bias-variance  approximation  random  tcs  multi  stanford  lecture-notes  pdf  tim-roughgarden  nibble  pigeonhole-markov  PAC 
february 2017 by nhaliday
6.896: Essential Coding Theory
- probabilistic method and Chernoff bound for Shannon coding
- probabilistic method for asymptotically good Hamming codes (Gilbert coding)
- sparsity used for LDPC codes
mit  course  yoga  tcs  complexity  coding-theory  math.AG  fields  polynomials  pigeonhole-markov  linear-algebra  probabilistic-method  lecture-notes  bits  sparsity  concentration-of-measure  linear-programming  linearity  expanders  hamming  pseudorandomness  crypto  rigorous-crypto  communication-complexity  no-go  madhu-sudan  shannon  unit  p:**  quixotic  advanced 
february 2017 by nhaliday
Lecture 11
In which we prove that the Edmonds-Karp algorithm for maximum flow is a strongly polynomial time algorithm, and we begin to talk about the push-relabel approach.
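
A compact sketch of the algorithm being analyzed (adjacency-matrix version; the example graph is made up): repeatedly augment along a shortest s-t path found by BFS; the point of the lecture is that this needs only O(VE) augmentations regardless of the capacities.

from collections import deque

def edmonds_karp(cap, s, t):
    # cap: n x n residual capacity matrix (modified in place)
    n, flow = len(cap), 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow                      # no augmenting path left
        # bottleneck capacity along the path, then push flow and update residuals
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

# toy example (illustrative capacities): max flow from 0 to 3 is 5
C = [[0, 3, 2, 0],
     [0, 0, 1, 2],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]
print(edmonds_karp(C, 0, 3))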
pdf  lecture-notes  exposition  optimization  algorithms  linear-programming  graphs  stanford  luca-trevisan  nibble  direction  stock-flow  tcs  constraint-satisfaction  tcstariat 
january 2017 by nhaliday
Lecture 16
In which we define a multi-commodity flow problem, and we see that its dual is the relaxation of a useful graph partitioning problem. The relaxation can be rounded to yield an approximate graph partitioning algorithm.
pdf  lecture-notes  exposition  optimization  linear-programming  graphs  graph-theory  algorithms  duality  rounding  stanford  approximation  rand-approx  luca-trevisan  relaxation  nibble  stock-flow  constraint-satisfaction  tcs  tcstariat 
january 2017 by nhaliday