Key trends from NeurIPS 2019

techtariat acmtariat reflection data analysis acm machine-learning conference nips publishing trends the-trenches ubiquity research facebook deep-learning expert-experience cost-benefit speedometer hardware performance convergence generalization off-convex learning-theory explanans roots null-result nibble model-class magnitude approximation bayesian confidence graphs convexity-curvature optimization apollonian-dyonisian neuro pro-rata exploratory flux-stasis increase-decrease generative the-self attention psychology cog-psych interdisciplinary reinforcement bandits online-learning uncertainty search time markov trees differential-privacy privacy matrix-factorization metameta memes(ew) google deepgoog diversity westminster scale replication data-science

5 weeks ago by nhaliday

Using Artificial Intelligence to Augment Human Intelligence

acmtariat techtariat org:bleg nibble better-explained org:popup michael-nielsen tcstariat worrydream exocortex thinking form-design design hci ai machine-learning deep-learning generative latent-variables things exploratory SIGGRAPH dynamic metameta frontier innovation art skunkworks

6 weeks ago by nhaliday

Mike's Math Page – mikesmathpage

mathtariat blog stream math rec-math teaching tutoring parenting multi news org:sci org:mag popsci interview insurance finance transitions academia audio podcast outcome-risk securities ORFE long-short-run age-generation papers nibble regularizer gotchas

7 weeks ago by nhaliday

Simple and efficient semantic embeddings for rare words, n-grams, and language features – Off the convex path

nibble org:bleg acmtariat sanjeev-arora off-convex acm machine-learning deep-learning nlp language structure linear-algebra embeddings research papers summary latent-variables learning-theory existence uniqueness context explanans stochastic-processes models nonlinearity composition-decomposition linearity features roots liner-notes generative common-case

october 2019 by nhaliday

exponential function - Feynman's Trick for Approximating $e^x$ - Mathematics Stack Exchange

october 2019 by nhaliday

1. e^2.3 ~ 10

2. e^.7 ~ 2

3. e^x ~ 1+x

e = 2.71828...

errors (absolute, relative):

1. +0.0258, 0.26%

2. -0.0138, -0.68%

3. 1 + x approximates e^x on [-.3, .3] with absolute error < .05, and relative error < 5.6% (3.7% for [0, .3]).
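
[ed.: a quick numeric check of the error figures above (plain Python, standard library only):]

import math
# Feynman's two memorized anchor values vs. the exact exponential.
for label, x, approx in [("e^2.3 ~ 10", 2.3, 10.0), ("e^.7 ~ 2", 0.7, 2.0)]:
    exact = math.exp(x)
    print(f"{label}: exact={exact:.5f}, abs err={approx - exact:+.4f}, rel err={(approx - exact) / exact:+.2%}")
# Worst-case error of the linearization 1 + x on [-0.3, 0.3], checked on a grid.
xs = [i / 10000 for i in range(-3000, 3001)]
print(max(abs((1 + x) - math.exp(x)) for x in xs))                # ~0.0499 absolute
print(max(abs((1 + x) - math.exp(x)) / math.exp(x) for x in xs))  # ~0.055 relative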

nibble
q-n-a
overflow
math
feynman
giants
mental-math
calculation
multiplicative
AMT
identity
objektbuch
explanation
howto
estimate
street-fighting
stories
approximation
data
trivia
nitty-gritty

The State of Machine Learning Frameworks [ed.: prev: PyTorch dominates research, Tensorflow dominates industry] | Hacker News

october 2019 by nhaliday

thegradient.pub looks interesting

hn
commentary
techtariat
acmtariat
org:popup
nibble
org:bleg
comparison
deep-learning
libraries
machine-learning
python
software
trends
data-science
sci-comp
tools
google
facebook
tech
working-stiff
best-practices
ecosystem
academia
theory-practice
pragmatic
wire-guided
static-dynamic
state
parsimony
api
flux-stasis
ubiquity
performance
cloud
saas
tech-infrastructure
business
incentives
prediction
frameworks

Is there a common method for detecting the convergence of the Gibbs sampler and the expectation-maximization algorithm? - Quora

october 2019 by nhaliday

In practice and theory it is much easier to diagnose convergence in EM (vanilla or variational) than in any MCMC algorithm (including Gibbs sampling).

https://www.quora.com/How-can-you-determine-if-your-Gibbs-sampler-has-converged

There is a special case when you can actually obtain the stationary distribution, and be sure that you did! If your markov chain consists of a discrete state space, then take the first time that a state repeats in your chain: if you randomly sample an element between the repeating states (but only including one of the endpoints) you will have a sample from your true distribution.

One can achieve this 'exact MCMC sampling' more generally by using the coupling from the past algorithm (Coupling from the past).

Otherwise, there is no rigorous statistical test for convergence. It may be possible to obtain a theoretical bound for the convergence rates: but these are quite difficult to obtain, and quite often too large to be of practical use. For example, even for the simple case of using the Metropolis algorithm for sampling from a two-dimensional uniform distribution, the best convergence rate upper bound achieved, by Persi Diaconis, was something with an astronomical constant factor like 10^300.

In fact, it is fair to say that for most high dimensional problems, we have really no idea whether Gibbs sampling ever comes close to converging, but the best we can do is use some simple diagnostics to detect the most obvious failures.
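
[ed.: as a concrete example of those "simple diagnostics", a minimal sketch of the Gelman–Rubin potential scale reduction factor R̂ computed across independent chains; assumes numpy and a toy target. Values well above 1 flag obvious mixing failures, while a value near 1 is necessary but not sufficient for convergence:]

import numpy as np
def gelman_rubin_rhat(chains):
    # chains: (m, n) array of one scalar quantity, m independent chains x n draws each.
    n = chains.shape[1]
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)
rng = np.random.default_rng(0)
good = rng.normal(0.0, 1.0, size=(3, 1000))    # three chains targeting N(0, 1)
stuck = rng.normal(5.0, 1.0, size=(1, 1000))   # one chain that never mixed
print(gelman_rubin_rhat(np.vstack([good, stuck])))  # >> 1: obvious failure
print(gelman_rubin_rhat(good))                      # ~ 1: no obvious failure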

nibble
q-n-a
qra
acm
stats
probability
limits
convergence
distribution
sampling
markov
monte-carlo
ML-MAP-E
checking
equilibrium
stylized-facts
gelman
levers
mixing
empirical
plots
manifolds
multi
fixed-point
iteration-recursion
heuristic
expert-experience
theory-practice
project

online resources - How to write special set notation by hand? - Mathematics Stack Exchange

october 2019 by nhaliday

Your ℕ is “incorrect” in that a capital N in any serif font has the diagonal thickened, not the verticals. In fact, the rule (in Latin alphabet) is that negative slopes are thick, positive ones are thin. Verticals are sometimes thin, sometimes thick. Unique exception: Z. Just look in a newspaper at A, V, X, M, and N.

nibble
q-n-a
overflow
math
writing
notetaking
howto
pic
notation
trivia

Search Combinators

nibble pdf papers cs plt programming search composition-decomposition DSL atoms lexical syntax heuristic abstraction bare-hands iteration-recursion constraint-satisfaction logic systems local-global trees state intricacy coupling-cohesion static-dynamic functional

october 2019 by nhaliday

When is proof by contradiction necessary? | Gowers's Weblog

nibble org:bleg gowers mathtariat math proofs contradiction volo-avolo structure math.CA math.NT algebra parsimony elegance minimalism efficiency technical-writing necessity-sufficiency degrees-of-freedom simplification-normalization

october 2019 by nhaliday

Google AI Blog: Introducing a New Framework for Flexible and Reproducible Reinforcement Learning Research

october 2019 by nhaliday

nice resources for learning RL in HN comments: https://news.ycombinator.com/item?id=19170294

techtariat
org:com
google
acmtariat
deepgoog
org:bleg
nibble
machine-learning
deep-learning
libraries
project
reinforcement
replication
benchmarks
move-fast-(and-break-things)
research
multi
hn
commentary
links
recommendations
init
video
lectures
books

Shuffling - Wikipedia

august 2019 by nhaliday

The Gilbert–Shannon–Reeds model provides a mathematical model of the random outcomes of riffling, that has been shown experimentally to be a good fit to human shuffling[2] and that forms the basis for a recommendation that card decks be riffled seven times in order to randomize them thoroughly.[3] Later, mathematicians Lloyd M. Trefethen and Lloyd N. Trefethen authored a paper using a tweaked version of the Gilbert-Shannon-Reeds model showing that the minimum number of riffles for total randomization could also be 5, if the method of defining randomness is changed.[4][5]
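
[ed.: a minimal sketch of one GSR riffle in Python (standard library only): cut at a Binomial(n, 1/2) point, then drop the next card from a packet with probability proportional to that packet's remaining size:]

import random
def gsr_riffle(deck):
    # One Gilbert-Shannon-Reeds riffle of a list.
    n = len(deck)
    cut = sum(random.random() < 0.5 for _ in range(n))  # cut point ~ Binomial(n, 1/2)
    left, right = deck[:cut], deck[cut:]
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        nl, nr = len(left) - i, len(right) - j
        if random.random() < nl / (nl + nr):             # drop proportionally to packet size
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out
deck = list(range(52))
for _ in range(7):                                       # the "seven riffles" recommendation
    deck = gsr_riffle(deck)
print(deck)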

nibble
tidbits
trivia
cocktail
wiki
reference
games
howto
random
models
math
applications
probability
math.CO
mixing
markov
sampling
best-practices
acm

How to fool the "try some test cases" heuristic: Algorithms that appear correct, but are actually incorrect - Computer Science Stack Exchange

nibble q-n-a overflow cs programming correctness error measurement random checking examples counterexample rigor rand-approx greedy debugging math.NT multiplicative algorithms intricacy heuristic methodology

august 2019 by nhaliday

Anti-hash test. - Codeforces

august 2019 by nhaliday

- Thue-Morse sequence

- nice paper: http://www.mii.lt/olympiads_in_informatics/pdf/INFOL119.pdf

In general, polynomial string hashing is a useful technique in construction of efficient string algorithms. One simply needs to remember to carefully select the modulus M and the variable of the polynomial p depending on the application. A good rule of thumb is to pick both values as prime numbers with M as large as possible so that no integer overflow occurs and p being at least the size of the alphabet.

2.2. Upper Bound on M

[stuff about 32- and 64-bit integers]

2.3. Lower Bound on M

On the other side M is bounded due to the well-known birthday paradox: if we consider a collection of m keys with m ≥ 1.2√M then the chance of a collision to occur within this collection is at least 50% (assuming that the distribution of fingerprints is close to uniform on the set of all strings). Thus if the birthday paradox applies then one needs to choose M = ω(m^2) to have a fair chance to avoid a collision. However, one should note that the birthday paradox does not always apply. As a benchmark consider the following two problems.

I generally prefer to use Schwartz-Zippel to reason about collision probabilities w/ this kind of thing, eg, https://people.eecs.berkeley.edu/~sinclair/cs271/n3.pdf.

A good way to get more accurate results: just use multiple primes and the Chinese remainder theorem to get as large an M as you need w/o going beyond 64-bit arithmetic.

more on this: https://codeforces.com/blog/entry/60442
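
[ed.: a minimal sketch of that defensive setup in Python: polynomial hashing with a per-run random base and two large prime moduli, so by the Chinese remainder theorem a collision has to happen modulo their ~10^18 product; the particular primes and base range are just illustrative choices, and the randomized base is what defeats prepared anti-hash inputs like Thue-Morse strings:]

import random
MODS = (1_000_000_007, 998_244_353)        # two large primes; product ~ 1e18
BASE = random.randrange(256, min(MODS))    # random base chosen per run
def poly_hash(s):
    # Polynomial string hash of s under both moduli simultaneously.
    h1 = h2 = 0
    for ch in s:
        c = ord(ch) + 1                    # keep character codes nonzero
        h1 = (h1 * BASE + c) % MODS[0]
        h2 = (h2 * BASE + c) % MODS[1]
    return (h1, h2)
print(poly_hash("abracadabra") == poly_hash("abracadabra"))   # True
print(poly_hash("abracadabra") == poly_hash("abracadabrb"))   # False (w.h.p.)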

oly
oly-programming
gotchas
howto
hashing
algorithms
strings
random
best-practices
counterexample
multi
pdf
papers
nibble
examples
fields
polynomials
lecture-notes
yoga
probability
estimate
magnitude
hacker
adversarial
CAS
lattice
discrete

big list - Are there proofs that you feel you did not "understand" for a long time? - MathOverflow

nibble q-n-a overflow soft-question big-list math proofs expert-experience heavyweights gowers mathtariat reflection learning intricacy grokkability intuition algebra math.GR motivation math.GN topology synthesis math.CT computation tcs logic iteration-recursion math.CA extrema smoothness span-cover grokkability-clarity

august 2019 by nhaliday

Epigrams in Programming | Computer Science

august 2019 by nhaliday

- Alan Perlis

nibble
quotes
aphorism
list
cs
computation
programming
pls
hi-order-bits
synthesis
lens
data-structures
arrows
algorithms
iteration-recursion
intricacy
strings
types
math
formal-methods
pic
visuo
visual-understanding
systems
state
structure
turing
cost-benefit
lisp
performance
software
language
plt
invariance
ends-means
ai
nitty-gritty
sci-comp
composition-decomposition
tradeoffs
grokkability
assembly
internet
egalitarianism-hierarchy
functional
impetus
roots
path-dependence
heavyweights
grokkability-clarity

Notes on Test-Case Reduction | David R. MacIver

nibble techtariat programming libraries analysis system-design design checking random measure parsimony intricacy orders caching arrows structure search bare-hands expert-experience automata-languages concurrency greedy systems pls ideas best-practices summary iteration-recursion trees common-case identification-equivalence nitty-gritty reduction cost-benefit time-complexity performance sequential code-dive examples DSL methodology simplification-normalization

august 2019 by nhaliday

Modules Matter Most | Existential Type

july 2019 by nhaliday

note comment from gasche (significant OCaml contributor) critiquing modules vs typeclasses: https://existentialtype.wordpress.com/2011/04/16/modules-matter-most/#comment-735

I also think you’re unfair to type classes. You’re right that they are not completely satisfying as a modularity tool, but your presentation makes them sound bad in all aspects, which is certainly not true. The limitation of only having one instance per type may be a strong one, but it allows for a level of implicitness that is just nice. There is a reason why, for example, monads are relatively nice to use in Haskell, while using monads represented as modules in SML/OCaml programs is a real pain.

It’s a fact that type-classes are widely adopted and used in the Haskell circles, while modules/functors are only used for relatively coarse-grained modularity in the ML community. It should tell you something useful about those two features: they’re something that current modules miss (or maybe a trade-off between flexibility and implicitness that plays against modules for “modularity in the small”), and it’s dishonest and rude to explain the adoption difference by “people don’t know any better”.

nibble
org:bleg
techtariat
programming
pls
plt
ocaml-sml
functional
haskell
types
composition-decomposition
coupling-cohesion
engineering
structure
intricacy
arrows
matching
network-structure
degrees-of-freedom
linearity
nonlinearity
span-cover
direction
multi
poast
expert-experience
blowhards
static-dynamic
protocol-metadata
cmu

galois theory - Existence of irreducible polynomial of arbitrary degree over finite field without use of primitive element theorem? - Mathematics Stack Exchange

nibble q-n-a overflow math math.CA algebra multiplicative tidbits proofs existence pigeonhole-markov estimate fields identity measure

july 2019 by nhaliday

The Reason Why | West Hunter

july 2019 by nhaliday

There are odd things about the orbits of trans-Neptunian objects that suggest (to some) that there might be an undiscovered super-Earth-sized planet a few hundred AU from the Sun.

We haven’t seen it, but then it would be very hard to see. The underlying reason is simple enough, but I have never seen anyone mention it: the signal from such objects drops as the fourth power of distance from the Sun. Not the second power, as is the case with luminous objects like stars, or faraway objects that are close to a star. We can image close-in planets of other stars that are light-years distant, but it’s very difficult to see a fair-sized planet a few hundred AU out.

--

interesting little fun fact

west-hunter
scitariat
nibble
tidbits
scale
magnitude
visuo
electromag
spatial
space
measurement
paradox
physics

Zero-based numbering - Wikipedia

july 2019 by nhaliday

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html

https://softwareengineering.stackexchange.com/questions/110804/why-are-zero-based-arrays-the-norm

idk about this guy's competence. he seems to want to sensationalize the historical reasons while ignoring or being ignorant of the legitimate mathematical reasons:

http://exple.tive.org/blarg/2013/10/22/citation-needed/
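
[ed.: the usual legitimate mathematical reason (this is Dijkstra's EWD831 argument, linked above): zero-based indices pair with half-open ranges [a, b), whose length is b - a and which split and concatenate without ±1 corrections; a toy illustration in Python, whose range() already works this way:]

a, b, k = 3, 11, 7
whole = list(range(a, b))                                 # the half-open range [a, b)
assert len(whole) == b - a                                # length needs no +1 correction
assert list(range(a, k)) + list(range(k, b)) == whole     # split at any k, no off-by-one
assert [whole[i] for i in range(0, len(whole))] == whole  # indices are exactly [0, N)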

nibble
concept
wiki
reference
programming
pls
c(pp)
systems
plt
roots
explanans
degrees-of-freedom
data-structures
multi
hmm
debate
critique
ideas
techtariat
rant
q-n-a
stackex
compilers
performance
calculation
correctness
mental-math
parsimony
whole-partial-many
tradition
worse-is-better/the-right-thing
mit
business
rhetoric
worrydream
flux-stasis
legacy
elegance
stories
blowhards
idk
org:junk
intricacy
measure
debugging
best-practices
notation
heavyweights
protocol-metadata

Rational Sines of Rational Multiples of π

july 2019 by nhaliday

For which rational multiples of π is the sine rational? We have the three trivial cases

[0, pi/2, pi/6]

and we wish to show that these are essentially the only distinct rational sines of rational multiples of π.

The assertion about rational sines of rational multiples of π follows from two fundamental lemmas. The first is

Lemma 1: For any rational number q the value of sin(qπ) is a root of a monic polynomial with integer coefficients.

[Pf uses some ideas unfamiliar to me: similarity parameter of Moebius (linear fraction) transformations, and finding a polynomial for a desired root by constructing a Moebius transformation with a finite period.]

...

Lemma 2: Any root of a monic polynomial f(x) with integer coefficients must either be an integer or irrational.

[Gauss's Lemma, cf Dummit-Foote.]

...
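
[ed.: Lemma 2 is the monic case of the rational root theorem: a rational root in lowest terms must have denominator dividing the leading coefficient 1, hence be an integer, and an integer root must divide the constant term. A small sanity check of that consequence in Python (exact arithmetic via Fraction; the function name is just for illustration):]

from fractions import Fraction
def integer_roots(coeffs):
    # Rational (hence integer) roots of a monic integer polynomial
    # a_0 + a_1 x + ... + x^n, coefficients lowest degree first, a_0 != 0.
    assert coeffs[-1] == 1 and coeffs[0] != 0
    divisors = [d for d in range(1, abs(coeffs[0]) + 1) if coeffs[0] % d == 0]
    candidates = divisors + [-d for d in divisors]
    value = lambda x: sum(c * Fraction(x) ** i for i, c in enumerate(coeffs))
    return sorted(x for x in candidates if value(x) == 0)
print(integer_roots([-2, 0, 1]))   # x^2 - 2 -> []  (so sqrt(2) is irrational)
print(integer_roots([2, -3, 1]))   # x^2 - 3x + 2 -> [1, 2]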

nibble
tidbits
org:junk
analysis
trivia
math
algebra
polynomials
fields
characterization
direction
math.CA
math.CV
ground-up

The Existential Risk of Math Errors - Gwern.net

july 2019 by nhaliday

How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:

1. Mistakes where the theorem is still true, but the proof was incorrect (type I)

2. Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)

Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept is a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable5.) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept - but of course the results were some of the greatest mathematical work ever conducted6 and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future.7

...

Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:

...

TYPE I > TYPE II?

“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”

- Gian-Carlo Rota

Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.

...

Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming attributes to Ralph Boas a comment that while editing Mathematical Reviews that “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”.

...

Gian-Carlo Rota gives us an example with Hilbert:

...

Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties.

...

Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1/3 and that he routinely found mistakes in his own proofs and, worse, believed false conjectures.

[more on these "structured proofs":

https://academia.stackexchange.com/questions/52435/does-anyone-actually-publish-structured-proofs

https://mathoverflow.net/questions/35727/community-experiences-writing-lamports-structured-proofs

]

We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages31.

[related:

https://mathoverflow.net/questions/11517/computer-algebra-errors

I don't know any interesting bugs in symbolic algebra packages but I know a true, enlightening and entertaining story about something that looked like a bug but wasn't.

Define sinc(x) = (sin x)/x.

Someone found the following result in an algebra package: ∫_0^∞ sinc(x) dx = π/2

They then found the following results:

...

So of course when they got:

∫_0^∞ sinc(x) sinc(x/3) sinc(x/5) ⋯ sinc(x/15) dx = (467807924713440738696537864469/935615849440640907310521750000) π

hmm:

Which means that nobody knows Fourier analysis nowdays. Very sad and discouraging story... – fedja Jan 29 '10 at 18:47

--

Because the most popular systems are all commercial, they tend to guard their bug database rather closely -- making them public would seriously cut their sales. For example, for the open source project Sage (which is quite young), you can get a list of all the known bugs from this page. 1582 known issues on Feb.16th 2010 (which includes feature requests, problems with documentation, etc).

That is an order of magnitude less than the commercial systems. And it's not because it is better, it is because it is younger and smaller. It might be better, but until SAGE does a lot of analysis (about 40% of CAS bugs are there) and a fancy user interface (another 40%), it is too hard to compare.

I once ran a graduate course whose core topic was studying the fundamental disconnect between the algebraic nature of CAS and the analytic nature of what it is mostly used for. There are issues of logic -- CASes work more or less in an intensional logic, while most of analysis is stated in a purely extensional fashion. There is no well-defined 'denotational semantics' for expressions-as-functions, which strongly contributes to the deeper bugs in CASes.]

...

Should such widely-believed conjectures as P≠NP or the Riemann hypothesis turn out to be false, then because they are assumed by so many existing proofs, a far larger math holocaust would ensue - and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining, if it doesn’t come at a time of danger.

https://mathoverflow.net/questions/338607/why-doesnt-mathematics-collapse-down-even-though-humans-quite-often-make-mista

more on formal methods in programming:

https://www.quantamagazine.org/formal-verification-creates-hacker-proof-code-20160920/

https://intelligence.org/2014/03/02/bob-constable/

https://softwareengineering.stackexchange.com/questions/375342/what-are-the-barriers-that-prevent-widespread-adoption-of-formal-methods

Update: measured effort

In the October 2018 issue of Communications of the ACM there is an interesting article about Formally verified software in the real world with some estimates of the effort.

Interestingly (based on OS development for military equipment), it seems that producing formally proved software requires 3.3 times more effort than with traditional engineering techniques. So it's really costly.

On the other hand, it requires 2.3 times less effort to get high security software this way than with traditionally engineered software if you add the effort to make such software certified at a high security level (EAL 7). So if you have high reliability or security requirements there is definitively a business case for going formal.

WHY DON'T PEOPLE USE FORMAL METHODS?: https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/

You can see examples of how all of these look at Let’s Prove Leftpad. HOL4 and Isabelle are good examples of “independent theorem” specs, SPARK and Dafny have “embedded assertion” specs, and Coq and Agda have “dependent type” specs.

If you squint a bit it looks like these three forms of code spec map to the three main domains of automated correctness checking: tests, contracts, and types. This is not a coincidence. Correctness is a spectrum, and formal verification is one extreme of that spectrum. As we reduce the rigour (and effort) of our verification we get simpler and narrower checks, whether that means limiting the explored state space, using weaker types, or pushing verification to the runtime. Any means of total specification then becomes a means of partial specification, and vice versa: many consider Cleanroom a formal verification technique, which primarily works by pushing code review far beyond what’s humanly possible.

...

The question, then: “is 90/95/99% correct significantly cheaper than 100% correct?” The answer is very yes. We all are comfortable saying that a codebase we’ve well-tested and well-typed is mostly correct modulo a few fixes in prod, and we’re even writing more than four lines of code a day. In fact, the vast… [more]

ratty
gwern
analysis
essay
realness
truth
correctness
reason
philosophy
math
proofs
formal-methods
cs
programming
engineering
worse-is-better/the-right-thing
intuition
giants
old-anglo
error
street-fighting
heuristic
zooming
risk
threat-modeling
software
lens
logic
inference
physics
differential
geometry
estimate
distribution
robust
speculation
nonlinearity
cost-benefit
convexity-curvature
measure
scale
trivia
cocktail
history
early-modern
europe
math.CA
rigor
news
org:mag
org:sci
miri-cfar
pdf
thesis
comparison
examples
org:junk
q-n-a
stackex
pragmatic
tradeoffs
cracker-prog
techtariat
invariance
DSL
chart
ecosystem
grokkability
heavyweights
CAS
static-dynamic
lower-bounds
complexity
tcs
open-problems
big-surf
ideas
certificates-recognition
proof-systems
PCP
mediterranean
SDP
meta:prediction
epistemic
questions
guessing
distributed
overflow
nibble
soft-question
track-record
big-list
hmm
frontier
state-of-art
move-fast-(and-break-things)
grokkability-clarity
technical-writing
trust

Factorization of polynomials over finite fields - Wikipedia

july 2019 by nhaliday

In mathematics and computer algebra the factorization of a polynomial consists of decomposing it into a product of irreducible factors. This decomposition is theoretically possible and is unique for polynomials with coefficients in any field, but rather strong restrictions on the field of the coefficients are needed to allow the computation of the factorization by means of an algorithm. In practice, algorithms have been designed only for polynomials with coefficients in a finite field, in the field of rationals or in a finitely generated field extension of one of them.

All factorization algorithms, including the case of multivariate polynomials over the rational numbers, reduce the problem to this case; see polynomial factorization. It is also used for various applications of finite fields, such as coding theory (cyclic redundancy codes and BCH codes), cryptography (public key cryptography by the means of elliptic curves), and computational number theory.

As the reduction of the factorization of multivariate polynomials to that of univariate polynomials does not have any specificity in the case of coefficients in a finite field, only polynomials with one variable are considered in this article.

...

In the algorithms that follow, the complexities are expressed in terms of number of arithmetic operations in Fq, using classical algorithms for the arithmetic of polynomials.

[ed.: Interesting choice...]

...

Factoring algorithms

Many algorithms for factoring polynomials over finite fields include the following three stages:

Square-free factorization

Distinct-degree factorization

Equal-degree factorization

An important exception is Berlekamp's algorithm, which combines stages 2 and 3.

Berlekamp's algorithm

Main article: Berlekamp's algorithm

Berlekamp's algorithm is historically important as the first factorization algorithm that works well in practice. However, it contains a loop over the elements of the ground field, which implies that it is practical only over small finite fields. For a fixed ground field, its time complexity is polynomial, but, for general ground fields, the complexity is exponential in the size of the ground field.

[ed.: This actually looks fairly implementable.]
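
[ed.: to give a feel for stage 2, here is a minimal distinct-degree factorization sketch in Python for prime p and a monic, square-free input, using naive schoolbook polynomial arithmetic (coefficient lists, lowest degree first). It splits off the product of all factors of each degree d via gcd(x^(p^d) - x, f*); this is only an illustration of the idea, not the article's optimized algorithms:]

def trim(f):
    while len(f) > 1 and f[-1] == 0:
        f.pop()
    return f
def psub(f, g, p):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) - (g[i] if i < len(g) else 0)) % p for i in range(n)])
def pmul(f, g, p):
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] = (r[i + j] + a * b) % p
    return trim(r)
def pdivmod(f, m, p):
    f, q = f[:], [0] * max(1, len(f) - len(m) + 1)
    inv = pow(m[-1], p - 2, p)                 # inverse of the leading coefficient of m
    while len(f) >= len(m) and any(f):
        c = f[-1] * inv % p
        shift = len(f) - len(m)
        q[shift] = c
        for i, a in enumerate(m):
            f[shift + i] = (f[shift + i] - c * a) % p
        trim(f)
    return trim(q), trim(f)
def pgcd(f, g, p):
    while any(g):
        f, g = g, pdivmod(f, g, p)[1]
    inv = pow(f[-1], p - 2, p)
    return trim([a * inv % p for a in f])      # normalize to a monic gcd
def ppow_mod(h, e, m, p):                      # h^e reduced mod m, square-and-multiply
    r = [1]
    while e:
        if e & 1:
            r = pdivmod(pmul(r, h, p), m, p)[1]
        h = pdivmod(pmul(h, h, p), m, p)[1]
        e >>= 1
    return r
def distinct_degree(f, p):
    # Returns [(g, d)]: g is the product of the irreducible factors of f of degree exactly d.
    out, fstar, h, d = [], f[:], [0, 1], 0     # h starts as the polynomial x
    while len(fstar) - 1 >= 2 * (d + 1):
        d += 1
        h = ppow_mod(h, p, fstar, p)           # now h = x^(p^d) mod f*
        g = pgcd(psub(h, [0, 1], p), fstar, p)
        if len(g) > 1:                         # nontrivial gcd: split off the degree-d part
            out.append((g, d))
            fstar = pdivmod(fstar, g, p)[0]
            h = pdivmod(h, fstar, p)[1]
    if len(fstar) > 1:                         # what remains is a single irreducible factor
        out.append((fstar, len(fstar) - 1))
    return out
# (x + 1)(x^2 + x + 2) over F_3, coefficients lowest degree first:
print(distinct_degree([2, 0, 2, 1], 3))        # [([1, 1], 1), ([2, 1, 1], 2)]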

wiki
reference
concept
algorithms
calculation
nibble
numerics
math
algebra
math.CA
fields
polynomials
levers
multiplicative
math.NT

Another Go at Language Design - Rob Pike, 2010

nibble pdf slides stanford google programming pls nitty-gritty engineering idk golang intricacy parsimony python haskell jvm scala comparison c(pp) systems reflection roots degrees-of-freedom correlation types compilers time devtools scaling-tech cost-benefit tradeoffs networking concurrency span-cover wire-guided data measure oop computer-memory safety correctness web stories git vcs abstraction ideas summary numerics cracker-prog rsc build-packaging scale direct-indirect performance latency-throughput techtariat

june 2019 by nhaliday

c++ - Which is faster: Stack allocation or Heap allocation - Stack Overflow

june 2019 by nhaliday

On my machine, using g++ 3.4.4 on Windows, I get "0 clock ticks" for both stack and heap allocation for anything less than 100000 allocations, and even then I get "0 clock ticks" for stack allocation and "15 clock ticks" for heap allocation. When I measure 10,000,000 allocations, stack allocation takes 31 clock ticks and heap allocation takes 1562 clock ticks.

so maybe around 100x difference? what does that work out to in terms of total workload?

hmm:

http://vlsiarch.eecs.harvard.edu/wp-content/uploads/2017/02/asplos17mallacc.pdf

Recent work shows that dynamic memory allocation consumes nearly 7% of all cycles in Google datacenters.

That's not too bad actually. Seems like I shouldn't worry about shifting from heap to stack/globals unless profiling says it's important, particularly for non-oly stuff.

edit: Actually, a 100x factor on something that is already 7% of cycles is pretty high; it could increase the constant factor by almost an order of magnitude.

edit: Well actually that's not the right math. 93% + 7%*.01 is not much smaller than 100%
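
[ed.: spelling out the corrected arithmetic (Amdahl's law): even if the ~7% of cycles spent in allocation became 100x cheaper, the total only drops to about 93.07%, i.e. roughly a 1.07x overall speedup:]

alloc_fraction = 0.07            # share of all cycles spent in malloc/free (Google figure above)
speedup = 100                    # hypothetical stack-vs-heap factor from the benchmark
new_total = (1 - alloc_fraction) + alloc_fraction / speedup
print(new_total, 1 / new_total)  # 0.9307 of the old runtime, ~1.074x overall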

q-n-a
stackex
programming
c(pp)
systems
memory-management
performance
intricacy
comparison
benchmarks
data
objektbuch
empirical
google
papers
nibble
time
measure
pro-rata
distribution
multi
pdf
oly-programming
computer-memory

data structures - Why are Red-Black trees so popular? - Computer Science Stack Exchange

june 2019 by nhaliday

- AVL trees have smaller average depth than red-black trees, and thus searching for a value in AVL tree is consistently faster.

- Red-black trees make fewer structural changes to balance themselves than AVL trees, which could make them potentially faster for insert/delete. I'm saying potentially, because this would depend on the cost of the structural change to the tree, as this will depend a lot on the runtime and implementation (might also be completely different in a functional language when the tree is immutable?)

There are many benchmarks online that compare AVL and Red-black trees, but what struck me is that my professor basically said, that usually you'd do one of two things:

- Either you don't really care that much about performance, in which case the 10-20% difference of AVL vs Red-black in most cases won't matter at all.

- Or you really care about performance, in which case you'd ditch both AVL and Red-black trees, and go with B-trees, which can be tweaked to work much better (or (a,b)-trees, I'm gonna put all of those in one basket.)

--

> For some kinds of binary search trees, including red-black trees but not AVL trees, the "fixes" to the tree can fairly easily be predicted on the way down and performed during a single top-down pass, making the second pass unnecessary. Such insertion algorithms are typically implemented with a loop rather than recursion, and often run slightly faster in practice than their two-pass counterparts.

So a RedBlack tree insert can be implemented without recursion; on some CPUs recursion is very expensive if you overrun the function call cache (e.g. SPARC due to its use of register windows)

--

There are some cases where you can't use B-trees at all.

One prominent case is std::map from C++ STL. The standard requires that insert does not invalidate existing iterators

...

I also believe that "single pass tail recursive" implementation is not the reason for red black tree popularity as a mutable data structure.

First of all, stack depth is irrelevant here, because (given log n height) you would run out of main memory before you run out of stack space. Jemalloc is happy with preallocating worst case depth on the stack.

nibble
q-n-a
overflow
cs
algorithms
tcs
data-structures
functional
orders
trees
cost-benefit
tradeoffs
roots
explanans
impetus
performance
applicability-prereqs
programming
pls
c(pp)
ubiquity

Interview with Donald Knuth | Interview with Donald Knuth | InformIT

june 2019 by nhaliday

Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Reusable vs. re-editable code: https://hal.archives-ouvertes.fr/hal-01966146/document

- Konrad Hinsen

https://www.johndcook.com/blog/2008/05/03/reusable-code-vs-re-editable-code/

I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.

nibble
interview
giants
expert-experience
programming
cs
software
contrarianism
carmack
oss
prediction
trends
linux
concurrency
desktop
comparison
checking
debugging
stories
engineering
hmm
idk
algorithms
books
debate
flux-stasis
duplication
parsimony
best-practices
writing
documentation
latex
intricacy
structure
hardware
caching
workflow
editors
composition-decomposition
coupling-cohesion
exposition
technical-writing
thinking
cracker-prog
code-organizing
grokkability
multi
techtariat
commentary
pdf
reflection
essay
examples
python
data-science
libraries
grokkability-clarity
project-management
june 2019 by nhaliday

Programming Languages - Hyperpolyglot

june 2019 by nhaliday

very detailed PL comparisons/cheatsheets, also CASes, sci-comp stuff, SQLs, and programmer tools

tools
reference
cheatsheet
comparison
programming
pls
python
javascript
howto
list
terminal
c(pp)
golang
jvm
rust
scala
functional
haskell
ocaml-sml
lisp
numerics
sci-comp
data-science
r-lang
CAS
nibble
tutorial
init
documentation
editors
vcs
git
hg
dbs
types
oop
syntax
linear-algebra
math
math.CA
differential
math.CO
math.NT
plots
dataviz
polynomials
unix
objektbuch
crosstab
track-record
dotnet
DSL
whole-partial-many
static-dynamic
error-handling
error
june 2019 by nhaliday

Bareiss algorithm - Wikipedia

june 2019 by nhaliday

During the execution of the Bareiss algorithm, every integer that is computed is the determinant of a submatrix of the input matrix. This allows one, using the Hadamard inequality, to bound the size of these integers. Otherwise, the Bareiss algorithm may be viewed as a variant of Gaussian elimination and needs roughly the same number of arithmetic operations.
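
A minimal fraction-free elimination sketch (my own Python illustration of the idea described above; it assumes integer entries, and the naive pivoting is for exposition only, not production use):

```python
def bareiss_det(M):
    """Fraction-free Gaussian elimination (Bareiss): every intermediate value
    is an exact integer (a minor of the input), so no fractions ever appear."""
    A = [row[:] for row in M]
    n = len(A)
    prev = 1
    sign = 1
    for k in range(n - 1):
        if A[k][k] == 0:
            # naive pivoting: find a nonzero entry below and swap (flips sign)
            for r in range(k + 1, n):
                if A[r][k] != 0:
                    A[k], A[r] = A[r], A[k]
                    sign = -sign
                    break
            else:
                return 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # the division by the previous pivot is always exact,
                # so integer // division loses nothing
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]

print(bareiss_det([[2, 3, 1],
                   [4, 7, 5],
                   [6, 18, 22]]))   # -16
```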

nibble
ground-up
cs
tcs
algorithms
complexity
linear-algebra
numerics
sci-comp
fields
calculation
nitty-gritty
june 2019 by nhaliday

What's the expected level of paper for top conferences in Computer Science - Academia Stack Exchange

june 2019 by nhaliday

Top. The top level.

My experience on program committees for STOC, FOCS, ITCS, SODA, SOCG, etc., is that there are FAR more submissions of publishable quality than can be accepted into the conference. By "publishable quality" I mean a well-written presentation of a novel, interesting, and non-trivial result within the scope of the conference.

...

There are several questions that come up over and over in the FOCS/STOC review cycle:

- How surprising / novel / elegant / interesting is the result?

- How surprising / novel / elegant / interesting / general are the techniques?

- How technically difficult is the result? Ironically, FOCS and STOC committees have a reputation for ignoring the distinction between trivial (easy to derive from scratch) and nondeterministically trivial (easy to understand after the fact).

- What is the expected impact of this result? Is this paper going to change the way people do theoretical computer science over the next five years?

- Is the result of general interest to the theoretical computer science community? Or is it only of interest to a narrow subcommunity? In particular, if the topic is outside the STOC/FOCS mainstream—say, for example, computational topology—does the paper do a good job of explaining and motivating the results to a typical STOC/FOCS audience?

nibble
q-n-a
overflow
academia
tcs
cs
meta:research
publishing
scholar
lens
properties
cost-benefit
analysis
impetus
increase-decrease
soft-question
motivation
proofs
search
complexity
analogy
problem-solving
elegance
synthesis
hi-order-bits
novelty
discovery
june 2019 by nhaliday

Browse the State-of-the-Art in Machine Learning

aggregator machine-learning deep-learning ai benchmarks state-of-art ranking top-n competition nibble dataset computer-vision comparison marginal nlp speedometer frontier open-problems list papers links info-foraging graphs time-series audio games medicine applications accuracy foreign-lang arrows questions

june 2019 by nhaliday

aggregator machine-learning deep-learning ai benchmarks state-of-art ranking top-n competition nibble dataset computer-vision comparison marginal nlp speedometer frontier open-problems list papers links info-foraging graphs time-series audio games medicine applications accuracy foreign-lang arrows questions

june 2019 by nhaliday

classification - ImageNet: what is top-1 and top-5 error rate? - Cross Validated

june 2019 by nhaliday

Now, in the case of top-1 score, you check if the top class (the one having the highest probability) is the same as the target label.

In the case of top-5 score, you check if the target label is one of your top 5 predictions (the 5 ones with the highest probabilities).
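
A minimal sketch of both metrics (my own illustration in NumPy; the probs/labels arrays are made-up toy data):

```python
import numpy as np

def topk_accuracy(probs, labels, k=5):
    """probs: (N, C) predicted class probabilities; labels: (N,) true class ids.
    Top-k accuracy = fraction of samples whose true label is among the k
    highest-scoring predictions; top-k error = 1 - top-k accuracy."""
    topk = np.argsort(probs, axis=1)[:, -k:]        # indices of the k largest scores
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3]])
labels = np.array([1, 2])
print(topk_accuracy(probs, labels, k=1))  # 0.5: only the first sample's top class matches
print(topk_accuracy(probs, labels, k=2))  # 1.0: label 2 is within the top-2 of the second sample
```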

nibble
q-n-a
overflow
machine-learning
deep-learning
metrics
comparison
ranking
top-n
classification
computer-vision
benchmarks
dataset
accuracy
error
jargon
june 2019 by nhaliday

What every computer scientist should know about floating-point arithmetic

may 2019 by nhaliday

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

Float Toy: http://evanw.github.io/float-toy/

https://news.ycombinator.com/item?id=22113485

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations

"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.
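
A small illustration (my own example, not from the linked Stack Overflow answer) of why a single fixed epsilon cannot be right at every magnitude, using Python's math.isclose for a relative-tolerance comparison:

```python
import math

a, b = 1e16, 1e16 + 2.0         # adjacent representable doubles at this magnitude
print(a == b)                    # False
print(abs(a - b) < 1e-9)         # False: a fixed "epsilon" is meaningless at scale 1e16
print(math.isclose(a, b, rel_tol=1e-12))  # True: relative tolerance adapts to magnitude

x, y = 1e-20, 3e-20              # tiny numbers: a fixed epsilon calls everything "equal"
print(abs(x - y) < 1e-9)         # True, even though y is three times larger than x
print(math.isclose(x, y, rel_tol=1e-12))  # False
```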

...

Regarding the propagation of rounding errors, there exist specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as √n for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]
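
A minimal sketch of compensated summation (my own Python illustration of the scheme described above, not code from any of the linked pages):

```python
def kahan_sum(xs):
    """Compensated (Kahan) summation: worst-case error roughly independent of len(xs)."""
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for x in xs:
        y = x - c
        t = total + y        # low-order bits of y are lost in this addition...
        c = (t - total) - y  # ...and recovered into c for the next step
        total = t
    return total

# 1.0 followed by a million terms of 1e-16: each tiny term is below half an ULP
# of the running sum, so naive left-to-right summation never moves off 1.0.
xs = [1.0] + [1e-16] * 10**6
print(sum(xs))        # 1.0             (all the small terms are absorbed)
print(kahan_sum(xs))  # ~1.0000000001   (true sum is 1.0 + 1e-10)
```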

cf:

https://en.wikipedia.org/wiki/Pairwise_summation

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_1, ..., x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε √(log n)) for pairwise summation.[2]
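
And a corresponding sketch of the recursive halving (again my own illustration; the block size of 128 in the base case is an arbitrary choice, real implementations tune it):

```python
def pairwise_sum(xs):
    """Cascade/pairwise summation: split in half, sum each half recursively,
    add the two partial sums. Worst-case roundoff grows as O(eps * log n)
    instead of O(eps * n) for plain left-to-right accumulation."""
    if len(xs) <= 128:       # blocked base case, accumulated naively
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1] * 10**6
# The pairwise error is much smaller than the naive left-to-right error.
print(abs(sum(xs) - 1e5), abs(pairwise_sum(xs) - 1e5))
```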

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs

However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n) or even O(n²), which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually <3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]

nibble
pdf
papers
programming
systems
numerics
nitty-gritty
intricacy
approximation
accuracy
types
sci-comp
multi
q-n-a
stackex
hmm
oly-programming
accretion
formal-methods
yak-shaving
wiki
reference
algorithms
yoga
ground-up
divide-and-conquer
fourier
books
tidbits
chart
caltech
nostalgia
dynamic
calculator
visualization
protocol-metadata
identity
may 2019 by nhaliday

[1803.00085] Chinese Text in the Wild

may 2019 by nhaliday

We introduce Chinese Text in the Wild, a very large dataset of Chinese text in street view images.

...

We give baseline results using several state-of-the-art networks, including AlexNet, OverFeat, Google Inception and ResNet for character recognition, and YOLOv2 for character detection in images. Overall Google Inception has the best performance on recognition with 80.5% top-1 accuracy, while YOLOv2 achieves an mAP of 71.0% on detection. Dataset, source code and trained models will all be publicly available on the website.

nibble
pdf
papers
preprint
machine-learning
deep-learning
deepgoog
state-of-art
china
asia
writing
language
dataset
error
accuracy
computer-vision
pic
ocr
org:mat
benchmarks
questions
may 2019 by nhaliday

NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing -- Core Topics for Undergraduates | NSF/IEEE-TCPP Curriculum Initiative

nibble roadmap list links teaching programming algorithms distributed concurrency systems hardware accretion quixotic org:edu education higher-ed advanced

may 2019 by nhaliday

nibble roadmap list links teaching programming algorithms distributed concurrency systems hardware accretion quixotic org:edu education higher-ed advanced

may 2019 by nhaliday

algorithm - Skip List vs. Binary Search Tree - Stack Overflow

may 2019 by nhaliday

Skip lists are more amenable to concurrent access/modification. Herb Sutter wrote an article about data structures in concurrent environments. It has more in-depth information.

The most frequently used implementation of a binary search tree is a red-black tree. The concurrency problems come in when the tree is modified: it often needs to rebalance. The rebalance operation can affect large portions of the tree, which would require a mutex lock on many of the tree nodes. Inserting a node into a skip list is far more localized: only nodes directly linked to the affected node need to be locked.
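
A minimal single-threaded skip-list sketch (my own illustration, not code from the linked answer) makes the locality point concrete: an insert only rewires the predecessor pointers collected on the way down, whereas a red-black rebalance may rotate nodes far from the insertion point:

```python
import random

class Node:
    __slots__ = ("key", "next")
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level       # one forward pointer per level

class SkipList:
    MAX_LEVEL = 32

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL   # predecessors per level
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
            update[i] = node
        new = Node(key, self._random_level())
        for i in range(len(new.next)):          # only these few nodes are touched
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

    def __contains__(self, key):
        node = self.head
        for i in reversed(range(self.MAX_LEVEL)):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
        node = node.next[0]
        return node is not None and node.key == key

sl = SkipList()
for k in [5, 1, 9, 3]:
    sl.insert(k)
print(3 in sl, 4 in sl)   # True False
```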

q-n-a
stackex
nibble
programming
tcs
data-structures
performance
concurrency
comparison
cost-benefit
applicability-prereqs
random
trees
tradeoffs
may 2019 by nhaliday

Burrito: Rethinking the Electronic Lab Notebook

may 2019 by nhaliday

Seems very well-suited for ML experiments (if you can get it to work); also, the nilfs aspect is cool and basically implements exactly one of my project ideas (a mini-VCS for competitive programming). Unfortunately, the gnarly installation instructions specify running it in a Linux VM: https://github.com/pgbovine/burrito/blob/master/INSTALL. Linux is a hard requirement due to nilfs.

techtariat
project
tools
devtools
linux
programming
yak-shaving
integration-extension
nitty-gritty
workflow
exocortex
scholar
software
python
app
desktop
notetaking
state
machine-learning
data-science
nibble
sci-comp
oly
vcs
multi
repo
paste
homepage
research
may 2019 by nhaliday

Philip Guo - Research Design Patterns

may 2019 by nhaliday

List of ways to generate research directions. Some are pretty specific to applied CS.

techtariat
nibble
academia
meta:research
scholar
cs
systems
list
top-n
checklists
ideas
creative
frontier
memes(ew)
info-dynamics
innovation
novelty
the-trenches
tactics
may 2019 by nhaliday

What happens when you load a URL?

dan-luu techtariat links list minimum-viable systems interview-prep explanation google networking distributed programming recruiting career init repo synthesis system-design 🖥 paste big-picture working-stiff scaling-tech nibble metal-to-virtual hardware IEEE web internet questions objektbuch client-server nitty-gritty detail-architecture

may 2019 by nhaliday

dan-luu techtariat links list minimum-viable systems interview-prep explanation google networking distributed programming recruiting career init repo synthesis system-design 🖥 paste big-picture working-stiff scaling-tech nibble metal-to-virtual hardware IEEE web internet questions objektbuch client-server nitty-gritty detail-architecture

may 2019 by nhaliday

ON THE GEOMETRY OF NASH EQUILIBRIA AND CORRELATED EQUILIBRIA

may 2019 by nhaliday

Abstract: It is well known that the set of correlated equilibrium distributions of an n-player noncooperative game is a convex polytope that includes all the Nash equilibrium distributions. We demonstrate an elementary yet surprising result: the Nash equilibria all lie on the boundary of the polytope.
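
For context (standard textbook material, not taken from the paper): the polytope structure comes straight from the definition, since a correlated equilibrium is cut out by finitely many linear inequalities in the distribution p:

```latex
% For a finite n-player game with strategy sets S_i and payoffs u_i, a distribution p
% over profiles S = S_1 \times \cdots \times S_n is a correlated equilibrium iff,
% for every player i and every pair s_i, s_i' \in S_i,
\sum_{s_{-i}} p(s_i, s_{-i}) \,\bigl[ u_i(s_i, s_{-i}) - u_i(s_i', s_{-i}) \bigr] \;\ge\; 0,
% together with p(s) \ge 0 and \sum_{s} p(s) = 1.
% All constraints are linear in p, so the correlated equilibria form a convex polytope;
% the paper's result is that the Nash equilibria all sit on the boundary of that polytope.
```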

pdf
nibble
papers
ORFE
game-theory
optimization
geometry
dimensionality
linear-algebra
equilibrium
structure
differential
correlation
iidness
acm
linear-programming
spatial
characterization
levers
may 2019 by nhaliday

linear algebra - Recommendations for a usable, fast C++ matrix library? - Computational Science Stack Exchange

may 2019 by nhaliday

Basically the same question popped up on SO:

nibble
q-n-a
overflow
programming
libraries
software
recommendations
data-science
c(pp)
systems
performance
linear-algebra
science
best-practices
numerics
types
comparison
links
stackex
sci-comp
ecosystem
list
may 2019 by nhaliday

A Recipe for Training Neural Networks

april 2019 by nhaliday

https://twitter.com/chipro/status/1188000997890646017

I saw a guy debugging his model today.

No ipdb.

No unittest.

No visualization.

No disabling regularization.

He just sat there staring at every line of code and cursing TensorFlow.

Like every machine learning researcher I know.

https://twitter.com/chipro/status/1189564204312711170

1. http://josh-tobin.com/assets/pdf/troubleshooting-deep-neural-networks-01-19.pdf …

2. https://medium.com/infinity-aka-aseem/things-we-wish-we-had-known-before-we-started-our-first-machine-learning-project-336d1d6f2184 …

3. https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765 …

4. https://pcc.cs.byu.edu/2017/10/02/practical-advice-for-building-deep-neural-networks/ …

5. https://medium.com/ai%C2%B3-theory-practice-business/top-6-errors-novice-machine-learning-engineers-make-e82273d394db …

6. http://karpathy.github.io/2019/04/25/recipe/ …

more specific to debugging but more general in terms of model class: https://pinboard.in/u:nhaliday/b:869479c9d2d3
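
In the spirit of these checklists, one common sanity check (my own sketch using PyTorch, not something claimed by the linked posts) is to verify that the training loop can drive the loss to roughly zero on a single tiny batch before training at scale:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and data; the point is the harness, not the task.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)                 # one fixed mini-batch
y = torch.randint(0, 2, (8,))

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# If the model can't memorize 8 examples, something upstream is broken
# (data pipeline, loss wiring, frozen parameters, wrong labels, ...).
print(loss.item())                     # should be close to 0
```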

acmtariat
org:bleg
nibble
machine-learning
deep-learning
howto
tutorial
guide
nitty-gritty
gotchas
init
list
checklists
expert-experience
abstraction
composition-decomposition
gradient-descent
data-science
error
debugging
benchmarks
programming
engineering
best-practices
dataviz
checking
plots
generalization
regularization
unsupervised
optimization
ensembles
random
methodology
multi
twitter
social
discussion
techtariat
links
org:med
pdf
visualization
python
recommendations
advice
devtools
april 2019 by nhaliday

Biological rules - Wikipedia

nibble wiki reference list bio ecology analytical-holistic levers systematic-ad-hoc order-disorder logos evolution deep-materialism selection kinship fluid environment identity correlation power-law oceans geography temperature darwinian sex path-dependence competition big-peeps the-classics magnitude

march 2019 by nhaliday

nibble wiki reference list bio ecology analytical-holistic levers systematic-ad-hoc order-disorder logos evolution deep-materialism selection kinship fluid environment identity correlation power-law oceans geography temperature darwinian sex path-dependence competition big-peeps the-classics magnitude

march 2019 by nhaliday

soft question - What are good non-English languages for mathematicians to know? - MathOverflow

february 2019 by nhaliday

I'm with Deane here: I think learning foreign languages is not a very mathematically productive thing to do; of course, there are lots of good reasons to learn foreign languages, but doing mathematics is not one of them. Not only are there few modern mathematics papers written in languages other than English, but the primary other language they are written in (French) is pretty easy to read without actually knowing it.

Even though I've been to France several times, my spoken French mostly consists of "merci," "si vous plait," "d'accord" and some food words; I've still skimmed 100 page long papers in French without a lot of trouble.

If nothing else, think of reading a paper in French as a good opportunity to teach Google Translate some mathematical French.

q-n-a
overflow
math
academia
learning
foreign-lang
publishing
science
french
soft-question
math.AG
nibble
quixotic
comparison
language
china
asia
trends
february 2019 by nhaliday

8 PCA – A Powerful Method for Analyze Ecological Niches

august 2018 by nhaliday

Influences of ecology and biogeography on shaping the distributions of cryptic species: three bat tales in Iberia: https://academic.oup.com/biolinnean/article/112/1/150/2415750

Combining Historical Biogeography with Niche Modeling in the Caprifolium Clade of Lonicera (Caprifoliaceae, Dipsacales): https://watermark.silverchair.com/syq011.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAagwggGkBgkqhkiG9w0BBwagggGVMIIBkQIBADCCAYoGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMnQcew1QnnjkjJSlVAgEQgIIBW-Nu-4L3xpOdRIb27NdbMbhPjaeByMM3g6H1bpeMMK4OJ9gBOH7V5WfuKGlHlsgsStQQLC_s2YGVu5KDOtwhudWOPFqrXmYlAXjhFNi5hFNpCxjNT-4tTJlRJHU5plgPE2BWZht5okuM2sngjX3t5dDScmz0oTBvu7xnUXo3sbGkad6gw-za6Rpyl5_3-nnnbOpz6WeqfxcR7NDGwPd741QVJKjjp-FHPf8JdWN3mcsLMVJ6p11FoeMeQdA7gsyXhKDPfE8sJ2Xamjxk5uSaGkfi1bi71OB1Ag0UvV2xlON1UwWD9V8tE7e3JJQanv_aKgKyppuXQikoMhH05x_nCFsiVif-_-26Yyx0CMIHv4so81sOpwN5YM_BISyUp_RoT2yfjiEhZpcJlyWX4z6ZeKAUEICloT8evsOX8Ll4FUocBHARhnqZgRlc8w33b_J3wslXv-PVBvvXNs0h
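
For reference, the core of PCA itself is a few lines; a minimal SVD-based sketch in Python (my own illustration, with a toy random matrix standing in for a sites-by-environmental-variables table):

```python
import numpy as np

def pca(X, k=2):
    """PCA via SVD: rows of X are observations (e.g. sites), columns are
    variables (e.g. climate measurements). Returns the k leading principal
    component scores and the fraction of variance each component explains."""
    Xc = X - X.mean(axis=0)                  # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                   # project onto the top-k components
    explained = (s**2) / (s**2).sum()
    return scores, explained[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                 # 50 sites x 6 environmental variables (toy data)
scores, var = pca(X, k=2)
print(scores.shape, var)                     # (50, 2) and two variance fractions
```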

pdf
article
study
methodology
bio
ecology
data
analysis
stats
exploratory
matrix-factorization
geography
environment
time
crosstab
history
letters
correlation
evolution
distribution
examples
high-dimension
multi
chart
howto
objektbuch
metabuch
nibble
data-science
things
august 2018 by nhaliday

Computer Scientists Close In on Unique Games Conjecture Proof | Quanta Magazine

news org:mag org:sci popsci tcs cs computation UGC complexity approximation lower-bounds hardness nibble org:inst rand-approx proofs big-surf announcement SDP optimization open-problems questions research

june 2018 by nhaliday

news org:mag org:sci popsci tcs cs computation UGC complexity approximation lower-bounds hardness nibble org:inst rand-approx proofs big-surf announcement SDP optimization open-problems questions research

june 2018 by nhaliday

The Physics of Information Processing Superobjects: Daily Life Among the Jupiter Brains

nibble pdf study article essay ratty bostrom physics lower-bounds interdisciplinary computation frontier singularity civilization communication time phys-energy thermo entropy-like lens intelligence futurism philosophy software hardware enhancement no-go data scale magnitude network-structure structure complex-systems concurrency density bits retention mechanics electromag quantum quantum-info speed information-theory measure chemistry gravity relativity the-world-is-just-atoms dirty-hands skunkworks gedanken ideas hard-tech nitty-gritty intricacy len:long spatial whole-partial-many frequency neuro internet web trivia cocktail humanity composition-decomposition instinct reason illusion the-self psychology cog-psych dennett within-without signal-noise coding-theory quotes scifi-fantasy fiction giants death long-short-run janus eden-heaven efficiency finiteness iteration-recursion cycles nietzschean big-peeps examples

april 2018 by nhaliday

nibble pdf study article essay ratty bostrom physics lower-bounds interdisciplinary computation frontier singularity civilization communication time phys-energy thermo entropy-like lens intelligence futurism philosophy software hardware enhancement no-go data scale magnitude network-structure structure complex-systems concurrency density bits retention mechanics electromag quantum quantum-info speed information-theory measure chemistry gravity relativity the-world-is-just-atoms dirty-hands skunkworks gedanken ideas hard-tech nitty-gritty intricacy len:long spatial whole-partial-many frequency neuro internet web trivia cocktail humanity composition-decomposition instinct reason illusion the-self psychology cog-psych dennett within-without signal-noise coding-theory quotes scifi-fantasy fiction giants death long-short-run janus eden-heaven efficiency finiteness iteration-recursion cycles nietzschean big-peeps examples

april 2018 by nhaliday

Iterated Distillation and Amplification – AI Alignment

nibble org:bleg acmtariat clever-rats ai ai-control alignment iteration-recursion deepgoog games research research-program algorithms explanation exposition summary reinforcement values machine-learning human-ml tradeoffs volo-avolo ratty

april 2018 by nhaliday

nibble org:bleg acmtariat clever-rats ai ai-control alignment iteration-recursion deepgoog games research research-program algorithms explanation exposition summary reinforcement values machine-learning human-ml tradeoffs volo-avolo ratty

april 2018 by nhaliday

[1804.04268] Incomplete Contracting and AI Alignment

april 2018 by nhaliday

We suggest that the analysis of incomplete contracting developed by law and economics researchers can provide a useful framework for understanding the AI alignment problem and help to generate a systematic approach to finding solutions. We first provide an overview of the incomplete contracting literature and explore parallels between this work and the problem of AI alignment. As we emphasize, misalignment between principal and agent is a core focus of economic analysis. We highlight some technical results from the economics literature on incomplete contracts that may provide insights for AI alignment researchers. Our core contribution, however, is to bring to bear an insight that economists have been urged to absorb from legal scholars and other behavioral scientists: the fact that human contracting is supported by substantial amounts of external structure, such as generally available institutions (culture, law) that can supply implied terms to fill the gaps in incomplete contracts. We propose a research agenda for AI alignment work that focuses on the problem of how to build AI that can replicate the human cognitive processes that connect individual incomplete contracts with this supporting external structure.

nibble
preprint
org:mat
papers
ai
ai-control
alignment
coordination
contracts
law
economics
interests
culture
institutions
number
context
behavioral-econ
composition-decomposition
rent-seeking
whole-partial-many
april 2018 by nhaliday

Theory of Self-Reproducing Automata - John von Neumann

april 2018 by nhaliday

Fourth Lecture: THE ROLE OF HIGH AND OF EXTREMELY HIGH COMPLICATION

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in brain, 10^4 vacuum tubes in largest computer at time

- machines faster: 5 ms from neuron potential to neuron potential, 10^-3 ms for vacuum tubes

https://en.wikipedia.org/wiki/John_von_Neumann#Computing

pdf
article
papers
essay
nibble
math
cs
computation
bio
neuro
neuro-nitgrit
scale
magnitude
comparison
acm
von-neumann
giants
thermo
phys-energy
speed
performance
time
density
frequency
hardware
ems
efficiency
dirty-hands
street-fighting
fermi
estimate
retention
physics
interdisciplinary
multi
wiki
links
people
🔬
atoms
duplication
iteration-recursion
turing
complexity
measure
nature
technology
complex-systems
bits
information-theory
circuits
robust
structure
composition-decomposition
evolution
mutation
axioms
analogy
thinking
input-output
hi-order-bits
coding-theory
flexibility
rigidity
automata-languages
april 2018 by nhaliday

Differentiable Plasticity: A New Method for Learning to Learn | Uber Engineering Blog

techtariat project machine-learning deep-learning papers liner-notes gradient-descent neuro analogy flexibility wire-guided reinforcement nibble research auto-learning concept atoms org:com gig-econ

april 2018 by nhaliday

techtariat project machine-learning deep-learning papers liner-notes gradient-descent neuro analogy flexibility wire-guided reinforcement nibble research auto-learning concept atoms org:com gig-econ

april 2018 by nhaliday

Argument, intuition, and recursion

ratty lesswrong clever-rats acmtariat nibble reflection thinking metameta metabuch skeleton reason math thick-thin empirical science rationality epistemic intuition logic economics models theory-practice applicability-prereqs heuristic problem-solving analytical-holistic futurism lens speedometer frontier caching universalism-particularism duality fourier examples ai risk speed robust reinforcement machine-learning social-science tricki meta:rhetoric debate crux composition-decomposition structure convergence zooming neurons checklists advice strategy meta:prediction tetlock

april 2018 by nhaliday

ratty lesswrong clever-rats acmtariat nibble reflection thinking metameta metabuch skeleton reason math thick-thin empirical science rationality epistemic intuition logic economics models theory-practice applicability-prereqs heuristic problem-solving analytical-holistic futurism lens speedometer frontier caching universalism-particularism duality fourier examples ai risk speed robust reinforcement machine-learning social-science tricki meta:rhetoric debate crux composition-decomposition structure convergence zooming neurons checklists advice strategy meta:prediction tetlock

april 2018 by nhaliday

An Outsider's Tour of Reinforcement Learning – arg min blog

acmtariat ben-recht org:bleg nibble exposition explanation expert-experience tutorial guide yoga reinforcement optimization linear-algebra model-class atoms concept signal-noise iteration-recursion volo-avolo benchmarks deep-learning unsupervised thinking descriptive values gradient-descent acm decision-theory decision-making math.DS sequential random search realness hi-order-bits synthesis coarse-fine bare-hands openai replication linearity nonlinearity research

april 2018 by nhaliday

acmtariat ben-recht org:bleg nibble exposition explanation expert-experience tutorial guide yoga reinforcement optimization linear-algebra model-class atoms concept signal-noise iteration-recursion volo-avolo benchmarks deep-learning unsupervised thinking descriptive values gradient-descent acm decision-theory decision-making math.DS sequential random search realness hi-order-bits synthesis coarse-fine bare-hands openai replication linearity nonlinearity research

april 2018 by nhaliday

[0712.3329] Universal Intelligence: A Definition of Machine Intelligence

april 2018 by nhaliday

- Shane Legg, Marcus Hutter

nibble
papers
org:mat
preprint
machine-learning
ai
intelligence
intricacy
definition
philosophy
psychology
cog-psych
psychometrics
decision-theory
order-disorder
deepgoog
flux-stasis
learning
list
links
quotes
rigor
lens
skeleton
metameta
search
problem-solving
generalization
complex-systems
cybernetics
volo-avolo
impetus
optimization
abstraction
big-peeps
academia
bio
measurement
universalism-particularism
big-picture
ideas
big-surf
synthesis
structure
large-factor
dimensionality
things
properties
detail-architecture
telos-atelos
values
descriptive
flexibility
occam
parsimony
cs
computation
bits
information-theory
complexity
absolute-relative
humanity
pragmatic
biases
turing
thick-thin
dennett
within-without
creative
theory-of-mind
nitty-gritty
spock
survey
reinforcement
april 2018 by nhaliday

[0706.3639] A Collection of Definitions of Intelligence

april 2018 by nhaliday

- Shane Legg, Marcus Hutter

nibble
papers
org:mat
preprint
machine-learning
ai
intelligence
intricacy
definition
philosophy
psychology
cog-psych
psychometrics
decision-theory
order-disorder
deepgoog
flux-stasis
learning
list
links
quotes
rigor
lens
skeleton
metameta
search
problem-solving
generalization
complex-systems
cybernetics
volo-avolo
impetus
optimization
abstraction
big-peeps
academia
bio
measurement
universalism-particularism
big-picture
ideas
big-surf
synthesis
things
properties
detail-architecture
telos-atelos
values
descriptive
flexibility
humanity
nitty-gritty
spock
survey
the-self
april 2018 by nhaliday
