**nhaliday : algorithms**
188

Is Parallel Programming Hard, And, If So, What Can You Do About It?

unit draft books encyclopedic yoga accretion quixotic concurrency cs programming systems cost-benefit intricacy hardware distributed synchrony data-structures algorithms hashing correctness checking debugging performance estimate street-fighting benchmarks measurement traces formal-methods state

11 weeks ago by nhaliday


CppCon 2014: Chandler Carruth "Efficiency with Algorithms, Performance with Data Structures" - YouTube

october 2019 by nhaliday

- idk how I feel about this

- makes a distinction between efficiency (basically asymptotic complexity, "doing less work") and performance ("doing that work faster"). idiosyncratic terminology but similar to the "two performance aesthetics" described here: https://pinboard.in/u:nhaliday/b:913a284640c5

- some bikeshedding about vector::reserve and references

- "discontiguous data structures are the root of all evil" (cache-locality, don't use linked lists, etc)

- stacks? queues? just use vector. also suggests circular buffers. says std::deque is really bad

- std::map is bad too (for real SWE, not oly-programming). if you want ordered associative container, just binary search in vector

- std::unordered_map is poorly implemented, unfortunately (due to requirement for buckets in API)

- good implementation of hash table uses open addressing and local (linear?) probing
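A toy sketch of that last point (my own Python illustration, not Carruth's code): with open addressing, entries live directly in one flat array and collisions are resolved by scanning forward to the next free slot, so probes touch adjacent memory.

```python
class LinearProbingMap:
    """Toy open-addressing hash map with linear probing (no deletion support)."""

    def __init__(self, capacity=16):
        self.cap = capacity
        self.slots = [None] * capacity   # each slot: None or a (key, value) pair
        self.size = 0

    def _probe(self, key):
        # start at the hashed slot, scan forward until we hit the key or a hole
        i = hash(key) % self.cap
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.cap
        return i

    def put(self, key, value):
        if self.size * 2 >= self.cap:    # keep load factor below 1/2
            self._grow()
        i = self._probe(key)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        entry = self.slots[self._probe(key)]
        return entry[1] if entry else default

    def _grow(self):
        old = [e for e in self.slots if e]
        self.cap *= 2
        self.slots = [None] * self.cap
        self.size = 0
        for k, v in old:
            self.put(k, v)
```

Real implementations (e.g. Swiss tables) add tombstones or backward-shift deletion and metadata bytes; this just shows the probing scheme.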

video
presentation
performance
nitty-gritty
best-practices
working-stiff
programming
c(pp)
systems
data-structures
algorithms
jvm
pls
metal-to-virtual
stylized-facts
rhetoric
expert-experience
google
llvm
efficiency
time-complexity
mobile
computer-memory
caching
oly-programming
common-case
hashing
multi
energy-resources
methodology
trees
techtariat
local-global

Unhappy Go Lucky!

september 2019 by nhaliday

- regularly publishes unofficial editorials for AtCoder

- also seems like an otaku >_>

oly
oly-programming
blog
stream
algorithms
problem-solving
accretion
puzzles
techtariat
japan
asia
quixotic
yoga

Parallel Computing: Theory and Practice

september 2019 by nhaliday

by Umut Acar who also co-authored a different book on parallel algorithms w/ Guy Blelloch from a more high-level and functional perspective

unit
books
cmu
cs
programming
tcs
algorithms
concurrency
c(pp)
divide-and-conquer
libraries
complexity
time-complexity
data-structures
orders
graphs
graph-theory
trees
models
functional
metal-to-virtual
systems

How to fool the "try some test cases" heuristic: Algorithms that appear correct, but are actually incorrect - Computer Science Stack Exchange

nibble q-n-a overflow cs programming correctness error measurement random checking examples counterexample rigor rand-approx greedy debugging math.NT multiplicative algorithms intricacy heuristic methodology

august 2019 by nhaliday
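The antidote suggested by the question is randomized stress testing against a trusted brute force. A hedged sketch (my own example, not from the thread): greedy coin change looks right on small hand-picked cases, but random tests against a DP reference quickly surface a counterexample.

```python
import random

def greedy_coins(coins, amount):
    """Candidate solution: always take the largest coin that fits."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count if amount == 0 else None

def dp_coins(coins, amount):
    """Trusted brute force: classic minimum-coin DP."""
    INF = float('inf')
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return best[amount] if best[amount] < INF else None

# stress test: random amounts until the two disagree
random.seed(0)
mismatch = None
for _ in range(1000):
    amount = random.randint(1, 50)
    g, d = greedy_coins([1, 3, 4], amount), dp_coins([1, 3, 4], amount)
    if g != d:
        mismatch = (amount, g, d)
        break
```

For coins {1, 3, 4} the greedy gives 4+1+1 for amount 6 while the optimum is 3+3, so the loop finds a mismatch almost immediately.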


Operations on polynomials (on cp-algorithms) - Codeforces

august 2019 by nhaliday

https://stackoverflow.com/questions/44770632/fft-division-for-fast-polynomial-division

links to good lecture notes: http://web.cs.iastate.edu/~cs577/handouts/polydivide.pdf
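The linked notes build fast division on top of fast multiplication; as a baseline, here is a minimal recursive FFT multiply (my own sketch, assuming power-of-two padding and small integer coefficients so that rounding recovers exact results):

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    ang = (-2j if invert else 2j) * cmath.pi / n
    res = [0] * n
    for k in range(n // 2):
        w = cmath.exp(ang * k)
        res[k] = even[k] + w * odd[k]
        res[k + n // 2] = even[k] - w * odd[k]
    return res

def poly_mul(p, q):
    """Multiply two integer polynomials (coefficients low-to-high)."""
    n = 1
    while n < len(p) + len(q) - 1:
        n <<= 1
    fp = fft(p + [0] * (n - len(p)))
    fq = fft(q + [0] * (n - len(q)))
    prod = [x * y for x, y in zip(fp, fq)]   # pointwise product = convolution
    inv = fft(prod, invert=True)
    return [round((x / n).real) for x in inv[:len(p) + len(q) - 1]]
```

For large coefficients one would use a number-theoretic transform instead of floating-point roots of unity to avoid precision loss.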

oly
oly-programming
programming
python
examples
nitty-gritty
polynomials
algebra
math.CA
yoga
multi
pdf
lecture-notes
howto
algorithms
q-n-a
stackex
fourier
libraries
multiplicative
math.NT
calculation

Anti-hash test. - Codeforces

august 2019 by nhaliday

- Thue-Morse sequence

- nice paper: http://www.mii.lt/olympiads_in_informatics/pdf/INFOL119.pdf

In general, polynomial string hashing is a useful technique in construction of efficient string algorithms. One simply needs to remember to carefully select the modulus M and the variable of the polynomial p depending on the application. A good rule of thumb is to pick both values as prime numbers with M as large as possible so that no integer overflow occurs and p being at least the size of the alphabet.

2.2. Upper Bound on M

[stuff about 32- and 64-bit integers]

2.3. Lower Bound on M

On the other side, M is bounded due to the well-known birthday paradox: if we consider a collection of m keys with m ≥ 1.2√M then the chance of a collision to occur within this collection is at least 50% (assuming that the distribution of fingerprints is close to uniform on the set of all strings). Thus if the birthday paradox applies then one needs to choose M = ω(m^2) to have a fair chance to avoid a collision. However, one should note that not always the birthday paradox applies. As a benchmark consider the following two problems.

I generally prefer to use Schwartz-Zippel to reason about collision probabilities w/ this kind of thing, eg, https://people.eecs.berkeley.edu/~sinclair/cs271/n3.pdf.

A good way to get more accurate results: just use multiple primes and the Chinese remainder theorem to get as large an M as you need w/o going beyond 64-bit arithmetic.

more on this: https://codeforces.com/blog/entry/60442
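The multiple-primes trick can be sketched as follows (my own toy code; the base P = 131 and the two ~10^9 primes are arbitrary illustrative choices):

```python
def poly_hash(s, p, m):
    """Polynomial rolling hash of s with base p, modulus m."""
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % m
    return h

M1, M2 = 10**9 + 7, 10**9 + 9   # two distinct primes
P = 131                          # base, at least the alphabet size

def double_hash(s):
    """Combine two ~30-bit hashes via CRT into one value mod M1*M2 (~60 bits)."""
    h1, h2 = poly_hash(s, P, M1), poly_hash(s, P, M2)
    inv = pow(M1, -1, M2)        # modular inverse of M1 mod M2
    return (h1 + M1 * ((h2 - h1) * inv % M2)) % (M1 * M2)
```

All intermediate products stay within 64-bit range per modulus, but the effective modulus M1*M2 ≈ 10^18 makes birthday collisions among, say, 10^6 strings vanishingly unlikely.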

oly
oly-programming
gotchas
howto
hashing
algorithms
strings
random
best-practices
counterexample
multi
pdf
papers
nibble
examples
fields
polynomials
lecture-notes
yoga
probability
estimate
magnitude
hacker
adversarial
CAS
lattice
discrete

Epigrams in Programming | Computer Science

august 2019 by nhaliday

- Alan Perlis

nibble
quotes
aphorism
list
cs
computation
programming
pls
hi-order-bits
synthesis
lens
data-structures
arrows
algorithms
iteration-recursion
intricacy
strings
types
math
formal-methods
pic
visuo
visual-understanding
systems
state
structure
turing
cost-benefit
lisp
performance
software
language
plt
invariance
ends-means
ai
nitty-gritty
sci-comp
composition-decomposition
tradeoffs
grokkability
assembly
internet
egalitarianism-hierarchy
functional
impetus
roots
path-dependence
heavyweights
grokkability-clarity

Factorization of polynomials over finite fields - Wikipedia

july 2019 by nhaliday

In mathematics and computer algebra the factorization of a polynomial consists of decomposing it into a product of irreducible factors. This decomposition is theoretically possible and is unique for polynomials with coefficients in any field, but rather strong restrictions on the field of the coefficients are needed to allow the computation of the factorization by means of an algorithm. In practice, algorithms have been designed only for polynomials with coefficients in a finite field, in the field of rationals or in a finitely generated field extension of one of them.

All factorization algorithms, including the case of multivariate polynomials over the rational numbers, reduce the problem to this case; see polynomial factorization. It is also used for various applications of finite fields, such as coding theory (cyclic redundancy codes and BCH codes), cryptography (public key cryptography by the means of elliptic curves), and computational number theory.

As the reduction of the factorization of multivariate polynomials to that of univariate polynomials does not have any specificity in the case of coefficients in a finite field, only polynomials with one variable are considered in this article.

...

In the algorithms that follow, the complexities are expressed in terms of number of arithmetic operations in Fq, using classical algorithms for the arithmetic of polynomials.

[ed.: Interesting choice...]

...

Factoring algorithms

Many algorithms for factoring polynomials over finite fields include the following three stages:

Square-free factorization

Distinct-degree factorization

Equal-degree factorization

An important exception is Berlekamp's algorithm, which combines stages 2 and 3.

Berlekamp's algorithm

Main article: Berlekamp's algorithm

Berlekamp's algorithm is historically important as the first factorization algorithm that works well in practice. However, it contains a loop on the elements of the ground field, which implies that it is practicable only over small finite fields. For a fixed ground field, its time complexity is polynomial, but, for general ground fields, the complexity is exponential in the size of the ground field.

[ed.: This actually looks fairly implementable.]
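As a feel for implementability: the first stage (square-free factorization) only needs polynomial gcd and derivatives over F_q. A minimal toy sketch over a prime field F_p (my own code; coefficients stored low-to-high):

```python
def trim(f):
    """Drop trailing zero coefficients so f[-1] is the leading coefficient."""
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_rem(f, g, p):
    """Remainder of f divided by g over F_p (g must be nonzero)."""
    f = trim(f[:])
    inv = pow(g[-1], -1, p)              # inverse of g's leading coefficient
    while len(f) >= len(g):
        shift = len(f) - len(g)
        c = f[-1] * inv % p
        for i, gi in enumerate(g):       # subtract c * x^shift * g
            f[shift + i] = (f[shift + i] - c * gi) % p
        f = trim(f)
    return f

def poly_gcd(f, g, p):
    """Monic gcd of f and g over F_p via the Euclidean algorithm."""
    f, g = trim(f[:]), trim(g[:])
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]

def derivative(f, p):
    """Formal derivative of f over F_p."""
    return trim([(i * c) % p for i, c in enumerate(f)][1:])
```

gcd(f, f') then exposes repeated factors, which is exactly what square-free factorization iterates on; distinct-degree and equal-degree factorization additionally need modular exponentiation of polynomials.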

wiki
reference
concept
algorithms
calculation
nibble
numerics
math
algebra
math.CA
fields
polynomials
levers
multiplicative
math.NT

data structures - Why are Red-Black trees so popular? - Computer Science Stack Exchange

june 2019 by nhaliday

- AVL trees have smaller average depth than red-black trees, and thus searching for a value in an AVL tree is consistently faster.

- Red-black trees make fewer structural changes to balance themselves than AVL trees, which could make them potentially faster for insert/delete. I'm saying potentially, because this would depend on the cost of the structural change to the tree, as this will depend a lot on the runtime and implementation (might also be completely different in a functional language when the tree is immutable?)

There are many benchmarks online that compare AVL and Red-black trees, but what struck me is that my professor basically said, that usually you'd do one of two things:

- Either you don't really care that much about performance, in which case the 10-20% difference of AVL vs Red-black in most cases won't matter at all.

- Or you really care about performance, in which you case you'd ditch both AVL and Red-black trees, and go with B-trees, which can be tweaked to work much better (or (a,b)-trees, I'm gonna put all of those in one basket.)

--

> For some kinds of binary search trees, including red-black trees but not AVL trees, the "fixes" to the tree can fairly easily be predicted on the way down and performed during a single top-down pass, making the second pass unnecessary. Such insertion algorithms are typically implemented with a loop rather than recursion, and often run slightly faster in practice than their two-pass counterparts.

So a red-black tree insert can be implemented without recursion; on some CPUs recursion is very expensive if you overrun the function call cache (e.g. SPARC, due to its use of register windows)

--

There are some cases where you can't use B-trees at all.

One prominent case is std::map from C++ STL. The standard requires that insert does not invalidate existing iterators

...

I also believe that "single pass tail recursive" implementation is not the reason for red black tree popularity as a mutable data structure.

First of all, stack depth is irrelevant here, because (given log n height) you would run out of main memory before you run out of stack space. Jemalloc is happy with preallocating worst-case depth on the stack.
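The flat-array alternative mentioned above (binary search over a sorted vector instead of a balanced tree) can be sketched in a few lines (my own Python illustration):

```python
import bisect

class SortedVectorMap:
    """Ordered associative container backed by a sorted array.
    O(log n) lookups via binary search; O(n) inserts due to shifting,
    but with contiguous, cache-friendly storage."""

    def __init__(self):
        self.keys = []
        self.vals = []

    def insert(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.vals[i] = value          # overwrite existing key
        else:
            self.keys.insert(i, key)      # linear shift, contiguous memory
            self.vals.insert(i, value)

    def get(self, key, default=None):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return default
```

This wins when the container is built once (or rarely mutated) and queried often; B-trees cover the mixed read/write case.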

nibble
q-n-a
overflow
cs
algorithms
tcs
data-structures
functional
orders
trees
cost-benefit
tradeoffs
roots
explanans
impetus
performance
applicability-prereqs
programming
pls
c(pp)
ubiquity

LeetCode - The World's Leading Online Programming Learning Platform

june 2019 by nhaliday

very much targeted toward interview prep

https://www.quora.com/Is-LeetCode-Online-Judges-premium-membership-really-worth-it

This data is especially valuable because you get to know a company's interview style beforehand. For example, most questions that appeared in Facebook interviews have short solutions, typically not more than 30 lines of code. Their interview process focuses on your ability to write clean, concise code. On the other hand, Google-style interviews lean more on the analytical side and are algorithm-heavy, typically with multiple solutions to a question, each with a different run-time complexity.

programming
tech
career
working-stiff
recruiting
interview-prep
algorithms
problem-solving
oly-programming
multi
q-n-a
qra
comparison
stylized-facts
facebook
google
cost-benefit
homo-hetero
startups
organization
alien-character
🖥
contest
puzzles
accretion
transitions
money-for-time

Interview with Donald Knuth | Interview with Donald Knuth | InformIT

june 2019 by nhaliday

Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Reusable vs. re-editable code: https://hal.archives-ouvertes.fr/hal-01966146/document

- Konrad Hinsen

https://www.johndcook.com/blog/2008/05/03/reusable-code-vs-re-editable-code/

I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.

nibble
interview
giants
expert-experience
programming
cs
software
contrarianism
carmack
oss
prediction
trends
linux
concurrency
desktop
comparison
checking
debugging
stories
engineering
hmm
idk
algorithms
books
debate
flux-stasis
duplication
parsimony
best-practices
writing
documentation
latex
intricacy
structure
hardware
caching
workflow
editors
composition-decomposition
coupling-cohesion
exposition
technical-writing
thinking
cracker-prog
code-organizing
grokkability
multi
techtariat
commentary
pdf
reflection
essay
examples
python
data-science
libraries
grokkability-clarity
project-management

Bareiss algorithm - Wikipedia

june 2019 by nhaliday

During the execution of the Bareiss algorithm, every integer that is computed is the determinant of a submatrix of the input matrix. This allows the size of these integers to be bounded using the Hadamard inequality. Otherwise, the Bareiss algorithm may be viewed as a variant of Gaussian elimination and needs roughly the same number of arithmetic operations.
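The elimination step can be sketched in a few lines (my own toy version; it assumes every leading pivot is nonzero and so skips pivoting):

```python
def bareiss_det(mat):
    """Determinant of an integer matrix by Bareiss fraction-free elimination.
    Every division below is exact, so the result is an exact integer."""
    a = [row[:] for row in mat]
    n = len(a)
    prev = 1                             # previous pivot (1 before the first step)
    for k in range(n - 1):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # 2x2-minor update; prev always divides the numerator exactly
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
            a[i][k] = 0
        prev = a[k][k]
    return a[n - 1][n - 1]
```

The point versus plain fraction-free Gaussian elimination is that dividing by the previous pivot keeps every intermediate entry a subdeterminant, so entries grow polynomially (Hadamard-bounded) instead of exponentially.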

nibble
ground-up
cs
tcs
algorithms
complexity
linear-algebra
numerics
sci-comp
fields
calculation
nitty-gritty

algorithm, algorithmic, algorithmicx, algorithm2e, algpseudocode = confused - TeX - LaTeX Stack Exchange

june 2019 by nhaliday

algorithm2e is the only one currently maintained, but the answerer prefers the style of algorithmicx, and after perusing the docs, so do I
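For reference, a minimal skeleton in the algorithmicx/algpseudocode style the answer recommends:

```latex
\documentclass{article}
\usepackage{algorithm}       % floating environment with caption
\usepackage{algpseudocode}   % algorithmicx's pseudocode layout
\begin{document}
\begin{algorithm}
\caption{Euclid's algorithm}
\begin{algorithmic}[1]       % [1] = number every line
\Function{Gcd}{$a, b$}
  \While{$b \neq 0$}
    \State $(a, b) \gets (b, a \bmod b)$
  \EndWhile
  \State \Return $a$
\EndFunction
\end{algorithmic}
\end{algorithm}
\end{document}
```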

q-n-a
stackex
libraries
list
recommendations
comparison
publishing
cs
programming
algorithms
tools

What every computer scientist should know about floating-point arithmetic

may 2019 by nhaliday

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

Float Toy: http://evanw.github.io/float-toy/

https://news.ycombinator.com/item?id=22113485

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations

"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.

...

Regarding the propagation of rounding errors, there exists specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as √n for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]

cf:

https://en.wikipedia.org/wiki/Pairwise_summation

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers xn works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each xi one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε √(log n)) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs

However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n), or even O(n²), which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually < 3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]

nibble
pdf
papers
programming
systems
numerics
nitty-gritty
intricacy
approximation
accuracy
types
sci-comp
multi
q-n-a
stackex
hmm
oly-programming
accretion
formal-methods
yak-shaving
wiki
reference
algorithms
yoga
ground-up
divide-and-conquer
fourier
books
tidbits
chart
caltech
nostalgia
dynamic
calculator
visualization
protocol-metadata
identity
Float Toy: http://evanw.github.io/float-toy/

https://news.ycombinator.com/item?id=22113485

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations

"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.

...

Regarding the propagation of rounding errors, there exists specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as {\displaystyle {\sqrt {n}}} {\sqrt {n}} for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]

cf:

https://en.wikipedia.org/wiki/Pairwise_summation

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers xn works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each xi one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of {\displaystyle O(\varepsilon {\sqrt {\log n}})} O(\varepsilon {\sqrt {\log n}}) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs

However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n), or even O(n²), which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually < 3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).
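[ed.: the two-table trick, sketched for power-of-two n. This is an illustration of the idea, not FFTW's actual implementation — every table entry is computed directly by the library exponential, so each on-the-fly twiddle costs one complex multiply and stays accurate:]

```python
import cmath

def twiddle_tables(n):
    """Two tables of O(sqrt(n)) entries each, for power-of-two n.
    Each entry is computed directly (hence to ~machine precision)."""
    r = 1
    while r * r < n:
        r *= 2  # smallest power of two with r >= sqrt(n); r divides n
    low = [cmath.exp(-2j * cmath.pi * k / n) for k in range(r)]              # w_n^k,  k < r
    high = [cmath.exp(-2j * cmath.pi * (q * r) / n) for q in range(n // r)]  # w_n^(q*r)
    return r, low, high

def twiddle(n, k, tables):
    """w_n^k = w_n^(r*(k//r)) * w_n^(k mod r): high bits index one table,
    low bits the other, combined with a single complex multiplication."""
    r, low, high = tables
    return high[k // r] * low[k % r]
```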

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]

may 2019 by nhaliday

NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing -- Core Topics for Undergraduates | NSF/IEEE-TCPP Curriculum Initiative

nibble roadmap list links teaching programming algorithms distributed concurrency systems hardware accretion quixotic org:edu education higher-ed advanced

may 2019 by nhaliday

Performance Engineering of Software Systems | Electrical Engineering and Computer Science | MIT OpenCourseWare

unit mit ocw programming engineering nitty-gritty ground-up cs algorithms distributed concurrency performance caching systems c(pp) jvm video lectures slides sci-comp advanced metal-to-virtual

may 2019 by nhaliday

Iterated Distillation and Amplification – AI Alignment

nibble org:bleg acmtariat clever-rats ai ai-control alignment iteration-recursion deepgoog games research research-program algorithms explanation exposition summary reinforcement values machine-learning human-ml tradeoffs volo-avolo ratty

april 2018 by nhaliday

The Hanson-Yudkowsky AI-Foom Debate - Machine Intelligence Research Institute

april 2018 by nhaliday

How Deviant Recent AI Progress Lumpiness?: http://www.overcomingbias.com/2018/03/how-deviant-recent-ai-progress-lumpiness.html

I seem to disagree with most people working on artificial intelligence (AI) risk. While with them I expect rapid change once AI is powerful enough to replace most all human workers, I expect this change to be spread across the world, not concentrated in one main localized AI system. The efforts of AI risk folks to design AI systems whose values won’t drift might stop global AI value drift if there is just one main AI system. But doing so in a world of many AI systems at similar abilities levels requires strong global governance of AI systems, which is a tall order anytime soon. Their continued focus on preventing single system drift suggests that they expect a single main AI system.

The main reason that I understand to expect relatively local AI progress is if AI progress is unusually lumpy, i.e., arriving in unusually fewer larger packages rather than in the usual many smaller packages. If one AI team finds a big lump, it might jump way ahead of the other teams.

However, we have a vast literature on the lumpiness of research and innovation more generally, which clearly says that usually most of the value in innovation is found in many small innovations. We have also so far seen this in computer science (CS) and AI, even though there have been historical examples where much value was found in particular big innovations, such as nuclear weapons or the origin of humans.

Apparently many people associated with AI risk, including the star machine learning (ML) researchers that they often idolize, find it intuitively plausible that AI and ML progress is exceptionally lumpy. Such researchers often say, “My project is ‘huge’, and will soon do it all!” A decade ago my ex-co-blogger Eliezer Yudkowsky and I argued here on this blog about our differing estimates of AI progress lumpiness. He recently offered Alpha Go Zero as evidence of AI lumpiness:

...

In this post, let me give another example (beyond two big lumps in a row) of what could change my mind. I offer a clear observable indicator, for which data should be available now: deviant citation lumpiness in recent ML research. One standard measure of research impact is citations; bigger lumpier developments gain more citations than smaller ones. And it turns out that the lumpiness of citations is remarkably constant across research fields! See this March 3 paper in Science:

I Still Don’t Get Foom: http://www.overcomingbias.com/2014/07/30855.html

All of which makes it look like I’m the one with the problem; everyone else gets it. Even so, I’m gonna try to explain my problem again, in the hope that someone can explain where I’m going wrong. Here goes.

“Intelligence” just means an ability to do mental/calculation tasks, averaged over many tasks. I’ve always found it plausible that machines will continue to do more kinds of mental tasks better, and eventually be better at pretty much all of them. But what I’ve found it hard to accept is a “local explosion.” This is where a single machine, built by a single project using only a tiny fraction of world resources, goes in a short time (e.g., weeks) from being so weak that it is usually beat by a single human with the usual tools, to so powerful that it easily takes over the entire world. Yes, smarter machines may greatly increase overall economic growth rates, and yes such growth may be uneven. But this degree of unevenness seems implausibly extreme. Let me explain.

If we count by economic value, humans now do most of the mental tasks worth doing. Evolution has given us a brain chock-full of useful well-honed modules. And the fact that most mental tasks require the use of many modules is enough to explain why some of us are smarter than others. (There’d be a common “g” factor in task performance even with independent module variation.) Our modules aren’t that different from those of other primates, but because ours are different enough to allow lots of cultural transmission of innovation, we’ve out-competed other primates handily.

We’ve had computers for over seventy years, and have slowly built up libraries of software modules for them. Like brains, computers do mental tasks by combining modules. An important mental task is software innovation: improving these modules, adding new ones, and finding new ways to combine them. Ideas for new modules are sometimes inspired by the modules we see in our brains. When an innovation team finds an improvement, they usually sell access to it, which gives them resources for new projects, and lets others take advantage of their innovation.

...

In Bostrom’s graph above the line for an initially small project and system has a much higher slope, which means that it becomes in a short time vastly better at software innovation. Better than the entire rest of the world put together. And my key question is: how could it plausibly do that? Since the rest of the world is already trying the best it can to usefully innovate, and to abstract to promote such innovation, what exactly gives one small project such a huge advantage to let it innovate so much faster?

...

In fact, most software innovation seems to be driven by hardware advances, instead of innovator creativity. Apparently, good ideas are available but must usually wait until hardware is cheap enough to support them.

Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right. Researchers have long wandered the space of architectures, mostly rediscovering variations on what others found before.

Some hope that a small project could be much better at innovation because it specializes in that topic, and much better understands new theoretical insights into the basic nature of innovation or intelligence. But I don’t think those are actually topics where one can usefully specialize much, or where we’ll find much useful new theory. To be much better at learning, the project would instead have to be much better at hundreds of specific kinds of learning. Which is very hard to do in a small project.

What does Bostrom say? Alas, not much. He distinguishes several advantages of digital over human minds, but all software shares those advantages. Bostrom also distinguishes five paths: better software, brain emulation (i.e., ems), biological enhancement of humans, brain-computer interfaces, and better human organizations. He doesn’t think interfaces would work, and sees organizations and better biology as only playing supporting roles.

...

Similarly, while you might imagine someday standing in awe in front of a super intelligence that embodies all the power of a new age, superintelligence just isn’t the sort of thing that one project could invent. As “intelligence” is just the name we give to being better at many mental tasks by using many good mental modules, there’s no one place to improve it. So I can’t see a plausible way one project could increase its intelligence vastly faster than could the rest of the world.

Takeoff speeds: https://sideways-view.com/2018/02/24/takeoff-speeds/

Futurists have argued for years about whether the development of AGI will look more like a breakthrough within a small group (“fast takeoff”), or a continuous acceleration distributed across the broader economy or a large firm (“slow takeoff”).

I currently think a slow takeoff is significantly more likely. This post explains some of my reasoning and why I think it matters. Mostly the post lists arguments I often hear for a fast takeoff and explains why I don’t find them compelling.

(Note: this is not a post about whether an intelligence explosion will occur. That seems very likely to me. Quantitatively I expect it to go along these lines. So e.g. while I disagree with many of the claims and assumptions in Intelligence Explosion Microeconomics, I don’t disagree with the central thesis or with most of the arguments.)

ratty
lesswrong
subculture
miri-cfar
ai
risk
ai-control
futurism
books
debate
hanson
big-yud
prediction
contrarianism
singularity
local-global
speed
speedometer
time
frontier
distribution
smoothness
shift
pdf
economics
track-record
abstraction
analogy
links
wiki
list
evolution
mutation
selection
optimization
search
iteration-recursion
intelligence
metameta
chart
analysis
number
ems
coordination
cooperate-defect
death
values
formal-values
flux-stasis
philosophy
farmers-and-foragers
malthus
scale
studying
innovation
insight
conceptual-vocab
growth-econ
egalitarianism-hierarchy
inequality
authoritarianism
wealth
near-far
rationality
epistemic
biases
cycles
competition
arms
zero-positive-sum
deterrence
war
peace-violence
winner-take-all
technology
moloch
multi
plots
research
science
publishing
humanity
labor
marginal
urban-rural
structure
composition-decomposition
complex-systems
gregory-clark
decentralized
heavy-industry
magnitude
multiplicative
endogenous-exogenous
models
uncertainty
decision-theory
time-prefer

april 2018 by nhaliday

Brains and backprop: a key timeline crux

ratty lesswrong todo humanity deep-learning machine-learning gradient-descent neuro neuro-nitgrit comparison ai risk track-record speedometer frontier time evolution nature algorithms reduction unsupervised discrete smoothness debate links regularization random expert-experience analogy similarity deepgoog error cost-benefit economics language developmental clever-rats acmtariat discussion state-of-art generative ai-control science engineering approximation iteration-recursion models heuristic reinforcement nibble crux atoms coarse-fine experiment empirical supply-demand markets insight land number games nitty-gritty survey

march 2018 by nhaliday

Information Processing: US Needs a National AI Strategy: A Sputnik Moment?

february 2018 by nhaliday

FT podcasts on US-China competition and AI: http://infoproc.blogspot.com/2018/05/ft-podcasts-on-us-china-competition-and.html

A new recommended career path for effective altruists: China specialist: https://80000hours.org/articles/china-careers/

Our rough guess is that it would be useful for there to be at least ten people in the community with good knowledge in this area within the next few years.

By “good knowledge” we mean they’ve spent at least 3 years studying these topics and/or living in China.

We chose ten because that would be enough for several people to cover each of the major areas listed (e.g. 4 within AI, 2 within biorisk, 2 within foreign relations, 1 in another area).

AI Policy and Governance Internship: https://www.fhi.ox.ac.uk/ai-policy-governance-internship/

https://www.fhi.ox.ac.uk/deciphering-chinas-ai-dream/

https://www.fhi.ox.ac.uk/wp-content/uploads/Deciphering_Chinas_AI-Dream.pdf

Deciphering China’s AI Dream

The context, components, capabilities, and consequences of

China’s strategy to lead the world in AI

Europe’s AI delusion: https://www.politico.eu/article/opinion-europes-ai-delusion/

Brussels is failing to grasp threats and opportunities of artificial intelligence.

By BRUNO MAÇÃES

When the computer program AlphaGo beat the Chinese professional Go player Ke Jie in a three-part match, it didn’t take long for Beijing to realize the implications.

If algorithms can already surpass the abilities of a master Go player, it can’t be long before they will be similarly supreme in the activity to which the classic board game has always been compared: war.

As I’ve written before, the great conflict of our time is about who can control the next wave of technological development: the widespread application of artificial intelligence in the economic and military spheres.

...

If China’s ambitions sound plausible, that’s because the country’s achievements in deep learning are so impressive already. After Microsoft announced that its speech recognition software surpassed human-level language recognition in October 2016, Andrew Ng, then head of research at Baidu, tweeted: “We had surpassed human-level Chinese recognition in 2015; happy to see Microsoft also get there for English less than a year later.”

...

One obvious advantage China enjoys is access to almost unlimited pools of data. The machine-learning technologies boosting the current wave of AI expansion are as good as the amount of data they can use. That could be the number of people driving cars, photos labeled on the internet or voice samples for translation apps. With 700 or 800 million Chinese internet users and fewer data protection rules, China is as rich in data as the Gulf States are in oil.

How can Europe and the United States compete? They will have to be commensurately better in developing algorithms and computer power. Sadly, Europe is falling behind in these areas as well.

...

Chinese commentators have embraced the idea of a coming singularity: the moment when AI surpasses human ability. At that point a number of interesting things happen. First, future AI development will be conducted by AI itself, creating exponential feedback loops. Second, humans will become useless for waging war. At that point, the human mind will be unable to keep pace with robotized warfare. With advanced image recognition, data analytics, prediction systems, military brain science and unmanned systems, devastating wars might be waged and won in a matter of minutes.

...

The argument in the new strategy is fully defensive. It first considers how AI raises new threats and then goes on to discuss the opportunities. The EU and Chinese strategies follow opposite logics. Already on its second page, the text frets about the legal and ethical problems raised by AI and discusses the “legitimate concerns” the technology generates.

The EU’s strategy is organized around three concerns: the need to boost Europe’s AI capacity, ethical issues and social challenges. Unfortunately, even the first dimension quickly turns out to be about “European values” and the need to place “the human” at the center of AI — forgetting that the first word in AI is not “human” but “artificial.”

https://twitter.com/mr_scientism/status/983057591298351104

https://archive.is/m3Njh

US military: "LOL, China thinks it's going to be a major player in AI, but we've got all the top AI researchers. You guys will help us develop weapons, right?"

US AI researchers: "No."

US military: "But... maybe just a computer vision app."

US AI researchers: "NO."

https://www.theverge.com/2018/4/4/17196818/ai-boycot-killer-robots-kaist-university-hanwha

https://www.nytimes.com/2018/04/04/technology/google-letter-ceo-pentagon-project.html

https://twitter.com/mr_scientism/status/981685030417326080

https://archive.is/3wbHm

AI-risk was a mistake.

hsu
scitariat
commentary
video
presentation
comparison
usa
china
asia
sinosphere
frontier
technology
science
ai
speedometer
innovation
google
barons
deepgoog
stories
white-paper
strategy
migration
iran
human-capital
corporation
creative
alien-character
military
human-ml
nationalism-globalism
security
investing
government
games
deterrence
defense
nuclear
arms
competition
risk
ai-control
musk
optimism
multi
news
org:mag
europe
EU
80000-hours
effective-altruism
proposal
article
realness
offense-defense
war
biotech
altruism
language
foreign-lang
philosophy
the-great-west-whale
enhancement
foreign-policy
geopolitics
anglo
jobs
career
planning
hmm
travel
charity
tech
intel
media
teaching
tutoring
russia
india
miri-cfar
pdf
automation
class
labor
polisci
society
trust
n-factor
corruption
leviathan
ethics
authoritarianism
individualism-collectivism
revolution
economics
inequality
civic
law
regulation
data
scale
pro-rata
capital
zero-positive-sum
cooperate-defect
distribution
time-series
tre

february 2018 by nhaliday

Rank aggregation basics: Local Kemeny optimisation | David R. MacIver

september 2017 by nhaliday

This turns our problem from a global search to a local one: basically we can start from any point in the search space and search locally by swapping adjacent pairs until we hit a minimum. This turns out to be quite easy to do. _We basically run insertion sort_: at step n we have the first n items in a locally Kemeny optimal order. Swap the n+1th item backwards until the majority think its predecessor is < it. This ensures all adjacent pairs are in the majority order, so swapping them would result in a greater-than-or-equal Kemeny score. This is of course an O(n^2) algorithm. In fact, the problem of merely finding a locally Kemeny optimal solution can be done in O(n log(n)) (for much the same reason you can sort better than insertion sort): you just take the directed graph of majority votes and find a Hamiltonian path. The nice thing about the above version of the algorithm is that it gives you a lot of control over where you start your search.
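
The insertion-sort procedure described above can be sketched as follows, assuming each vote is a full ranking given as a list; the function names are illustrative, not from the post:

```python
def majority_prefers(votes, a, b):
    """True if a strict majority of voters rank a before b."""
    wins = sum(1 for v in votes if v.index(a) < v.index(b))
    return 2 * wins > len(votes)

def local_kemeny(items, votes):
    """Insertion sort under the majority relation: swap each new item
    backwards while the majority prefers it to its predecessor.
    The result is locally Kemeny optimal, in O(n^2) comparisons."""
    order = list(items)
    for i in range(1, len(order)):
        j = i
        while j > 0 and majority_prefers(votes, order[j], order[j - 1]):
            order[j], order[j - 1] = order[j - 1], order[j]
            j -= 1
    return order
```

Note the majority relation can be cyclic (Condorcet cycles), which is exactly why only local, not global, optimality is guaranteed.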

techtariat
liner-notes
papers
tcs
algorithms
machine-learning
acm
optimization
approximation
local-global
orders
graphs
graph-theory
explanation
iteration-recursion
time-complexity
nibble
september 2017 by nhaliday

Algorithms for Finding Nearest Neighbors (and Relatives) - Piotr Indyk

september 2017 by nhaliday

kd-tree mostly

pdf
nibble
slides
talks
mit
tcs
algorithms
data-structures
geometry
spatial
metric-space
mihai
yoga
rand-approx
approximation
high-dimension
dimensionality
hashing
embeddings
trees
contiguity-proximity
september 2017 by nhaliday

Kelly criterion - Wikipedia

august 2017 by nhaliday

In probability theory and intertemporal portfolio choice, the Kelly criterion, Kelly strategy, Kelly formula, or Kelly bet, is a formula used to determine the optimal size of a series of bets. In most gambling scenarios, and some investing scenarios under some simplifying assumptions, the Kelly strategy will do better than any essentially different strategy in the long run (that is, over a span of time in which the observed fraction of bets that are successful equals the probability that any given bet will be successful). It was described by J. L. Kelly, Jr, a researcher at Bell Labs, in 1956.[1] The practical use of the formula has been demonstrated.[2][3][4]

The Kelly criterion prescribes betting a predetermined fraction of assets, and it can be counterintuitive. In one study,[5][6] each participant was given $25 and asked to bet on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at $250. Behavior was far from optimal. "Remarkably, 28% of the participants went bust, and the average payout was just $91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment." Using the Kelly criterion and based on the odds in the experiment, the right approach would be to bet 20% of the pot on each throw (see first example in Statement below). If losing, the size of the bet gets cut; if winning, the stake increases.
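
The 20% figure comes from the standard formula f* = p − q/b (win probability p, loss probability q = 1 − p, net odds b-to-1); a one-function sketch:

```python
def kelly_fraction(p, b):
    """Kelly-optimal fraction of bankroll to bet when the bet wins
    with probability p and pays b-to-1 (stake multiplied by 1+b)."""
    q = 1.0 - p
    return p - q / b
```

For the even-money 60% coin in the study, `kelly_fraction(0.6, 1.0)` gives 0.6 − 0.4/1 = 0.2, i.e. bet 20% of the pot each flip.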

nibble
betting
investing
ORFE
acm
checklists
levers
probability
algorithms
wiki
reference
atoms
extrema
parsimony
tidbits
decision-theory
decision-making
street-fighting
mental-math
calculation
august 2017 by nhaliday

Talks

may 2017 by nhaliday

Quantum Supremacy: Office of Science and Technology Policy QIS Forum, Eisenhower Executive Office Building, White House Complex, Washington DC, October 18, 2016. Another version at UTCS Faculty Lunch, October 26, 2016. Another version at UT Austin Physics Colloquium, Austin, TX, November 9, 2016.

Complexity-Theoretic Foundations of Quantum Supremacy Experiments: Quantum Algorithms Workshop, Aspen Center for Physics, Aspen, CO, March 25, 2016

When Exactly Do Quantum Computers Provide A Speedup?: Yale Quantum Institute Seminar, Yale University, New Haven, CT, October 10, 2014. Another version at UT Austin Physics Colloquium, Austin, TX, November 19, 2014; Applied and Interdisciplinary Mathematics Seminar, Northeastern University, Boston, MA, November 25, 2014; Hebrew University Physics Colloquium, Jerusalem, Israel, January 5, 2015; Computer Science Colloquium, Technion, Haifa, Israel, January 8, 2015; Stanford University Physics Colloquium, January 27, 2015

tcstariat
aaronson
tcs
complexity
quantum
quantum-info
talks
list
slides
accretion
algorithms
applications
physics
nibble
frontier
computation
volo-avolo
speedometer
questions
may 2017 by nhaliday

Strings, periods, and borders

may 2017 by nhaliday

A border of x is any proper prefix of x that equals a suffix of x.

...overlapping borders of a string imply that the string is periodic...

In the border array ß[1..n] of x, entry ß[i] is the length of the longest border of x[1..i].
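
The border array is exactly the KMP failure function; a standard O(n) computation, sketched here with 0-based indexing:

```python
def border_array(x):
    """beta[i] = length of the longest border (proper prefix that is
    also a suffix) of x[:i+1]. Amortized O(n)."""
    beta = [0] * len(x)
    for i in range(1, len(x)):
        b = beta[i - 1]
        # fall back through successively shorter borders until one extends
        while b > 0 and x[i] != x[b]:
            b = beta[b - 1]
        if x[i] == x[b]:
            b += 1
        beta[i] = b
    return beta
```

This connects to the periodicity remark: the shortest period of x is len(x) − beta[-1], e.g. "abcabcab" has border length 5 and period 3.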

pdf
nibble
slides
lectures
algorithms
strings
exposition
yoga
atoms
levers
tidbits
sequential
backup
may 2017 by nhaliday

How do I implement Locality-Sensitive Hashing with respect to Levenshtein distance? - Quora

april 2017 by nhaliday

A: embed metric in Euclidean space

q-n-a
qra
metric-space
programming
engineering
strings
similarity
hashing
algorithms
nibble
applications
duplication
embeddings
tidbits
measure
features
april 2017 by nhaliday

Discrepancy algorithm inspired by gradient descent and multiplicative weights; after Levy, Ramadas and Rothvoss | I’m a bandit

acmtariat sebastien-bubeck org:bleg nibble exposition liner-notes algorithms math.CO online-learning tcs research combo-optimization gradient-descent linear-algebra amortization-potential

april 2017 by nhaliday


Main Page - Competitive Programming Algorithms: E-Maxx Algorithms in English

february 2017 by nhaliday

original russian version: http://e-maxx.ru/algo/

some notable stuff:

- O(N) factorization sieve

- discrete logarithm

- factorial N! (mod P) in O(P log N)

- flow algorithms

- enumerating submasks

- bridges, articulation points

- Ukkonen algorithm

- sqrt(N) trick, eg, for range mode query
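
As one example from the list, the O(N) factorization sieve can be sketched as a linear sieve that records each number's smallest prime factor (the names here are illustrative, not e-maxx's):

```python
def linear_sieve(n):
    """Smallest prime factor spf[k] for every k <= n, in O(n) time.
    Each composite i*p is crossed out exactly once, by its smallest
    prime factor p."""
    spf = [0] * (n + 1)
    primes = []
    for i in range(2, n + 1):
        if spf[i] == 0:          # i is prime
            spf[i] = i
            primes.append(i)
        for p in primes:
            if p > spf[i] or i * p > n:
                break
            spf[i * p] = p
    return spf, primes

def factorize(k, spf):
    """Factor k <= n by repeatedly dividing out spf[k]."""
    out = []
    while k > 1:
        out.append(spf[k])
        k //= spf[k]
    return out
```

After a single O(N) preprocessing pass, any k ≤ N factors in O(log k) divisions.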

explanation
programming
algorithms
russia
foreign-lang
oly
oly-programming
problem-solving
accretion
math.NT
graphs
graph-theory
optimization
data-structures
yoga
tidbits
multi
anglo
language
arrows
strings
february 2017 by nhaliday

mg.metric geometry - What is the best way to peel fruit? - MathOverflow

q-n-a overflow nibble math acm sublinear metrics metric-space proofs math.CO tcstariat arrows reduction measure math.MG similarity multi papers survey computational-geometry cs algorithms pdf positivity msr tidbits intersection curvature convexity-curvature intersection-connectedness signum

february 2017 by nhaliday

inequalities - Is the Jaccard distance a distance? - MathOverflow

february 2017 by nhaliday

Steinhaus Transform

the referenced survey: http://kenclarkson.org/nn_survey/p.pdf

It's known that this transformation produces a metric from a metric. Now if you take as the base metric D the symmetric difference between two sets, what you end up with is the Jaccard distance (which actually is known by many other names as well).
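
A small sketch of that reduction, with illustrative names: the Steinhaus transform of the symmetric-difference metric, anchored at the empty set, coincides with the Jaccard distance.

```python
def steinhaus(d, x, y, anchor):
    """Steinhaus transform of metric d around a fixed point `anchor`;
    known to yield another metric."""
    denom = d(x, anchor) + d(y, anchor) + d(x, y)
    return 2 * d(x, y) / denom if denom else 0.0

def jaccard_distance(a, b):
    """1 - |A∩B| / |A∪B|."""
    union = len(a | b)
    return (union - len(a & b)) / union if union else 0.0

# base metric D: size of the symmetric difference of two sets
sym_diff = lambda x, y: len(x ^ y)
```

For example, with A = {1,2,3} and B = {2,3,4}, both `steinhaus(sym_diff, A, B, set())` and `jaccard_distance(A, B)` come out to 0.5.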

q-n-a
overflow
nibble
math
acm
sublinear
metrics
metric-space
proofs
math.CO
tcstariat
arrows
reduction
measure
math.MG
similarity
multi
papers
survey
computational-geometry
cs
algorithms
pdf
positivity
msr
tidbits
intersection
curvature
convexity-curvature
intersection-connectedness
signum
february 2017 by nhaliday

big list - Overarching reasons why problems are in P or BPP - Theoretical Computer Science Stack Exchange

q-n-a overflow nibble tcs complexity algorithms linear-algebra polynomials markov monte-carlo DP math.CO greedy math.NT synthesis list big-list hi-order-bits big-picture aaronson tcstariat graphs graph-theory proofs structure tricki yoga mathtariat time-complexity top-n metabuch metameta skeleton s:*** chart knowledge curvature convexity-curvature

february 2017 by nhaliday

MinHash - Wikipedia

february 2017 by nhaliday

- goal: compute Jaccard coefficient J(A, B) = |A∩B| / |A∪B| in sublinear space

- idea: pick random injective hash function h, define h_min(S) = argmin_{x in S} h(x), and note that Pr[h_min(A) = h_min(B)] = J(A, B)

- reduce variance w/ Chernoff bound
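
The notes above can be sketched directly; this toy version uses explicit random permutations of a small known universe for clarity (real implementations use hash functions, as the note describes):

```python
import random

def minhash_signature(s, perms):
    """For each random permutation of the universe, record the minimum
    rank assigned to any element of the set."""
    return tuple(min(perm[x] for x in s) for perm in perms)

def estimate_jaccard(sig_a, sig_b):
    """Pr[minima agree] = J(A, B), so the agreement rate estimates it;
    averaging k independent trials reduces the variance."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

rng = random.Random(0)
k = 400
perms = []
for _ in range(k):
    ranks = list(range(150))   # toy universe: integers 0..149
    rng.shuffle(ranks)         # ranks[x] = position of x in a random order
    perms.append(ranks)

A, B = set(range(100)), set(range(50, 150))
est = estimate_jaccard(minhash_signature(A, perms),
                       minhash_signature(B, perms))
# true J(A, B) = 50/150 ≈ 0.33; est concentrates around it
```

Each signature uses O(k) space regardless of the set sizes, which is the sublinear-space point of the note.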

algorithms
data-structures
sublinear
hashing
wiki
reference
random
tcs
nibble
measure
metric-space
metrics
similarity
PAC
intersection
intersection-connectedness
february 2017 by nhaliday

Count–min sketch - Wikipedia

february 2017 by nhaliday

- estimates frequency vector (f_i)

- idea:

d = O(log 1/δ) hash functions h_j: [n] -> [w] (w = O(1/ε))

d*w counters a[r, c]

for each event i, increment counters a[1, h_1(i)], a[2, h_2(i)], ..., a[d, h_d(i)]

estimate for f_i is min_j a[j, h_j(i)]

- never underestimates but upward-biased

- pf: Markov to get constant probability of success, then exponential decrease with repetition

lecture notes: http://theory.stanford.edu/~tim/s15/l/l2.pdf

- note this can work w/ negative updates. just use median instead of min. pf still uses markov on the absolute value of error.
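
A minimal sketch of the non-negative-update version described above (class and method names are illustrative):

```python
import math
import random

class CountMin:
    """Count-min sketch: d = O(log 1/delta) rows of w = O(1/eps)
    counters. Estimates never underestimate (for non-negative updates)
    and overestimate by at most eps * total_count w.p. >= 1 - delta."""

    def __init__(self, eps, delta, seed=0):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1.0 / delta))
        rnd = random.Random(seed)
        self.salts = [rnd.getrandbits(64) for _ in range(self.d)]
        self.table = [[0] * self.w for _ in range(self.d)]

    def _cols(self, x):
        # one hash function per row, simulated by salting
        return [hash((s, x)) % self.w for s in self.salts]

    def update(self, x, count=1):
        for j, c in enumerate(self._cols(x)):
            self.table[j][c] += count

    def estimate(self, x):
        # min over rows: each row overestimates, so the min is tightest
        return min(self.table[j][c] for j, c in enumerate(self._cols(x)))
```

For negative updates, as the note says, replace `min` with the median of the row estimates.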

algorithms
data-structures
sublinear
hashing
wiki
reference
bias-variance
approximation
random
tcs
multi
stanford
lecture-notes
pdf
tim-roughgarden
nibble
pigeonhole-markov
PAC
february 2017 by nhaliday

Lecture 11

january 2017 by nhaliday

In which we prove that the Edmonds-Karp algorithm for maximum flow is a strongly polynomial time algorithm, and we begin to talk about the push-relabel approach.
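
A minimal sketch of the algorithm the notes analyze: Ford-Fulkerson with BFS, i.e. always augmenting along a shortest path, which is what gives the strongly polynomial O(V·E²) bound. Graph representation and names are my own choices, not the notes'.

```python
from collections import defaultdict, deque

def edmonds_karp(capacity, source, sink):
    """Max flow via shortest augmenting paths.
    capacity: dict-of-dicts, capacity[u][v] = capacity of edge u->v."""
    residual = defaultdict(lambda: defaultdict(int))
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # bottleneck capacity along the path
        bottleneck, v = float('inf'), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # push flow, updating forward and reverse residual edges
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

The key fact the lecture proves is that BFS makes augmenting-path distances non-decreasing, bounding the number of augmentations by O(V·E) independently of the capacities.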

pdf
lecture-notes
exposition
optimization
algorithms
linear-programming
graphs
stanford
luca-trevisan
nibble
direction
stock-flow
tcs
constraint-satisfaction
tcstariat
january 2017 by nhaliday

Lecture 16

january 2017 by nhaliday

In which we define a multi-commodity flow problem, and we see that its dual is the relaxation of a useful graph partitioning problem. The relaxation can be rounded to yield an approximate graph partitioning algorithm.

pdf
lecture-notes
exposition
optimization
linear-programming
graphs
graph-theory
algorithms
duality
rounding
stanford
approximation
rand-approx
luca-trevisan
relaxation
nibble
stock-flow
constraint-satisfaction
tcs
tcstariat
january 2017 by nhaliday

computational complexity - What is the easiest randomized algorithm to motivate to the layperson? - MathOverflow

january 2017 by nhaliday

- volume of shape in R^n

- polynomial identity testing
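
The identity-testing example is easy to demo: two distinct univariate polynomials of degree ≤ d agree on at most d points, so evaluating both at a random point from a much larger range exposes a difference with high probability (the function name and range are illustrative):

```python
import random

def probably_equal(p, q, degree_bound, trials=20):
    """Randomized polynomial identity test: 'different' answers are
    always correct; 'equal' is wrong w.p. <= (d / range)^trials."""
    sample_range = 100 * degree_bound + 1
    for _ in range(trials):
        r = random.randrange(sample_range)
        if p(r) != q(r):
            return False   # definitely different polynomials
    return True            # equal with very high probability
```

For example, `probably_equal(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1, 2)` always returns True, since the two expressions are the same polynomial; the appeal to a layperson is that we never expand or compare coefficients.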

q-n-a
overflow
tcs
algorithms
rand-approx
random
motivation
list
examples
aaronson
tcstariat
gowers
spatial
geometry
polynomials
teaching
nibble
january 2017 by nhaliday
