nhaliday : reference   1174

 « earlier
Japanese sound symbolism - Wikipedia
Japanese has a large inventory of sound symbolic or mimetic words, known in linguistics as ideophones.[1][2] Sound symbolic words are found in written as well as spoken Japanese.[3] Known popularly as onomatopoeia, these words are not just imitative of sounds but cover a much wider range of meanings;[1] indeed, many sound-symbolic words in Japanese are for things that don't make any noise originally, most clearly demonstrated by shiinto (しいんと), meaning "silently".
language  foreign-lang  trivia  wiki  reference  audio  hmm  alien-character  culture  list  objektbuch  japan  asia  writing
october 2019 by nhaliday
Ask HN: Favorite note-taking software? | Hacker News

my wishlist as of 2019:
- web + desktop macOS + mobile iOS (at least viewing on the last but ideally also editing)
- sync across all those
- open-source data format that's easy to manipulate for scripting purposes
- flexible organization: mostly tree hierarchical (subsuming linear/unorganized) but with the option for directed (acyclic) graph (possibly a second layer of structure/linking)
- can store plain text, LaTeX, diagrams, sketches, and raster/vector images (video prob not necessary except as links to elsewhere)
- full-text search
- somehow digest/import data from Pinboard, Workflowy, Papers 3/Bookends, Skim, and iBooks/e-readers (esp. Kobo), ideally absorbing most of their functionality
- so, eg, track notes/annotations side-by-side w/ original PDF/DjVu/ePub documents (to replace Papers3/Bookends/Skim), and maybe web pages too (to replace Pinboard)
- OCR of handwritten notes (how to handle equations/diagrams?)
- various forms of NLP analysis of everything (topic models, clustering, etc)
- maybe version control (less important than export)

candidates?:
- Evernote prob ruled out do to heavy use of proprietary data formats (unless I can find some way to export with tolerably clean output)
- Workflowy/Dynalist are good but only cover a subset of functionality I want
- org-mode doesn't interact w/ mobile well (and I haven't evaluated it in detail otherwise)
- TiddlyWiki/Zim are in the running, but not sure about mobile
- idk about vimwiki but I'm not that wedded to vim and it seems less widely used than org-mode/TiddlyWiki/Zim so prob pass on that
- Quiver/Joplin/Inkdrop look similar and cover a lot of bases, TODO: evaluate more
- Trilium looks especially promising, tho read-only mobile and for macOS desktop look at this: https://github.com/zadam/trilium/issues/511
- RocketBook is interesting scanning/OCR solution but prob not sufficient due to proprietary data format
- TODO: many more candidates, eg, TreeSheets, Gingko, OneNote (macOS?...), Notion (proprietary data format...), Zotero, Nodebook (https://nodebook.io/landing), Polar (https://getpolarized.io), Roam (looks very promising)

Ask HN: What do you use for you personal note taking activity?: https://news.ycombinator.com/item?id=15736102

Ask HN: How do you take notes (useful note-taking strategies)?: https://news.ycombinator.com/item?id=13064215

Ask HN: How to get better at taking notes?: https://news.ycombinator.com/item?id=21419478

Ask HN: How did you build up your personal knowledge base?: https://news.ycombinator.com/item?id=21332957
nice comment from math guy on structure and difference between math and CS: https://news.ycombinator.com/item?id=21338628
useful comment collating related discussions: https://news.ycombinator.com/item?id=21333383
highlights:
Designing a Personal Knowledge base: https://news.ycombinator.com/item?id=8270759
Ask HN: How to organize personal knowledge?: https://news.ycombinator.com/item?id=17892731
Do you use a personal 'knowledge base'?: https://news.ycombinator.com/item?id=21108527
Ask HN: How do you share/organize knowledge at work and life?: https://news.ycombinator.com/item?id=21310030
Managing my personal knowledge base: https://news.ycombinator.com/item?id=22000791
Building personal search infrastructure for your knowledge and code: https://beepb00p.xyz/pkm-search.html

How to annotate literally everything: https://beepb00p.xyz/annotating.html
Ask HN: How do you organize document digests / personal knowledge?: https://news.ycombinator.com/item?id=21642289
Ask HN: Good solution for storing notes/excerpts from books?: https://news.ycombinator.com/item?id=21920143
some related stuff in the reddit links at the bottom of this pin

https://beepb00p.xyz/grasp.html
How to capture information from your browser and stay sane

Ask HN: Best solutions for keeping a personal log?: https://news.ycombinator.com/item?id=21906650

other stuff:
plain text: https://news.ycombinator.com/item?id=21685660

https://www.getdnote.com/blog/how-i-built-personal-knowledge-base-for-myself/
Tiago Forte: https://www.buildingasecondbrain.com

hn search: https://hn.algolia.com/?query=notetaking&type=story

Slant comparison commentary: https://news.ycombinator.com/item?id=7011281

good comparison of options here in comments here (and Trilium itself looks good): https://news.ycombinator.com/item?id=18840990

https://en.wikipedia.org/wiki/Comparison_of_note-taking_software

stuff from Andy Matuschak and Michael Nielsen on general note-taking:
https://archive.is/1i9ep
Software interfaces undervalue peripheral vision! (a thread)
https://archive.is/J06UB
This morning I implemented PageRank to sort backlinks in my prototype note system. Mixed results!
https://archive.is/BOiCG
https://archive.is/4zB37
One way to dream up post-book media to make reading more effective and meaningful is to systematize "expert" practices (e.g. How to Read a Book), so more people can do them, more reliably and more cheaply. But… the most erudite people I know don't actually do those things!

the memex essay and comments from various people including Andy on it: https://pinboard.in/u:nhaliday/b:1cddf69c0b31

some more stuff specific to Roam below, and cf "Why books don't work": https://pinboard.in/u:nhaliday/b:b4d4461f6378

wikis:
https://www.slant.co/versus/5116/8768/~tiddlywiki_vs_zim
https://www.wikimatrix.org/compare/tiddlywiki+zim
http://tiddlymap.org/

apps:
Roam: https://news.ycombinator.com/item?id=21440289
https://www.reddit.com/r/RoamResearch/
https://archive.is/TJPQN
https://archive.is/CrNwZ
https://www.nateliason.com/blog/roam
https://archive.is/To30Q
https://archive.is/UrI1x
https://archive.is/Ww22V
Knowledge systems which display contextual backlinks to a node open up an interesting new behavior. You can bootstrap a new node extensionally (rather than intensionally) by simply linking to it from many other nodes—even before it has any content.
Curious: what are the most striking public @RoamResearch pages that you know? I'd like to see examples of people using it for interesting purposes, or in interesting ways.
https://acesounderglass.com/2019/10/24/epistemic-spot-check-the-fate-of-rome-round-2/
https://archive.is/xvaMh
If I weren't doing my own research on questions in knowledge systems (which necessitates tinkering with my own), and if I weren't allergic to doing serious work in webapps, I'd likely use Roam instead!
https://talk.dynalist.io/t/roam-research-new-web-based-outliner-that-supports-transclusion-wiki-features-thoughts/5911/16
http://forum.eastgate.com/t/roam-research-interesting-approach-to-note-taking/2713/10
interesting app: http://www.eastgate.com/Tinderbox/
https://www.theatlantic.com/notes/2016/09/labor-day-software-update-tinderbox-scrivener/498443/

intriguing but probably not appropriate for my needs: https://www.sophya.ai/

Inkdrop: https://news.ycombinator.com/item?id=20103589

Joplin: https://news.ycombinator.com/item?id=15815040
https://news.ycombinator.com/item?id=21555238

MindForgr: https://news.ycombinator.com/item?id=22088175

https://wreeto.com/

Leo Editor (combines tree outlining w/ literate programming/scripting, I think?): https://news.ycombinator.com/item?id=17769892

Frame: https://news.ycombinator.com/item?id=18760079

https://archive.is/xViTY
Notion: https://news.ycombinator.com/item?id=18904648
https://coda.io/welcome
https://news.ycombinator.com/item?id=15543181

accounting: https://news.ycombinator.com/item?id=19833881
Coda mentioned

https://archive.is/e9oHu
https://archive.is/HUH8V
https://archive.is/VL2mi

Anki:
https://www.freecodecamp.org/news/how-anki-saved-my-engineering-career-293a90f70a73/
https://archive.is/OaGc5
maybe not the best source for a review/advice

https://lightsheets.app/

tablet:
https://www.inkandswitch.com/muse-studio-for-ideas.html
https://www.inkandswitch.com/capstone-manuscript.html
https://news.ycombinator.com/item?id=20255457
hn  discussion  recommendations  software  tools  desktop  app  notetaking  exocortex  wkfly  wiki  productivity  multi  comparison  crosstab  properties  applicability-prereqs  nlp  info-foraging  chart  webapp  reference  q-n-a  retention  workflow  reddit  social  ratty  ssc  learning  studying  commentary  structure  thinking  network-structure  things  collaboration  ocr  trees  graphs  LaTeX  search  todo  project  money-for-time  synchrony  pinboard  state  duplication  worrydream  simplification-normalization  links  minimalism  design  neurons  ai-control  openai  miri-cfar  parsimony  intricacy  meta:reading  examples  prepping  new-religion  deep-materialism  techtariat  review  critique  mobile  integration-extension  interface-compatibility  api  twitter  backup  vgr  postrat  personal-finance  pragmatic  stay-organized  project-management  news  org:mag  epistemic  steel-man  explore-exploit  correlation  cost-benefit  convexity-curvature  michael-nielsen  hci  ux  oly  skunkworks  europe  germanic
october 2019 by nhaliday
Linus's Law - Wikipedia
Linus's Law is a claim about software development, named in honor of Linus Torvalds and formulated by Eric S. Raymond in his essay and book The Cathedral and the Bazaar (1999).[1][2] The law states that "given enough eyeballs, all bugs are shallow";

--

In Facts and Fallacies about Software Engineering, Robert Glass refers to the law as a "mantra" of the open source movement, but calls it a fallacy due to the lack of supporting evidence and because research has indicated that the rate at which additional bugs are uncovered does not scale linearly with the number of reviewers; rather, there is a small maximum number of useful reviewers, between two and four, and additional reviewers above this number uncover bugs at a much lower rate.[4] While closed-source practitioners also promote stringent, independent code analysis during a software project's development, they focus on in-depth review by a few and not primarily the number of "eyeballs".[5][6]

Although detection of even deliberately inserted flaws[7][8] can be attributed to Raymond's claim, the persistence of the Heartbleed security bug in a critical piece of code for two years has been considered as a refutation of Raymond's dictum.[9][10][11][12] Larry Seltzer suspects that the availability of source code may cause some developers and researchers to perform less extensive tests than they would with closed source software, making it easier for bugs to remain.[12] In 2015, the Linux Foundation's executive director Jim Zemlin argued that the complexity of modern software has increased to such levels that specific resource allocation is desirable to improve its security. Regarding some of 2014's largest global open source software vulnerabilities, he says, "In these cases, the eyeballs weren't really looking".[11] Large scale experiments or peer-reviewed surveys to test how well the mantra holds in practice have not been performed.

Given enough eyeballs, all bugs are shallow? Revisiting Eric Raymond with bug bounty programs: https://academic.oup.com/cybersecurity/article/3/2/81/4524054

https://hbfs.wordpress.com/2009/03/31/how-many-eyeballs-to-make-a-bug-shallow/
wiki  reference  aphorism  ideas  stylized-facts  programming  engineering  linux  worse-is-better/the-right-thing  correctness  debugging  checking  best-practices  security  error  scale  ubiquity  collaboration  oss  realness  empirical  evidence-based  multi  study  info-econ  economics  intricacy  plots  manifolds  techtariat  cracker-prog  os  systems  magnitude  quantitative-qualitative  number  threat-modeling
october 2019 by nhaliday
Shuffling - Wikipedia
The Gilbert–Shannon–Reeds model provides a mathematical model of the random outcomes of riffling, that has been shown experimentally to be a good fit to human shuffling[2] and that forms the basis for a recommendation that card decks be riffled seven times in order to randomize them thoroughly.[3] Later, mathematicians Lloyd M. Trefethen and Lloyd N. Trefethen authored a paper using a tweaked version of the Gilbert-Shannon-Reeds model showing that the minimum number of riffles for total randomization could also be 5, if the method of defining randomness is changed.[4][5]
nibble  tidbits  trivia  cocktail  wiki  reference  games  howto  random  models  math  applications  probability  math.CO  mixing  markov  sampling  best-practices  acm
august 2019 by nhaliday
Call graph - Wikipedia
I've found both static and dynamic versions useful (former mostly when I don't want to go thru pain of compiling something)

best options AFAICT:

C/C++ and maybe Go: https://github.com/gperftools/gperftools
https://gperftools.github.io/gperftools/cpuprofile.html

static: https://github.com/Vermeille/clang-callgraph
https://stackoverflow.com/questions/5373714/how-to-generate-a-calling-graph-for-c-code
I had to go through some extra pain to get this to work:
- if you use Homebrew LLVM (that's slightly incompatible w/ macOS c++filt, make sure to pass -n flag)
- similarly macOS sed needs two extra backslashes for each escape of the angle brackets

another option: doxygen

Go: https://stackoverflow.com/questions/31362332/creating-call-graph-in-golang
both static and dynamic in one tool

Java: https://github.com/gousiosg/java-callgraph
both static and dynamic in one tool

Python:
https://github.com/gak/pycallgraph
more up-to-date forks: https://github.com/daneads/pycallgraph2 and https://github.com/YannLuo/pycallgraph
I've had some trouble getting nice output from this (even just getting the right set of nodes displayed, not even taking into account layout and formatting).
- Argument parsing syntax is idiosyncratic. Just read pycallgraph --help.
- Options -i and -e take glob patterns (see pycallgraph2/{tracer,globbing_filter}.py), which are applied the function names qualified w/ module paths.
- Functions defined in the script you are running receive no module path. There is no easy way to filter for them using the -i and -e options.
- The --debug option gives you the graphviz for your own use instead of just writing the final image produced.

static: https://github.com/davidfraser/pyan
more up-to-date fork: https://github.com/itsayellow/pyan/
one way to good results: pyan -dea --format yed $MODULE_FILES > output.graphml, then open up in yEd and use hierarchical layout various: https://github.com/jrfonseca/gprof2dot I believe all the dynamic tools listed here support weighting nodes and edges by CPU time/samples (inclusive and exclusive of descendants) and discrete calls. In the case of the gperftools and the Java option you probably have to parse the output to get the latter, tho. IIRC Dtrace has probes for function entry/exit. So that's an option as well. old pin: https://github.com/nst/objc_dep Graph the import dependancies in an Objective-C project concept wiki reference tools devtools graphs trees programming code-dive let-me-see big-picture libraries software recommendations list top-n links c(pp) golang python javascript jvm stackex q-n-a howto yak-shaving visualization dataviz performance structure oss osx unix linux static-dynamic repo cocoa july 2019 by nhaliday An Eye Tracking Study on camelCase and under_score Identifier Styles - IEEE Conference Publication One main difference is that subjects were trained mainly in the underscore style and were all programmers. While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly. ToCamelCaseorUnderscore: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.158.9499 An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. https://en.wikipedia.org/wiki/Camel_case#Readability_studies A 2009 study comparing snake case to camel case found that camel case identifiers could be recognised with higher accuracy among both programmers and non-programmers, and that programmers already trained in camel case were able to recognise those identifiers faster than underscored snake-case identifiers.[35] A 2010 follow-up study, under the same conditions but using an improved measurement method with use of eye-tracking equipment, indicates: "While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."[36] study psychology cog-psych hci programming best-practices stylized-facts null-result multi wiki reference concept empirical evidence-based efficiency accuracy time code-organizing grokkability protocol-metadata form-design grokkability-clarity july 2019 by nhaliday Factorization of polynomials over finite fields - Wikipedia In mathematics and computer algebra the factorization of a polynomial consists of decomposing it into a product of irreducible factors. This decomposition is theoretically possible and is unique for polynomials with coefficients in any field, but rather strong restrictions on the field of the coefficients are needed to allow the computation of the factorization by means of an algorithm. In practice, algorithms have been designed only for polynomials with coefficients in a finite field, in the field of rationals or in a finitely generated field extension of one of them. All factorization algorithms, including the case of multivariate polynomials over the rational numbers, reduce the problem to this case; see polynomial factorization. It is also used for various applications of finite fields, such as coding theory (cyclic redundancy codes and BCH codes), cryptography (public key cryptography by the means of elliptic curves), and computational number theory. As the reduction of the factorization of multivariate polynomials to that of univariate polynomials does not have any specificity in the case of coefficients in a finite field, only polynomials with one variable are considered in this article. ... In the algorithms that follow, the complexities are expressed in terms of number of arithmetic operations in Fq, using classical algorithms for the arithmetic of polynomials. [ed.: Interesting choice...] ... Factoring algorithms Many algorithms for factoring polynomials over finite fields include the following three stages: Square-free factorization Distinct-degree factorization Equal-degree factorization An important exception is Berlekamp's algorithm, which combines stages 2 and 3. Berlekamp's algorithm Main article: Berlekamp's algorithm The Berlekamp's algorithm is historically important as being the first factorization algorithm, which works well in practice. However, it contains a loop on the elements of the ground field, which implies that it is practicable only over small finite fields. For a fixed ground field, its time complexity is polynomial, but, for general ground fields, the complexity is exponential in the size of the ground field. [ed.: This actually looks fairly implementable.] wiki reference concept algorithms calculation nibble numerics math algebra math.CA fields polynomials levers multiplicative math.NT july 2019 by nhaliday Mutation testing - Wikipedia Mutation testing involves modifying a program in small ways.[1] Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants. wiki reference concept mutation selection analogy programming checking formal-methods debugging random list libraries links functional haskell javascript jvm c(pp) python dotnet oop perturbation static-dynamic july 2019 by nhaliday From Java 8 to Java 11 - Quick Guide - Codete blog notable: jshell, Optional methods, var (type inference), {List,Set,Map}.copyOf, java$PROGRAM.java execution
programming  cheatsheet  reference  comparison  jvm  pls  oly-programming  gotchas  summary  flux-stasis  marginal  org:com
july 2019 by nhaliday
How to work with GIT/SVN — good practices - Jakub Kułak - Medium
best part of this is the links to other guides
Commit Often, Perfect Later, Publish Once: https://sethrobertson.github.io/GitBestPractices/

My Favourite Git Commit: https://news.ycombinator.com/item?id=21289827
I use the following convention to start the subject of commit(posted by someone in a similar HN thread):
...
org:med  techtariat  tutorial  faq  guide  howto  workflow  devtools  best-practices  vcs  git  engineering  programming  multi  reference  org:junk  writing  technical-writing  hn  commentary  jargon  list  objektbuch  examples  analysis
june 2019 by nhaliday
PythonSpeed/PerformanceTips - Python Wiki
some are obsolete, but I think, eg, the tip about using local vars over globals is still applicable
wiki  reference  cheatsheet  objektbuch  list  programming  python  performance  pls  local-global
june 2019 by nhaliday
C++ Core Guidelines
This document is a set of guidelines for using C++ well. The aim of this document is to help people to use modern C++ effectively. By “modern C++” we mean effective use of the ISO C++ standard (currently C++17, but almost all of our recommendations also apply to C++14 and C++11). In other words, what would you like your code to look like in 5 years’ time, given that you can start now? In 10 years’ time?

https://isocpp.github.io/CppCoreGuidelines/
“Within C++ is a smaller, simpler, safer language struggling to get out.” – Bjarne Stroustrup

...

The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management, and concurrency. Such rules affect application architecture and library design. Following the rules will lead to code that is statically type safe, has no resource leaks, and catches many more programming logic errors than is common in code today. And it will run fast - you can afford to do things right.

We are less concerned with low-level issues, such as naming conventions and indentation style. However, no topic that can help a programmer is out of bounds.

Our initial set of rules emphasize safety (of various forms) and simplicity. They may very well be too strict. We expect to have to introduce more exceptions to better accommodate real-world needs. We also need more rules.

...

The rules are designed to be supported by an analysis tool. Violations of rules will be flagged with references (or links) to the relevant rule. We do not expect you to memorize all the rules before trying to write code.

contrary:
https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
This will be a long wall of text, and kinda random! My main points are:
1. C++ compile times are important,
2. Non-optimized build performance is important,
3. Cognitive load is important. I don’t expand much on this here, but if a programming language or a library makes me feel stupid, then I’m less likely to use it or like it. C++ does that a lot :)
programming  engineering  pls  best-practices  systems  c(pp)  guide  metabuch  objektbuch  reference  cheatsheet  elegance  frontier  libraries  intricacy  advanced  advice  recommendations  big-picture  novelty  lens  philosophy  state  error  types  concurrency  memory-management  performance  abstraction  plt  compilers  expert-experience  multi  checking  devtools  flux-stasis  safety  system-design  techtariat  time  measure  dotnet  comparison  examples  build-packaging  thinking  worse-is-better/the-right-thing  cost-benefit  tradeoffs  essay  commentary  oop  correctness  computer-memory  error-handling  resources-effects  latency-throughput
june 2019 by nhaliday
Regex cheatsheet
Many programs use regular expression to find & replace text. However, they tend to come with their own different flavor.

You can probably expect most modern software and programming languages to be using some variation of the Perl flavor, "PCRE"; however command-line tools (grep, less, ...) will often use the POSIX flavor (sometimes with an extended variant, e.g. egrep or sed -r). ViM also comes with its own syntax (a superset of what Vi accepts).

This cheatsheet lists the respective syntax of each flavor, and the software that uses it.

accidental complexity galore
techtariat  reference  cheatsheet  documentation  howto  yak-shaving  editors  strings  syntax  examples  crosstab  objektbuch  python  comparison  gotchas  tip-of-tongue  automata-languages  pls  trivia  properties  libraries  nitty-gritty  intricacy  degrees-of-freedom  DSL  programming
june 2019 by nhaliday
Lindy effect - Wikipedia
The Lindy effect is a theory that the future life expectancy of some non-perishable things like a technology or an idea is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy.[1] Where the Lindy effect applies, mortality rate decreases with time. In contrast, living creatures and mechanical things follow a bathtub curve where, after "childhood", the mortality rate increases with time. Because life expectancy is probabilistically derived, a thing may become extinct before its "expected" survival. In other words, one needs to gauge both the age and "health" of the thing to determine continued survival.
wiki  reference  concept  metabuch  ideas  street-fighting  planning  comparison  time  distribution  flux-stasis  history  measure  correlation  arrows  branches  pro-rata  manifolds  aging  stylized-facts  age-generation  robust  technology  thinking  cost-benefit  conceptual-vocab  methodology  threat-modeling  efficiency  neurons  tools  track-record  ubiquity
june 2019 by nhaliday
Solution concept - Wikipedia
In game theory, a solution concept is a formal rule for predicting how a game will be played. These predictions are called "solutions", and describe which strategies will be adopted by players and, therefore, the result of the game. The most commonly used solution concepts are equilibrium concepts, most famously Nash equilibrium.

Many solution concepts, for many games, will result in more than one solution. This puts any one of the solutions in doubt, so a game theorist may apply a refinement to narrow down the solutions. Each successive solution concept presented in the following improves on its predecessor by eliminating implausible equilibria in richer games.

nice diagram
concept  conceptual-vocab  list  wiki  reference  acm  game-theory  inference  equilibrium  extrema  reduction  sub-super
may 2019 by nhaliday
What every computer scientist should know about floating-point arithmetic
Floating-point arithmetic is considered as esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

Float Toy: http://evanw.github.io/float-toy/
https://news.ycombinator.com/item?id=22113485

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations
"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.

...

Regarding the propagation of rounding errors, there exists specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as {\displaystyle {\sqrt {n}}} {\sqrt {n}} for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]

cf:
https://en.wikipedia.org/wiki/Pairwise_summation
In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers xn works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each xi one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of {\displaystyle O(\varepsilon {\sqrt {\log n}})} O(\varepsilon {\sqrt {\log n}}) for pairwise summation.[2]

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs
However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(n‾√) , O(n) or even O(n2) which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(logn) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually <3) of complex multiplications. For example, instead of a twiddle table with n entries ωkn , FFTW can use two tables with Θ(n‾√) entries each, so that ωkn is computed by multiplying an entry in one table (indexed with the low-order bits of k ) by an entry in the other table (indexed with the high-order bits of k ).

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]
nibble  pdf  papers  programming  systems  numerics  nitty-gritty  intricacy  approximation  accuracy  types  sci-comp  multi  q-n-a  stackex  hmm  oly-programming  accretion  formal-methods  yak-shaving  wiki  reference  algorithms  yoga  ground-up  divide-and-conquer  fourier  books  tidbits  chart  caltech  nostalgia  dynamic  calculator  visualization  protocol-metadata  identity
may 2019 by nhaliday
Delta debugging - Wikipedia
good overview of with examples: https://www.csm.ornl.gov/~sheldon/bucket/Automated-Debugging.pdf

Not as useful for my usecases (mostly contest programming) as QuickCheck. Input is generally pretty structured and I don't have a long history of code in VCS. And when I do have the latter git-bisect is probably enough.

good book tho: http://www.whyprogramsfail.com/toc.php
WHY PROGRAMS FAIL: A Guide to Systematic Debugging\
wiki  reference  programming  systems  debugging  c(pp)  python  tools  devtools  links  hmm  formal-methods  divide-and-conquer  vcs  git  search  yak-shaving  pdf  white-paper  multi  examples  stories  books  unit  caltech  recommendations  advanced  correctness
may 2019 by nhaliday
List of languages by total number of speakers - Wikipedia
- has both L1 (native speakers) and L2 (second-language speakers)
- I'm guessing most of Mandarin's L2 speakers are Chinese natives. Lots of dialects and such (Cantonese) within the country.
wiki  reference  data  list  top-n  ranking  population  scale  language  linguistics  anglo  china  asia  foreign-lang  objektbuch  india  MENA  europe  gallic  demographics  cost-benefit
march 2019 by nhaliday
per page:    204080120160

Copy this bookmark: