nhaliday : dbs   52

Zettlr | "Wtf is a Zettelkasten?"
The Zettelkasten Manifesto
In case you're still wondering what a Zettelkasten is and you need a little more incentive to get started, please have a look at a video we made earlier this week, where we outline why the notion of a Zettelkasten has become so intrinsically linked to the name of Niklas Luhmann, why we think that this is bad and how we think we should think of Zettelkästen:
techtariat  org:com  project  software  tools  exocortex  notetaking  workflow  thinking  dbs  structure  network-structure  critique  graphs  stay-organized  germanic  metabuch 
11 weeks ago by nhaliday
Is the bounty system effective? - Meta Stack Exchange
https://math.meta.stackexchange.com/questions/20155/how-effective-are-bounties
could do some kinda econometric analysis using the data explorer to determine this once and for all: https://pinboard.in/u:nhaliday/b:c0cd449b9e69
maybe some kinda RDD in time, or difference-in-differences?
I don't think answer quality/quantity over time meets the common trends assumption for DD, tho... Questions that eventually receive a bounty are prob higher quality in the first place, and higher-quality questions accumulate more and better answers regardless. Hmm.
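
A minimal sketch of the difference-in-differences estimate floated above, on made-up numbers. The function name and all answer counts are hypothetical, not pulled from the data explorer, and the estimate only means something if the common trends assumption holds, which the note above doubts.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """DD estimate: change in the treated group minus change in the control group."""
    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical answer counts per question, before and after the bounty date.
bountied_before = [1, 2, 1, 2]
bountied_after = [3, 4, 3, 4]
unbountied_before = [1, 1, 2, 2]
unbountied_after = [2, 2, 2, 2]

effect = diff_in_diff(bountied_before, bountied_after,
                      unbountied_before, unbountied_after)
print(effect)
```

If bountied questions were already on a steeper trajectory, this number would overstate the bounty effect.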
q-n-a  stackex  forum  community  info-foraging  efficiency  cost-benefit  data  analysis  incentives  attention  quality  ubiquity  supply-demand  multi  math  causation  endogenous-exogenous  intervention  branches  control  tactics  sleuthin  hmm  idk  todo  data-science  overflow  dbs  regression  shift  methodology  econometrics 
november 2019 by nhaliday
"Performance Matters" by Emery Berger - YouTube
Stabilizer is a tool that enables statistically sound performance evaluation, making it possible to understand the impact of optimizations and to conclude things like the fact that the difference between the -O2 and -O3 optimization levels is indistinguishable from noise (sadly true).

Since compiler optimizations have run out of steam, we need better profiling support, especially for modern concurrent, multi-threaded applications. Coz is a new "causal profiler" that lets programmers optimize for throughput or latency, and which pinpoints and accurately predicts the impact of optimizations.

- randomize extraneous factors like code layout and stack size to avoid spurious speedups
- simulate speedup of component of concurrent system (to assess effect of optimization before attempting) by slowing down the complement (all but that component)
- latency vs. throughput, Little's law
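
A rough illustration of the Little's law bullet above (items in flight = throughput × latency), rearranged to give throughput. The function name and the numbers are assumptions for illustration, not figures from the talk.

```python
def throughput(in_flight, latency_s):
    """Little's law rearranged: throughput = items in flight / time each spends in the system."""
    return in_flight / latency_s


# Hypothetical server holding 8 requests in flight at all times.
baseline = throughput(8, 0.25)   # each request takes 0.25 s
faster = throughput(8, 0.125)    # latency halved, same concurrency
print(baseline, faster)
```

With concurrency held fixed, halving latency doubles throughput, which is why the two can be optimized through the same lever.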
video  presentation  programming  engineering  nitty-gritty  performance  devtools  compilers  latency-throughput  concurrency  legacy  causation  wire-guided  let-me-see  manifolds  pro-rata  tricks  endogenous-exogenous  control  random  signal-noise  comparison  marginal  llvm  systems  hashing  computer-memory  build-packaging  composition-decomposition  coupling-cohesion  local-global  dbs  direct-indirect  symmetry  research  models  metal-to-virtual  linux  measurement  simulation  magnitude  realness  hypothesis-testing  techtariat 
october 2019 by nhaliday
The Law of Leaky Abstractions – Joel on Software
[TCP/IP example]

All non-trivial abstractions, to some degree, are leaky.

...

- Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the “grain of the wood” — one direction may result in vastly more page faults than the other direction, and page faults are slow. Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it’s really just an abstraction, which leaks when there’s a page fault and certain memory fetches take way more nanoseconds than other memory fetches.

- The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify “where a=b and b=c and a=c” than if you only specify “where a=b and b=c” even though the result set is the same. You’re not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.

...

- C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + “bar” to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type “foo” + “bar”, because string literals in C++ are always char*’s, never strings. The abstraction has sprung a leak that the language doesn’t let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn’t just add a native string class to the language itself eludes me at the moment.)

- And you can’t drive as fast when it’s raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it’s raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can’t see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
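
The SQL bullet above says the two WHERE clauses return the same result set. A small sketch to check that claim, using Python's sqlite3 module and a hypothetical table `t`. It does not reproduce the performance gap, which depends on the particular server's query planner.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
# Every combination of 0..2 for (a, b, c): 27 rows, 3 of which have a = b = c.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 itertools.product(range(3), repeat=3))

short_form = conn.execute(
    "SELECT a, b, c FROM t WHERE a = b AND b = c ORDER BY a").fetchall()
long_form = conn.execute(
    "SELECT a, b, c FROM t WHERE a = b AND b = c AND a = c ORDER BY a").fetchall()

print(short_form == long_form)
```

The extra `a = c` predicate is redundant for the result but may hand some planners a join or index path they would not infer on their own.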

One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I’m training someone to be a C++ programmer, it would be nice if I never had to teach them about char*’s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they’ll write the code “foo” + “bar”, and truly bizarre things will happen, and then I’ll have to stop and teach them all about char*’s anyway.

...

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying “learn how to do it manually first, then use the wizzy tool to save time.” Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.

https://www.benkuhn.net/hatch
People think a lot about abstractions and how to design them well. Here’s one feature I’ve recently been noticing about well-designed abstractions: they should have simple, flexible and well-integrated escape hatches.
techtariat  org:com  working-stiff  essay  programming  cs  software  abstraction  worrydream  thinking  intricacy  degrees-of-freedom  networking  examples  traces  no-go  volo-avolo  tradeoffs  c(pp)  pls  strings  dbs  transportation  driving  analogy  aphorism  learning  paradox  systems  elegance  nitty-gritty  concrete  cracker-prog  metal-to-virtual  protocol-metadata  design  system-design  multi  ratty  core-rats  integration-extension  composition-decomposition  flexibility  parsimony  interface-compatibility 
july 2019 by nhaliday
Fossil: Home
VCS w/ built-in issue tracking and wiki; used by SQLite
tools  devtools  software  vcs  wiki  debugging  integration-extension  oss  dbs 
may 2019 by nhaliday
Is backing up a MySQL database in Git a good idea? - Software Engineering Stack Exchange
*no: list of alternatives*

https://stackoverflow.com/questions/115369/do-you-use-source-control-for-your-database-items
Top 2 answers contradict each other but both agree that you should at least version the schema and other scripts.

My impression is that the guy linked in the accepted answer is arguing for a minority practice.
q-n-a  stackex  programming  engineering  dbs  vcs  git  debate  critique  backup  best-practices  flux-stasis  nitty-gritty  gotchas  init  advice  code-organizing  multi  hmm  idk  contrarianism  rhetoric  links  system-design 
may 2019 by nhaliday
its-not-software - steveyegge2
You don't work in the software industry.

...

So what's the software industry, and how do we differ from it?

Well, the software industry is what you learn about in school, and it's what you probably did at your previous company. The software industry produces software that runs on customers' machines — that is, software intended to run on a machine over which you have no control.

So it includes pretty much everything that Microsoft does: Windows and every application you download for it, including your browser.

It also includes everything that runs in the browser, including Flash applications, Java applets, and plug-ins like Adobe's Acrobat Reader. Their deployment model is a little different from the "classic" deployment models, but it's still software that you package up and release to some unknown client box.

...

Servware

Our industry is so different from the software industry, and it's so important to draw a clear distinction, that it needs a new name. I'll call it Servware for now, lacking anything better. Hardware, firmware, software, servware. It fits well enough.

Servware is stuff that lives on your own servers. I call it "stuff" advisedly, since it's more than just software; it includes configuration, monitoring systems, data, documentation, and everything else you've got there, all acting in concert to produce some observable user experience on the other side of a network connection.
techtariat  sv  tech  rhetoric  essay  software  saas  devops  engineering  programming  contrarianism  list  top-n  best-practices  applicability-prereqs  desktop  flux-stasis  homo-hetero  trends  games  thinking  checklists  dbs  models  communication  tutorial  wiki  integration-extension  frameworks  api  whole-partial-many  metrics  retrofit  c(pp)  pls  code-dive  planning  working-stiff  composition-decomposition  libraries  conceptual-vocab  amazon  system-design  cracker-prog  tech-infrastructure  blowhards  client-server  project-management 
may 2019 by nhaliday
Recitation 25: Data locality and B-trees
The same idea can be applied to trees. Binary trees are not good for locality because a given node of the binary tree probably occupies only a fraction of a cache line. B-trees are a way to get better locality. As in the hash table trick above, we store several elements in a single node -- as many as will fit in a cache line.

B-trees were originally invented for storing data structures on disk, where locality is even more crucial than with memory. Accessing a disk location takes about 5ms = 5,000,000ns. Therefore if you are storing a tree on disk you want to make sure that a given disk read is as effective as possible. B-trees, with their high branching factor, ensure that few disk reads are needed to navigate to the place where data is stored. B-trees are also useful for in-memory data structures because these days main memory is almost as slow relative to the processor as disk drives were when B-trees were introduced!
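
A back-of-the-envelope sketch of why the high branching factor matters, under simplifying assumptions: an idealized full tree, one disk read per level, and a hypothetical branching factor of 100. The 5 ms per read is the figure quoted above.

```python
def levels_needed(num_keys, branching_factor):
    """Smallest number of levels h such that branching_factor**h >= num_keys."""
    levels = 0
    capacity = 1
    while capacity < num_keys:
        capacity *= branching_factor
        levels += 1
    return levels


DISK_READ_MS = 5          # per-access cost quoted in the notes above
NUM_KEYS = 1_000_000      # hypothetical data set size

binary_levels = levels_needed(NUM_KEYS, 2)
btree_levels = levels_needed(NUM_KEYS, 100)

print(binary_levels, binary_levels * DISK_READ_MS, "ms")
print(btree_levels, btree_levels * DISK_READ_MS, "ms")
```

Under these assumptions a lookup touches 20 levels in the binary tree but only 3 in the B-tree.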
nibble  org:junk  org:edu  cornell  lecture-notes  exposition  programming  engineering  systems  dbs  caching  performance  memory-management  os  computer-memory  metal-to-virtual  trees  data-structures  local-global 
september 2017 by nhaliday
Anatomy of an SQL Index: What is an SQL Index
“An index makes the query fast” is the most basic explanation of an index I have ever seen. Although it describes the most important aspect of an index very well, it is—unfortunately—not sufficient for this book. This chapter describes the index structure in a less superficial way but doesn't dive too deeply into details. It provides just enough insight for one to understand the SQL performance aspects discussed throughout the book.

B-trees, etc.
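
A small sketch of what an index changes, using Python's sqlite3 module. The `users` table, its columns, and the index name are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def query_plan(connection):
    """Return SQLite's description of how it would execute QUERY."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("user42@example.com",)
    ).fetchall()
    # Each row is (id, parent, notused, detail); the detail column holds the text.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)   # no index yet: a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn)    # the planner can now search the B-tree index

print(plan_before)
print(plan_after)
```

The first plan typically reports a scan of the whole table, the second a search using `idx_users_email`.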
techtariat  tutorial  explanation  performance  programming  engineering  dbs  trees  data-structures  nibble  caching  metal-to-virtual  abstraction  applications  nitty-gritty  ground-up  orders  systems 
september 2017 by nhaliday
HN: the good parts
HN comments are terrible. On any topic I’m informed about, the vast majority of comments are pretty clearly wrong. Most of the time, there are zero comments from people who know anything about the topic and the top comment is reasonable sounding but totally incorrect. Additionally, many comments are gratuitously mean. You'll often hear mean comments backed up with something like "this is better than the other possibility, where everyone just pats each other on the back with comments like 'this is great'", as if being an asshole is some sort of talisman against empty platitudes. I've seen people push back against that; when pressed, people often say that it’s either impossible or inefficient to teach someone without being mean, as if telling someone that they're stupid somehow helps them learn. It's as if people learned how to explain things by watching Simon Cowell and can't comprehend the concept of an explanation that isn't littered with personal insults. Paul Graham has said, "Oh, you should never read Hacker News comments about anything you write”. Most of the negative things you hear about HN comments are true.

And yet, I haven’t found a public internet forum with better technical commentary. On topics I'm familiar with, while it's rare that a thread will have even a single comment that's well-informed, when those comments appear, they usually float to the top. On other forums, well-informed comments are either non-existent or get buried by reasonable sounding but totally wrong comments when they appear, and they appear even more rarely than on HN.

...

I compiled a very abbreviated list of comments I like because comments seem to get lost. If you write a blog post, people will refer to it years later, but comments mostly disappear. I think that’s sad -- there’s a lot of great material on HN (and yes, even more not-so-great material).
hn  forum  subculture  list  contrarianism  community  dan-luu  top-n  🖥  techtariat  microsoft  software  desktop  protocol-metadata  media  truth  accuracy  tech  sv  investigative-journo  comparison  scale  google  startups  entrepreneurialism  cost-benefit  tradeoffs  learning  os  estimate  data  objektbuch  ranking  pro-rata  success  working-stiff  career  strategy  venture  stackex  amazon  soft-skills  dark-arts  management  organizing  incentives  dbs  state  productivity  speaking  embodied  impro  checklists  transitions  collaboration  unix  linux  critique  rant  cryptocurrency  bitcoin  blockchain 
october 2016 by nhaliday
Camlistore
renamed to https://perkeep.org/

very similar thing by Rob Pike: https://upspin.io
https://news.ycombinator.com/item?id=13700492
Hi, Camlistore author here.
Andrew Gerrand worked with me on Camlistore too and is one of the Upspin authors.

The main difference I see is that Camlistore can model POSIX filesystems for backup and FUSE, but that's not its preferred view of the world. It is perfectly happy modeling a tweet or a "like" on its own, without any name in the world.

Upspin's data model is very much a traditional filesystem.

Also, Upspin cared about the interop between different users from day 1 with keyservers etc, whereas for Camlistore that was not the primary design criterion. (We're only starting to work on that now in Camlistore).

But there is some similarity for sure, and Andrew knows both.
tools  golang  cloud  yak-shaving  software  libraries  google  oss  exocortex  nostalgia  summer-2014  retention  database  dbs  multi  rsc  networking  web  distributed  hn  commentary 
october 2016 by nhaliday
