recentpopularlog in

nhaliday : stackex   341

« earlier  
antivirus - How to scan a PDF for malware? - Information Security Stack Exchange
Didier Stevens is the main focus when looking at PDF based malware.
- linked tools are more generally useful beyond malware (eg, get statistics on internal composition of PDFs)
- pdfid.py at least is kinda slow (6 minutes for a 100MB file)
q-n-a  stackex  security  tools  software  recommendations  terminal  pdf  yak-shaving 
november 2019 by nhaliday
Is the bounty system effective? - Meta Stack Exchange
https://math.meta.stackexchange.com/questions/20155/how-effective-are-bounties
could do some kinda econometric analysis using the data explorer to determine this once and for all: https://pinboard.in/u:nhaliday/b:c0cd449b9e69
maybe some kinda RDD in time, or difference-in-differences?
I don't think answer quality/quantity by time meets the common trend assumption for DD, tho... Questions that eventually receive bounty are prob higher quality in the first place, and higher quality answers accumulate more and better answers regardless. Hmm.
q-n-a  stackex  forum  community  info-foraging  efficiency  cost-benefit  data  analysis  incentives  attention  quality  ubiquity  supply-demand  multi  math  causation  endogenous-exogenous  intervention  branches  control  tactics  sleuthin  hmm  idk  todo  data-science  overflow  dbs  regression  shift  methodology  econometrics 
november 2019 by nhaliday
REST is the new SOAP | Hacker News
hn  commentary  techtariat  org:ngo  programming  engineering  web  client-server  networking  rant  rhetoric  contrarianism  idk  org:med  best-practices  working-stiff  api  models  protocol-metadata  internet  state  structure  chart  multi  q-n-a  discussion  expert-experience  track-record  reflection  cost-benefit  design  system-design  comparison  code-organizing  flux-stasis  interface-compatibility  trends  gotchas  stackex  state-of-art  distributed  concurrency  abstraction  concept  conceptual-vocab  python  ubiquity  list  top-n  duplication  synchrony  performance  caching 
november 2019 by nhaliday
The Definitive Guide To Website Authentication | Hacker News
hn  commentary  q-n-a  stackex  programming  identification-equivalence  security  web  client-server  crypto  checklists  best-practices  objektbuch  api  multi  cheatsheet  chart  system-design  nitty-gritty  yak-shaving  comparison  explanation  summary  jargon  state  networking  protocol-metadata  time 
november 2019 by nhaliday
javascript - ReactJS - Does render get called any time "setState" is called? - Stack Overflow
By default - yes.

There is a method boolean shouldComponentUpdate(object nextProps, object nextState), each component has this method and it's responsible to determine "should component update (run render function)?" every time you change state or pass new props from parent component.

You can write your own implementation of shouldComponentUpdate method for your component, but default implementation always returns true - meaning always re-run render function.

...

Next part of your question:

If so, why? I thought the idea was that React only rendered as little as needed - when state changed.

There are two steps of what we may call "render":

Virtual DOM render: when render method is called it returns a new virtual dom structure of the component. As I mentioned before, this render method is called always when you call setState(), because shouldComponentUpdate always returns true by default. So, by default, there is no optimization here in React.

Native DOM render: React changes real DOM nodes in your browser only if they were changed in the Virtual DOM and as little as needed - this is that great React's feature which optimizes real DOM mutation and makes React fast.
q-n-a  stackex  programming  intricacy  nitty-gritty  abstraction  state  frontend  web  javascript  libraries  facebook  frameworks  explanation  summary  models 
october 2019 by nhaliday
javascript - What is the purpose of double curly braces in React's JSX syntax? - Stack Overflow
The exterior set of curly braces are letting JSX know you want a JS expression. The interior set of curly braces represent a JavaScript object, meaning you’re passing in an object to the style attribute.
q-n-a  stackex  programming  explanation  trivia  gotchas  syntax  javascript  frontend  DSL  intricacy  facebook  libraries  frameworks 
october 2019 by nhaliday
How is definiteness expressed in languages with no definite article, clitic or affix? - Linguistics Stack Exchange
All languages, as far as we know, do something to mark information status. Basically this means that when you refer to an X, you have to do something to indicate the answer to questions like:
1. Do you have a specific X in mind?
2. If so, you think your hearer is familiar with the X you're talking about?
3. If so, have you already been discussing that X for a while, or is it new to the conversation?
4. If you've been discussing the X for a while, has it been the main topic of conversation?

Question #2 is more or less what we mean by "definiteness."
...

But there are lots of other information-status-marking strategies that don't directly involve definiteness marking. For example:
...
q-n-a  stackex  language  foreign-lang  linguistics  lexical  syntax  concept  conceptual-vocab  thinking  things  span-cover  direction  degrees-of-freedom  communication  anglo  japan  china  asia  russia  mediterranean  grokkability-clarity  intricacy  uniqueness  number  universalism-particularism  whole-partial-many  usa  latin-america  farmers-and-foragers  nordic  novelty  trivia  duplication  dependence-independence  spanish  context  orders  water  comparison 
october 2019 by nhaliday
What does it mean when a CSS rule is grayed out in Chrome's element inspector? - Stack Overflow
It seems that a strike-through indicates that a rule was overridden, but what does it mean when a style is grayed out?

--

Greyed/dimmed out text, can mean either

1. it's a default rule/property the browser applies, which includes defaulted short-hand properties.
2. It involves inheritance which is a bit more complicated.

...

In the case where a rule is applied to the currently selected element due to inheritance (i.e. the rule was applied to an ancestor, and the selected element inherited it), chrome will again display the entire ruleset.

The rules which are applied to the currently selected element appear in normal text.

If a rule exists in that set but is not applied because it's a non-inheritable property (e.g. background color), it will appear as greyed/dimmed text.

https://stackoverflow.com/questions/34712218/what-does-it-mean-when-chrome-dev-tools-shows-a-computed-property-greyed-out
Please note, not the Styles panel (I know what greyed-out means in that context—not applied), but the next panel over, the Computed properties panel.
--
The gray calculated properties are neither default, nor inherited. This only occurs on properties that were not defined for the element, but _calculated_ from either its children or parent _based on runtime layout rendering_.

Take this simple page as an example, display is default and font-size is inherited:
q-n-a  stackex  programming  frontend  web  DSL  form-design  devtools  explanation  trivia  tip-of-tongue  direct-indirect  trees  spreading  multi  nitty-gritty  static-dynamic  constraint-satisfaction  ui  browser  properties 
october 2019 by nhaliday
Advantages and disadvantages of building a single page web application - Software Engineering Stack Exchange
Advantages
- All data has to be available via some sort of API - this is a big advantage for my use case as I want to have an API to my application anyway. Right now about 60-70% of my calls to get/update data are done through a REST API. Doing a single page application will allow me to better test my REST API since the application itself will use it. It also means that as the application grows, the API itself will grow since that is what the application uses; no need to maintain the API as an add-on to the application.
- More responsive application - since all data loaded after the initial page is kept to a minimum and transmitted in a compact format (like JSON), data requests should generally be faster, and the server will do slightly less processing.

Disadvantages
- Duplication of code - for example, model code. I am going to have to create models both on the server side (PHP in this case) and the client side in Javascript.
- Business logic in Javascript - I can't give any concrete examples on why this would be bad but it just doesn't feel right to me having business logic in Javascript that anyone can read.
- Javascript memory leaks - since the page never reloads, Javascript memory leaks can happen, and I would not even know where to begin to debug them.

--

Disadvantages I often see with Single Page Web Applications:
- Inability to link to a specific part of the site, there's often only 1 entry point.
- Disfunctional back and forward buttons.
- The use of tabs is limited or non-existant.
(especially mobile:)
- Take very long to load.
- Don't function at all.
- Can't reload a page, a sudden loss of network takes you back to the start of the site.

This answer is outdated, Most single page application frameworks have a way to deal with the issues above – Luis May 27 '14 at 1:41
@Luis while the technology is there, too often it isn't used. – Pieter B Jun 12 '14 at 6:53

https://softwareengineering.stackexchange.com/questions/201838/building-a-web-application-that-is-almost-completely-rendered-by-javascript-whi

https://softwareengineering.stackexchange.com/questions/143194/what-advantages-are-conferred-by-using-server-side-page-rendering
Server-side HTML rendering:
- Fastest browser rendering
- Page caching is possible as a quick-and-dirty performance boost
- For "standard" apps, many UI features are pre-built
- Sometimes considered more stable because components are usually subject to compile-time validation
- Leans on backend expertise
- Sometimes faster to develop*
*When UI requirements fit the framework well.

Client-side HTML rendering:
- Lower bandwidth usage
- Slower initial page render. May not even be noticeable in modern desktop browsers. If you need to support IE6-7, or many mobile browsers (mobile webkit is not bad) you may encounter bottlenecks.
- Building API-first means the client can just as easily be an proprietary app, thin client, another web service, etc.
- Leans on JS expertise
- Sometimes faster to develop**
**When the UI is largely custom, with more interesting interactions. Also, I find coding in the browser with interpreted code noticeably speedier than waiting for compiles and server restarts.

https://softwareengineering.stackexchange.com/questions/237537/progressive-enhancement-vs-single-page-apps

https://stackoverflow.com/questions/21862054/single-page-application-advantages-and-disadvantages
=== ADVANTAGES ===
1. SPA is extremely good for very responsive sites:
2. With SPA we don't need to use extra queries to the server to download pages.
3.May be any other advantages? Don't hear about any else..

=== DISADVANTAGES ===
1. Client must enable javascript.
2. Only one entry point to the site.
3. Security.

https://softwareengineering.stackexchange.com/questions/287819/should-you-write-your-back-end-as-an-api
focused on .NET

https://softwareengineering.stackexchange.com/questions/337467/is-it-normal-design-to-completely-decouple-backend-and-frontend-web-applications
A SPA comes with a few issues associated with it. Here are just a few that pop in my mind now:
- it's mostly JavaScript. One error in a section of your application might prevent other sections of the application to work because of that Javascript error.
- CORS.
- SEO.
- separate front-end application means separate projects, deployment pipelines, extra tooling, etc;
- security is harder to do when all the code is on the client;

- completely interact in the front-end with the user and only load data as needed from the server. So better responsiveness and user experience;
- depending on the application, some processing done on the client means you spare the server of those computations.
- have a better flexibility in evolving the back-end and front-end (you can do it separately);
- if your back-end is essentially an API, you can have other clients in front of it like native Android/iPhone applications;
- the separation might make is easier for front-end developers to do CSS/HTML without needing to have a server application running on their machine.

Create your own dysfunctional single-page app: https://news.ycombinator.com/item?id=18341993
I think are three broadly assumed user benefits of single-page apps:
1. Improved user experience.
2. Improved perceived performance.
3. It’s still the web.

5 mistakes to create a dysfunctional single-page app
Mistake 1: Under-estimate long-term development and maintenance costs
Mistake 2: Use the single-page app approach unilaterally
Mistake 3: Under-invest in front end capability
Mistake 4: Use naïve dev practices
Mistake 5: Surf the waves of framework hype

The disadvantages of single page applications: https://news.ycombinator.com/item?id=9879685
You probably don't need a single-page app: https://news.ycombinator.com/item?id=19184496
https://news.ycombinator.com/item?id=20384738
MPA advantages:
- Stateless requests
- The browser knows how to deal with a traditional architecture
- Fewer, more mature tools
- SEO for free

When to go for the single page app:
- Core functionality is real-time (e.g Slack)
- Rich UI interactions are core to the product (e.g Trello)
- Lots of state shared between screens (e.g. Spotify)

Hybrid solutions
...
Github uses this hybrid approach.
...

Ask HN: Is it ok to use traditional server-side rendering these days?: https://news.ycombinator.com/item?id=13212465

https://www.reddit.com/r/webdev/comments/cp9vb8/are_people_still_doing_ssr/
https://www.reddit.com/r/webdev/comments/93n60h/best_javascript_modern_approach_to_multi_page/
https://www.reddit.com/r/webdev/comments/aax4k5/do_you_develop_solely_using_spa_these_days/
The SEO issues with SPAs is a persistent concern you hear about a lot, yet nobody ever quantifies the issues. That is because search engines keep the operation of their crawler bots and indexing secret. I have read into it some, and it seems that problem used to exist, somewhat, but is more or less gone now. Bots can deal with SPAs fine.
--
I try to avoid building a SPA nowadays if possible. Not because of SEO (there are now server-side solutions to help with that), but because a SPA increases the complexity of the code base by a magnitude. State management with Redux... Async this and that... URL routing... And don't forget to manage page history.

How about just render pages with templates and be done?

If I need a highly dynamic UI for a particular feature, then I'd probably build an embeddable JS widget for it.
q-n-a  stackex  programming  engineering  tradeoffs  system-design  design  web  frontend  javascript  cost-benefit  analysis  security  state  performance  traces  measurement  intricacy  code-organizing  applicability-prereqs  multi  comparison  smoothness  shift  critique  techtariat  chart  ui  coupling-cohesion  interface-compatibility  hn  commentary  best-practices  discussion  trends  client-server  api  composition-decomposition  cycles  frameworks  ecosystem  degrees-of-freedom  dotnet  working-stiff  reddit  social  project-management 
october 2019 by nhaliday
When to use margin vs padding in CSS - Stack Overflow
TL;DR: By default I use margin everywhere, except when I have a border or background and want to increase the space inside that visible box.

To me, the biggest difference between padding and margin is that vertical margins auto-collapse, and padding doesn't.

https://stackoverflow.com/questions/5958699/difference-between-margin-and-padding
One key thing that is missing in the answers here:

Top/Bottom margins are collapsible.

So if you have a 20px margin at the bottom of an element and a 30px margin at the top of the next element, the margin between the two elements will be 30px rather than 50px. This does not apply to left/right margin or padding.
--
Note that there are very specific circumstances in which vertical margins collapse - not just any two vertical margins will do so. Which just makes it all the more confusing (unless you're very familiar with the box model).

[ed.: roughly, separation = padding(A) + padding(B) + max{margin(A), margin(B)}, border in between padding and margin]
q-n-a  stackex  comparison  explanation  summary  best-practices  form-design  DSL  web  frontend  stylized-facts  methodology  programming  multi 
october 2019 by nhaliday
C++ IDE for Linux? - Stack Overflow
A:
- Vim/Emacs + Unix/GNU tools,
- VSCode or Sublime
- CodeLite
- Netbeans
- QT Creator
q-n-a  stackex  programming  c(pp)  devtools  tools  ide  software  recommendations  unix  linux 
september 2019 by nhaliday
python - Why do some languages like C++ and Java have a built-in LinkedList datastructure? - Stack Overflow
I searched through Guido's Python History blog, because I was sure he'd written about this, but apparently that's not where he did so. So, this is based on a combination of reasoning (aka educated guessing) and memory (possibly faulty).

Let's start from the end: Without knowing why Guido didn't add linked lists in Python 0.x, do we at least know why the core devs haven't added them since then, even though they've added a bunch of other types from OrderedDict to set?

Yes, we do. The short version is: Nobody has asked for it, in over two decades. Almost of what's been added to builtins or the standard library over the years has been (a variation on) something that's proven to be useful and popular on PyPI or the ActiveState recipes. That's where OrderedDict and defaultdict came from, for example, and enum and dataclass (based on attrs). There are popular libraries for a few other container types—various permutations of sorted dict/set, OrderedSet, trees and tries, etc., and both SortedContainers and blist have been proposed, but rejected, for inclusion in the stdlib.

But there are no popular linked list libraries, and that's why they're never going to be added.

So, that brings the question back a step: Why are there no popular linked list libraries?
q-n-a  stackex  impetus  roots  programming  pls  python  tradeoffs  cost-benefit  design  data-structures 
august 2019 by nhaliday
selenium - What is the difference between cssSelector & Xpath and which is better with respect to performance for cross browser testing? - Stack Overflow
CSS selectors perform far better than Xpath and it is well documented in Selenium community. Here are some reasons,
- Xpath engines are different in each browser, hence make them inconsistent
- IE does not have a native xpath engine, therefore selenium injects its own xpath engine for compatibility of its API. Hence we lose the advantage of using native browser features that WebDriver inherently promotes.
- Xpath tend to become complex and hence make hard to read in my opinion
However there are some situations where, you need to use xpath, for example, searching for a parent element or searching element by its text (I wouldn't recommend the later).
--
I’m going to hold the unpopular on SO selenium tag opinion that XPath is preferable to CSS in the longer run.

This long post has two sections - first I'll put a back-of-the-napkin proof the performance difference between the two is 0.1-0.3 milliseconds (yes; that's 100 microseconds), and then I'll share my opinion why XPath is more powerful.

...

With the performance out of the picture, why do I think xpath is better? Simple – versatility, and power.

Xpath is a language developed for working with XML documents; as such, it allows for much more powerful constructs than css.
For example, navigation in every direction in the tree – find an element, then go to its grandparent and search for a child of it having certain properties.
It allows embedded boolean conditions – cond1 and not(cond2 or not(cond3 and cond4)); embedded selectors – "find a div having these children with these attributes, and then navigate according to it".
XPath allows searching based on a node's value (its text) – however frowned upon this practice is, it does come in handy especially in badly structured documents (no definite attributes to step on, like dynamic ids and classes - locate the element by its text content).

The stepping in css is definitely easier – one can start writing selectors in a matter of minutes; but after a couple of days of usage, the power and possibilities xpath has quickly overcomes css.
And purely subjective – a complex css is much harder to read than a complex xpath expression.
q-n-a  stackex  comparison  best-practices  programming  yak-shaving  python  javascript  web  frontend  performance  DSL  debate  grokkability  trees  grokkability-clarity 
august 2019 by nhaliday
syntax highlighting - List known filetypes - Vi and Vim Stack Exchange
Type :setfiletype (with a space afterwards), then press Ctrl-d.
q-n-a  stackex  editors  howto  list  pls  config 
august 2019 by nhaliday
multithreading - C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming? - Stack Overflow
I like the analogy of abandonment of sequential consistency to special relativity tho I think (emphasis on *think*, not know...) GR might be actually be the more appropriate one
q-n-a  stackex  programming  pls  c(pp)  systems  metal-to-virtual  computer-memory  concurrency  intricacy  nitty-gritty  analogy  comparison  physics  relativity  advanced 
august 2019 by nhaliday
testing - Is there a reason that tests aren't written inline with the code that they test? - Software Engineering Stack Exchange
The only advantage I can think of for inline tests would be reducing the number of files to be written. With modern IDEs this really isn't that big a deal.

There are, however, a number of obvious drawbacks to inline testing:
- It violates separation of concerns. This may be debatable, but to me testing functionality is a different responsibility than implementing it.
- You'd either have to introduce new language features to distinguish between tests/implementation, or you'd risk blurring the line between the two.
- Larger source files are harder to work with: harder to read, harder to understand, you're more likely to have to deal with source control conflicts.
- I think it would make it harder to put your "tester" hat on, so to speak. If you're looking at the implementation details, you'll be more tempted to skip implementing certain tests.
q-n-a  stackex  programming  engineering  best-practices  debate  correctness  checking  code-organizing  composition-decomposition  coupling-cohesion  psychology  cog-psych  attention  thinking  neurons  contiguity-proximity  grokkability  grokkability-clarity 
august 2019 by nhaliday
Call graph - Wikipedia
I've found both static and dynamic versions useful (former mostly when I don't want to go thru pain of compiling something)

best options AFAICT:

C/C++ and maybe Go: https://github.com/gperftools/gperftools
https://gperftools.github.io/gperftools/cpuprofile.html

static: https://github.com/Vermeille/clang-callgraph
https://stackoverflow.com/questions/5373714/how-to-generate-a-calling-graph-for-c-code
I had to go through some extra pain to get this to work:
- if you use Homebrew LLVM (that's slightly incompatible w/ macOS c++filt, make sure to pass -n flag)
- similarly macOS sed needs two extra backslashes for each escape of the angle brackets

another option: doxygen

Go: https://stackoverflow.com/questions/31362332/creating-call-graph-in-golang
both static and dynamic in one tool

Java: https://github.com/gousiosg/java-callgraph
both static and dynamic in one tool

Python:
https://github.com/gak/pycallgraph
more up-to-date forks: https://github.com/daneads/pycallgraph2 and https://github.com/YannLuo/pycallgraph
old docs: https://pycallgraph.readthedocs.io/en/master/
I've had some trouble getting nice output from this (even just getting the right set of nodes displayed, not even taking into account layout and formatting).
- Argument parsing syntax is idiosyncratic. Just read `pycallgraph --help`.
- Options -i and -e take glob patterns (see pycallgraph2/{tracer,globbing_filter}.py), which are applied the function names qualified w/ module paths.
- Functions defined in the script you are running receive no module path. There is no easy way to filter for them using the -i and -e options.
- The --debug option gives you the graphviz for your own use instead of just writing the final image produced.

static: https://github.com/davidfraser/pyan
more up-to-date fork: https://github.com/itsayellow/pyan/
one way to good results: `pyan -dea --format yed $MODULE_FILES > output.graphml`, then open up in yEd and use hierarchical layout

various: https://github.com/jrfonseca/gprof2dot

I believe all the dynamic tools listed here support weighting nodes and edges by CPU time/samples (inclusive and exclusive of descendants) and discrete calls. In the case of the gperftools and the Java option you probably have to parse the output to get the latter, tho.

IIRC Dtrace has probes for function entry/exit. So that's an option as well.

old pin: https://github.com/nst/objc_dep
Graph the import dependancies in an Objective-C project
concept  wiki  reference  tools  devtools  graphs  trees  programming  code-dive  let-me-see  big-picture  libraries  software  recommendations  list  top-n  links  c(pp)  golang  python  javascript  jvm  stackex  q-n-a  howto  yak-shaving  visualization  dataviz  performance  structure  oss  osx  unix  linux  static-dynamic  repo  cocoa 
july 2019 by nhaliday
c++ - mmap() vs. reading blocks - Stack Overflow
The discussion of mmap/read reminds me of two other performance discussions:

Some Java programmers were shocked to discover that nonblocking I/O is often slower than blocking I/O, which made perfect sense if you know that nonblocking I/O requires making more syscalls.

Some other network programmers were shocked to learn that epoll is often slower than poll, which makes perfect sense if you know that managing epoll requires making more syscalls.

Conclusion: Use memory maps if you access data randomly, keep it around for a long time, or if you know you can share it with other processes (MAP_SHARED isn't very interesting if there is no actual sharing). Read files normally if you access data sequentially or discard it after reading. And if either method makes your program less complex, do that. For many real world cases there's no sure way to show one is faster without testing your actual application and NOT a benchmark.
q-n-a  stackex  programming  systems  performance  tradeoffs  objektbuch  stylized-facts  input-output  caching  computer-memory  sequential  applicability-prereqs  local-global 
july 2019 by nhaliday
Errors in Math Functions (The GNU C Library)
https://stackoverflow.com/questions/22259537/guaranteed-precision-of-sqrt-function-in-c-c
For C99, there are no specific requirements. But most implementations try to support Annex F: IEC 60559 floating-point arithmetic as good as possible. It says:

An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.

And:

The sqrt functions in <math.h> provide the IEC 60559 square root operation.

IEC 60559 (equivalent to IEEE 754) says about basic operations like sqrt:

Except for binary <-> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination's format.

The final step consists of rounding according to several rounding modes but the result must always be the closest representable value in the target precision.

[ed.: The list of other such correctly rounded functions is included in the IEEE-754 standard (which I've put w/ the C1x and C++2x standard drafts) under section 9.2, and it mainly consists of stuff that can be expressed in terms of exponentials (exp, log, trig functions, powers) along w/ sqrt/hypot functions.

Fun fact: this question was asked by Yeputons who has a codeforces profile.]
https://stackoverflow.com/questions/20945815/math-precision-requirements-of-c-and-c-standard
oss  libraries  systems  c(pp)  numerics  documentation  objektbuch  list  linux  unix  multi  q-n-a  stackex  programming  nitty-gritty  sci-comp  accuracy  types  approximation  IEEE  protocol-metadata  gnu 
july 2019 by nhaliday
history - Why are UNIX/POSIX system call namings so illegible? - Unix & Linux Stack Exchange
It's due to the technical constraints of the time. The POSIX standard was created in the 1980s and referred to UNIX, which was born in the 1970. Several C compilers at that time were limited to identifiers that were 6 or 8 characters long, so that settled the standard for the length of variable and function names.

http://neverworkintheory.org/2017/11/26/abbreviated-full-names.html
We carried out a family of controlled experiments to investigate whether the use of abbreviated identifier names, with respect to full-word identifier names, affects fault fixing in C and Java source code. This family consists of an original (or baseline) controlled experiment and three replications. We involved 100 participants with different backgrounds and experiences in total. Overall results suggested that there is no difference in terms of effort, effectiveness, and efficiency to fix faults, when source code contains either only abbreviated or only full-word identifier names. We also conducted a qualitative study to understand the values, beliefs, and assumptions that inform and shape fault fixing when identifier names are either abbreviated or full-word. We involved in this qualitative study six professional developers with 1--3 years of work experience. A number of insights emerged from this qualitative study and can be considered a useful complement to the quantitative results from our family of experiments. One of the most interesting insights is that developers, when working on source code with abbreviated identifier names, adopt a more methodical approach to identify and fix faults by extending their focus point and only in a few cases do they expand abbreviated identifiers.
q-n-a  stackex  trivia  programming  os  systems  legacy  legibility  ux  libraries  unix  linux  hacker  cracker-prog  multi  evidence-based  empirical  expert-experience  engineering  study  best-practices  comparison  quality  debugging  efficiency  time  code-organizing  grokkability  grokkability-clarity 
july 2019 by nhaliday
The Existential Risk of Math Errors - Gwern.net
How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:

1. Mistakes where the theorem is still true, but the proof was incorrect (type I)
2. Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)

Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept is a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable5.) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept - but of course the results were some of the greatest mathematical work ever conducted6 and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future.7

...

Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:

...

TYPE I > TYPE II?
“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”
- Gian-Carlo Rota13

Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.

...

Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming attributes to Ralph Boas a comment that while editing Mathematical Reviews that “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”.

...

Gian-Carlo Rota gives us an example with Hilbert:

...

Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties.

...

Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1/329 and that he routinely found mistakes in his own proofs and, worse, believed false conjectures30.

[more on these "structured proofs":
https://academia.stackexchange.com/questions/52435/does-anyone-actually-publish-structured-proofs
https://mathoverflow.net/questions/35727/community-experiences-writing-lamports-structured-proofs
]

We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages31.

[related:
https://mathoverflow.net/questions/11517/computer-algebra-errors
I don't know any interesting bugs in symbolic algebra packages but I know a true, enlightening and entertaining story about something that looked like a bug but wasn't.

Define sinc𝑥=(sin𝑥)/𝑥.

Someone found the following result in an algebra package: ∫∞0𝑑𝑥sinc𝑥=𝜋/2
They then found the following results:

...

So of course when they got:

∫∞0𝑑𝑥sinc𝑥sinc(𝑥/3)sinc(𝑥/5)⋯sinc(𝑥/15)=(467807924713440738696537864469/935615849440640907310521750000)𝜋

hmm:
Which means that nobody knows Fourier analysis nowdays. Very sad and discouraging story... – fedja Jan 29 '10 at 18:47

--

Because the most popular systems are all commercial, they tend to guard their bug database rather closely -- making them public would seriously cut their sales. For example, for the open source project Sage (which is quite young), you can get a list of all the known bugs from this page. 1582 known issues on Feb.16th 2010 (which includes feature requests, problems with documentation, etc).

That is an order of magnitude less than the commercial systems. And it's not because it is better, it is because it is younger and smaller. It might be better, but until SAGE does a lot of analysis (about 40% of CAS bugs are there) and a fancy user interface (another 40%), it is too hard to compare.

I once ran a graduate course whose core topic was studying the fundamental disconnect between the algebraic nature of CAS and the analytic nature of the what it is mostly used for. There are issues of logic -- CASes work more or less in an intensional logic, while most of analysis is stated in a purely extensional fashion. There is no well-defined 'denotational semantics' for expressions-as-functions, which strongly contributes to the deeper bugs in CASes.]

...

Should such widely-believed conjectures as P≠NP or the Riemann hypothesis turn out be false, then because they are assumed by so many existing proofs, a far larger math holocaust would ensue38 - and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining, if it doesn’t come at a time of danger.

https://mathoverflow.net/questions/338607/why-doesnt-mathematics-collapse-down-even-though-humans-quite-often-make-mista

more on formal methods in programming:
https://www.quantamagazine.org/formal-verification-creates-hacker-proof-code-20160920/
https://intelligence.org/2014/03/02/bob-constable/

https://softwareengineering.stackexchange.com/questions/375342/what-are-the-barriers-that-prevent-widespread-adoption-of-formal-methods
Update: measured effort
In the October 2018 issue of Communications of the ACM there is an interesting article about Formally verified software in the real world with some estimates of the effort.

Interestingly (based on OS development for military equipment), it seems that producing formally proved software requires 3.3 times more effort than with traditional engineering techniques. So it's really costly.

On the other hand, it requires 2.3 times less effort to get high security software this way than with traditionally engineered software if you add the effort to make such software certified at a high security level (EAL 7). So if you have high reliability or security requirements there is definitively a business case for going formal.

WHY DON'T PEOPLE USE FORMAL METHODS?: https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/
You can see examples of how all of these look at Let’s Prove Leftpad. HOL4 and Isabelle are good examples of “independent theorem” specs, SPARK and Dafny have “embedded assertion” specs, and Coq and Agda have “dependent type” specs.6

If you squint a bit it looks like these three forms of code spec map to the three main domains of automated correctness checking: tests, contracts, and types. This is not a coincidence. Correctness is a spectrum, and formal verification is one extreme of that spectrum. As we reduce the rigour (and effort) of our verification we get simpler and narrower checks, whether that means limiting the explored state space, using weaker types, or pushing verification to the runtime. Any means of total specification then becomes a means of partial specification, and vice versa: many consider Cleanroom a formal verification technique, which primarily works by pushing code review far beyond what’s humanly possible.

...

The question, then: “is 90/95/99% correct significantly cheaper than 100% correct?” The answer is very yes. We all are comfortable saying that a codebase we’ve well-tested and well-typed is mostly correct modulo a few fixes in prod, and we’re even writing more than four lines of code a day. In fact, the vast… [more]
ratty  gwern  analysis  essay  realness  truth  correctness  reason  philosophy  math  proofs  formal-methods  cs  programming  engineering  worse-is-better/the-right-thing  intuition  giants  old-anglo  error  street-fighting  heuristic  zooming  risk  threat-modeling  software  lens  logic  inference  physics  differential  geometry  estimate  distribution  robust  speculation  nonlinearity  cost-benefit  convexity-curvature  measure  scale  trivia  cocktail  history  early-modern  europe  math.CA  rigor  news  org:mag  org:sci  miri-cfar  pdf  thesis  comparison  examples  org:junk  q-n-a  stackex  pragmatic  tradeoffs  cracker-prog  techtariat  invariance  DSL  chart  ecosystem  grokkability  heavyweights  CAS  static-dynamic  lower-bounds  complexity  tcs  open-problems  big-surf  ideas  certificates-recognition  proof-systems  PCP  mediterranean  SDP  meta:prediction  epistemic  questions  guessing  distributed  overflow  nibble  soft-question  track-record  big-list  hmm  frontier  state-of-art  move-fast-(and-break-things)  grokkability-clarity  technical-writing  trust 
july 2019 by nhaliday
python - Executing multi-line statements in the one-line command-line? - Stack Overflow
you could do
> echo -e "import sys\nfor r in range(10): print 'rob'" | python
or w/out pipes:
> python -c "exec(\"import sys\nfor r in range(10): print 'rob'\")"
or
> (echo "import sys" ; echo "for r in range(10): print 'rob'") | python

[ed.: In fish
> python -c "import sys"\n"for r in range(10): print 'rob'"]
q-n-a  stackex  programming  yak-shaving  pls  python  howto  terminal  parsimony  syntax  gotchas 
july 2019 by nhaliday
Why is Google Translate so bad for Latin? A longish answer. : latin
hmm:
> All it does its correlate sequences of up to five consecutive words in texts that have been manually translated into two or more languages.
That sort of system ought to be perfect for a dead language, though. Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.

We're not exactly inundated with brand new Latin to translate.
--
> Dump all the Cicero, Livy, Lucretius, Vergil, and Oxford Latin Course into a database and we're good.
What makes you think that the Google folks haven't done so and used that to create the language models they use?
> That sort of system ought to be perfect for a dead language, though.
Perhaps. But it will be bad at translating novel English sentences to Latin.
foreign-lang  reddit  social  discussion  language  the-classics  literature  dataset  measurement  roots  traces  syntax  anglo  nlp  stackex  links  q-n-a  linguistics  lexical  deep-learning  sequential  hmm  project  arrows  generalization  state-of-art  apollonian-dionysian  machine-learning  google 
june 2019 by nhaliday
paradigms - What's your strongest opinion against functional programming? - Software Engineering Stack Exchange
The problem is that most common code inherently involves state -- business apps, games, UI, etc. There's no problem with some parts of an app being purely functional; in fact most apps could benefit in at least one area. But forcing the paradigm all over the place feels counter-intuitive.
q-n-a  stackex  programming  engineering  pls  functional  pragmatic  cost-benefit  rhetoric  debate  steel-man  business  regularizer  abstraction  state  realness 
june 2019 by nhaliday
c++ - Which is faster: Stack allocation or Heap allocation - Stack Overflow
On my machine, using g++ 3.4.4 on Windows, I get "0 clock ticks" for both stack and heap allocation for anything less than 100000 allocations, and even then I get "0 clock ticks" for stack allocation and "15 clock ticks" for heap allocation. When I measure 10,000,000 allocations, stack allocation takes 31 clock ticks and heap allocation takes 1562 clock ticks.

so maybe around 100x difference? what does that work out to in terms of total workload?

hmm:
http://vlsiarch.eecs.harvard.edu/wp-content/uploads/2017/02/asplos17mallacc.pdf
Recent work shows that dynamic memory allocation consumes nearly 7% of all cycles in Google datacenters.

That's not too bad actually. Seems like I shouldn't worry about shifting from heap to stack/globals unless profiling says it's important, particularly for non-oly stuff.

edit: Actually, factor x100 for 7% is pretty high, could be increase constant factor by almost an order of magnitude.

edit: Well actually that's not the right math. 93% + 7%*.01 is not much smaller than 100%
q-n-a  stackex  programming  c(pp)  systems  memory-management  performance  intricacy  comparison  benchmarks  data  objektbuch  empirical  google  papers  nibble  time  measure  pro-rata  distribution  multi  pdf  oly-programming  computer-memory 
june 2019 by nhaliday
c++ - Constexpr Math Functions - Stack Overflow
Actually, because of old and annoying legacy, almost none of the math functions can be constexpr, since they all have the side-effect of setting errno on various error conditions, usually domain errors.
--
Note, gcc has implemented most of the math function as constexpr although the extension is non-conforming this should change. So definitely doable. – Shafik Yaghmour Jan 12 '15 at 20:2
q-n-a  stackex  programming  pls  c(pp)  gotchas  legacy  numerics  state  resources-effects  gnu 
june 2019 by nhaliday
c++ - Why is size_t unsigned? - Stack Overflow
size_t is unsigned for historical reasons.

On an architecture with 16 bit pointers, such as the "small" model DOS programming, it would be impractical to limit strings to 32 KB.

For this reason, the C standard requires (via required ranges) ptrdiff_t, the signed counterpart to size_t and the result type of pointer difference, to be effectively 17 bits.

Those reasons can still apply in parts of the embedded programming world.

However, they do not apply to modern 32-bit or 64-bit programming, where a much more important consideration is that the unfortunate implicit conversion rules of C and C++ make unsigned types into bug attractors, when they're used for numbers (and hence, arithmetical operations and magnitude comparisions). With 20-20 hindsight we can now see that the decision to adopt those particular conversion rules, where e.g. string( "Hi" ).length() < -3 is practically guaranteed, was rather silly and impractical. However, that decision means that in modern programming, adopting unsigned types for numbers has severe disadvantages and no advantages – except for satisfying the feelings of those who find unsigned to be a self-descriptive type name, and fail to think of typedef int MyType.

Summing up, it was not a mistake. It was a decision for then very rational, practical programming reasons. It had nothing to do with transferring expectations from bounds-checked languages like Pascal to C++ (which is a fallacy, but a very very common one, even if some of those who do it have never heard of Pascal).
q-n-a  stackex  c(pp)  systems  embedded  hardware  measure  types  signum  gotchas  roots  explanans  pls  programming 
june 2019 by nhaliday
linux - How do I insert a tab character in Iterm? - Stack Overflow
However, this isn't an iTerm thing, this is your shell that's doing it.

ctrl-V for inserting nonprintable literals doesn't work in fish, neither in vi mode nor emacs mode. prob easiest to just switch to bash.
q-n-a  stackex  terminal  yak-shaving  gotchas  keyboard  tip-of-tongue  strings 
june 2019 by nhaliday
« earlier      
per page:    204080120160

Copy this bookmark:





to read