Taco Bell Programming
Here's a concrete example: suppose you have millions of web pages that you want to download and save to disk for later processing. How do you do it? The cool-kids answer is to write a distributed crawler in Clojure and run it on EC2, handing out jobs with a message queue like SQS or ZeroMQ.

The Taco Bell answer? xargs and wget. In the rare case that you saturate the network connection, add some split and rsync. A "distributed crawler" is really only like 10 lines of shell script.
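As a rough illustration of the xargs-and-wget approach, here is a minimal sketch. The function names, the `urls.txt` input file, the `pages` output directory, and the retry and timeout values are all assumptions made for this example, not details from the original post.

```shell
#!/bin/sh
# Sketch only: file names, job counts, and wget settings are assumed values.

# Fetch every URL listed in a file (one URL per line),
# running several wget processes at once.
fetch_all() {
    url_file=$1
    out_dir=$2
    jobs=${3:-16}
    mkdir -p "$out_dir"
    # -n 1: one URL per wget invocation; -P: number of parallel processes.
    xargs -n 1 -P "$jobs" wget --quiet --tries=3 --timeout=30 \
        --directory-prefix="$out_dir" < "$url_file"
}

# Split a large URL list into fixed-size chunks, e.g. one chunk per machine.
# Chunks are named <prefix>aa, <prefix>ab, and so on.
shard_urls() {
    url_file=$1
    lines_per_chunk=$2
    prefix=$3
    split -l "$lines_per_chunk" "$url_file" "$prefix"
}
```

On a single machine, `fetch_all urls.txt pages 16` would be the whole crawler. If the network link becomes the bottleneck, one option is to run `shard_urls urls.txt 1000000 chunk.`, copy each chunk to a different machine with rsync, and run `fetch_all` there.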

Moving on, once you have these millions of pages (or even tens of millions), how do you process them? Surely Hadoop MapReduce is necessary; after all, that's what Google uses to parse the web, right?

Pfft, fuck that noise:

find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process
32 concurrent parallel parsing processes and zero bullshit to manage. Requirement satisfied.
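The post does not say what `./process` does, so the sketch below substitutes a hypothetical stand-in: a small inline script that pulls the first lowercase `<title>` out of each file. The `process_tree` name and the title extraction are assumptions for illustration; only the find and xargs pipeline comes from the text above.

```shell
#!/bin/sh
# Sketch only: the per-file work (title extraction) is an assumed example,
# standing in for whatever ./process really does.

# Walk a directory tree and handle each file in its own process,
# keeping up to $jobs processes running at a time.
process_tree() {
    dir=$1
    jobs=${2:-32}
    find "$dir" -type f -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c '
            # $1 is the single file path handed over by xargs.
            title=$(sed -n "s/.*<title>\(.*\)<\/title>.*/\1/p" "$1" | head -n 1)
            printf "%s\t%s\n" "$1" "$title"
        ' sh
}
```

Calling `process_tree crawl_dir/ 32 > titles.tsv` would write one tab-separated line per file. Output order is not deterministic because the workers finish independently, so pipe through `sort` if order matters.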

Every time you write code or introduce third-party services, you are introducing the possibility of failure into your system. I have far more faith in xargs than I do in Hadoop. Hell, I trust xargs more than I trust myself to write a simple multithreaded processor. I trust syslog to handle asynchronous message recording far more than I trust a message queue service.
programming  productivity  unix  linux 
3 hours ago by hellsten
Inbox your life
Shared Inbox, Email productivity, Kanban in email
organization  productivity  gmail 
4 days ago by jpinnix
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
Super flexible tool for creating task lists, boards, CRMs, etc.
productivity  tools 
4 days ago by alssanro
Perch CMS Expansions for Typinator on Vimeo
Typinator is a text expansion tool for Mac. Here's a screencast demo of a Typinator Perch CMS Expansion set.
perch  cms  code  shortcuts  text  tools  typinator  mac  productivity 
4 days ago by abberdab
Move and resize windows with ease
app  mac  osx  productivity  software 
4 days ago by chrismasters
