
kme : datascience   55

Useful Unix commands for data science
via: http://johnkerl.org/miller/doc/originality.html
Imagine you have a 4.2GB CSV file. It has over 12 million records and 50 columns. All you need from this file is the sum of all values in one particular column.


OK, but I'd point out the "useless use of cat" (piping `cat data.csv` into `awk`) to anyone learning from this guide. awk can read the file itself. Alternatives:
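<code class="language-bash">
# Redirect the file into awk's stdin (the redirection may come first):
<data.csv awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }'
# Or pass the file name to awk as an argument:
awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }' data.csv
</code>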
unix  textprocessing  datascience  commandline  reference  newbie 
7 weeks ago by kme
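With 50 columns, counting fields to find the right position is error-prone. The sketch below looks the column up by its header name instead. It assumes that the first line is a header row and that the file uses the same `|` delimiter as the commands above. `sum_column` and the column name `amount` are hypothetical.

```shell
# Sum one column of a "|"-delimited file, selected by header name.
# Assumes line 1 is a header row; "amount" in the usage line is hypothetical.
sum_column() {
  # $1 = file to read, $2 = header name of the column to sum
  awk -F '|' -v col="$2" '
    NR == 1 {
      # Find the index of the named column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      next
    }
    c { sum += $c }
    END {
      # Fail loudly instead of silently summing the wrong thing.
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      printf "%.2f\n", sum
    }
  ' "$1"
}

# Usage (hypothetical column name):
# sum_column data.csv amount
```

The whole file is still streamed through a single awk process. Only the header line is inspected to pick the field.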
Building a data science portfolio: Machine learning project
Another reason why you should wrap your READMEs and code at <80 columns.
datascience  python  machinelearning  80columns 
july 2016 by kme
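To find lines that break the <80-column rule, a short awk check is enough. This is a sketch, and `long_lines` is a hypothetical helper name. A tab counts as a single character here. Depending on the awk implementation and locale, multibyte characters may be counted as bytes rather than characters.

```shell
# Print file:line: width for every line wider than 79 columns.
# FILENAME, FNR and length are standard awk built-ins.
long_lines() {
  awk 'length($0) > 79 { printf "%s:%d: %d columns\n", FILENAME, FNR, length($0) }' "$@"
}

# Usage:
# long_lines README.md *.py
```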
