ianweatherhogg : hadoop   121

Uetke | How To Use Hdf5 Files In Python
Learn how to use the HDF5 format to store large amounts of data and read it back with Python
python  hadoop  hdf5 
march 2018 by ianweatherhogg
Native Hadoop file system (HDFS) connectivity in Python - Wes McKinney
Many Python libraries have been developed for interacting with the Hadoop File System, HDFS, via its WebHDFS gateway as well as its native Protocol Buffers-based RPC interface. I'll give you an overview of what's out there and show some engineering I've been doing to offer a high-performance HDFS interface within the developing Arrow ecosystem. This blog is a follow-up to my 2017 Roadmap post.
python  hadoop  4* 
january 2017 by ianweatherhogg
Quantisan/docker-hadoop-v1: Docker image for Hadoop v1.0.3 running in pseudo-distributed mode
docker-hadoop-v1 - Docker image for Hadoop v1.0.3 running in pseudo-distributed mode
docker  hadoop 
march 2016 by ianweatherhogg
Finding mutual friends with OpenMPI and map-reduce
Embedded devices such as the Raspberry Pi don't have enough power to run Hadoop jobs. Hadoop is a really complex piece of software. Instead of trying to adapt it, I decided to use a more "lightweight" solution: OpenMPI map-reduce. MPI was designed for distributed computations, so why not run a map-reduce framework on it? Let's do it!
open  mpi  cluster  hadoop 
february 2016 by ianweatherhogg
Install a Multi Node Hadoop Cluster on Ubuntu 14.04 – Sumit Chawla's Blog
This article is about multi-node installation of a Hadoop cluster. You need a minimum of two Ubuntu machines or virtual images to complete a multi-node installation. If you just want to try out a single-node cluster, follow this article on Installing Hadoop on Ubuntu 14.04. I used Hadoop stable version 2.6.0 for this article. I did this…
hadoop  master  slave  cluster  installation  4* 
february 2016 by ianweatherhogg
How to Build a Scalable ETL Pipeline with Kafka Connect
A tutorial on how to use Kafka Connect, together with the JDBC and HDFS connectors, to build a scalable data pipeline in 30 minutes.
kafka  mysql  hadoop  hive 
february 2016 by ianweatherhogg
Blog Archive - Random Thoughts on Coding
Blog Archive 2015: Spark and Guava Tables (Oct 09 2015, posted in Hadoop, MapReduce, Scala, Spark); Secondary Sorting in Spark (Oct 02 2015, posted in …)
blogs  hadoop  spark  kafka  stream  processing  guava 
october 2015 by ianweatherhogg
Process Small Files on Hadoop using CombineFileInputFormat (2) - Carpe diem (Felix's blog)
Following up on the previous article, in this post I ran several benchmarks and tuned the performance from 3 hours 34 minutes to 6 minutes 8 seconds! …
may 2014 by ianweatherhogg
hadoop-ansible - Ansible playbook that installs a Hadoop cluster, with HBase, Hive, Presto for analytics, and Ganglia, Smokeping, Fluentd, Elasticsearch and Kibana for monitoring and centralized log indexing.
github  hadoop  ansible  play  book 
may 2014 by ianweatherhogg
ZPark-Ztream: Driving Spark distributed stream with Scalaz-Stream - Mandubian Blog
ZPark-Ztream: Driving Spark Distributed Stream With Scalaz-Stream

Feb 13th, 2014 …
spark  scalaz  stream  hadoop  processing  parallel  4*  map  reduce 
february 2014 by ianweatherhogg
Building an Hadoop 0.20.x version for HBase 0.90.2 - Michael G. Noll
How to compile an Hadoop 0.20.x version with HDFS append support that is compatible with HBase 0.90.x
hadoop  hbase 
november 2013 by ianweatherhogg
Using Avro in MapReduce jobs with Hadoop, Pig, Hive - Michael G. Noll
Example MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or write data in Avro format.
hadoop  java  stream  4*  avro 
november 2013 by ianweatherhogg