
ianweatherhogg : spark   115

Testing Topologies in Kafka Streams
Kafka Streams is a deployment-agnostic stream processing library written in Java. Even though Kafka has great test coverage, there is no helper code for writing unit tests for your Kafka Streams topologies. I wrote a little helper library, Mocked Streams, in Scala, which allows you to create lightweight, parallelizable unit tests for your topologies without running a full Kafka cluster or even an embedded one.
spark  kafka  stream  processing  test  topology 
february 2017 by ianweatherhogg
Stateful Streaming in Spark and Kafka Streams
This article is about aggregates in stateful stream processing. It covers two concrete examples in Apache Spark and Apache Kafka.
spark  kafka  stream  processing  analytics 
february 2017 by ianweatherhogg
Event Tracking with Finatra and Spark
In this article I will give a beginner's guide to writing an event tracking API with Finatra and Spark.
finatra  spark  scala 
february 2017 by ianweatherhogg
Processing Tweets with Kafka Streams
I am going to develop an example application consisting of an ingestion service that gets data from Twitter and an aggregation service that uses Kafka Streams to aggregate word counts in tumbling time windows.
twitter  scala  spark  stream  processing 
february 2017 by ianweatherhogg
Location, Location, Location
At OpenSignal, we’re dedicated to helping people find better signal. Our crowdsourced app has been downloaded by over 20 million users worldwide and they’ve been using it to find the best network…
scala  spark  python  geo 
february 2017 by ianweatherhogg
PiercingDan/spark-Jupyter-AWS: A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, supporting S3 I/O
spark-Jupyter-AWS - A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, supporting S3 I/O
github  amazon  cloud  spark  python  data 
december 2016 by ianweatherhogg
Using Amazon Elastic Map Reduce (EMR) with Spark and Python 3.4
As part of a recent HumanGeo effort, I was faced with the challenge of detecting patterns and anomalies in large geospatial datasets using various statistics...
amazon  cloud  spark  python 
august 2016 by ianweatherhogg
A Billion Taxi Rides on Amazon EMR running Spark
Benchmarks & Tips for Big Data, Hadoop, AWS, Google Cloud, Postgres, Spark, Python & More...
spark  amazon  elastic  map  reduce  configuration  hive  big  data  cloud 
april 2016 by ianweatherhogg
Setup a Spark Cluster in 5 minutes - sqldump
Prerequisites: Assuming you have 3 nodes (node1, node2, node3), ensure the hosts file contains the following entries on all nodes: …
spark  cluster  host  master  slave 
march 2016 by ianweatherhogg
Databricks Blog
Founded by the creators of Apache Spark, Databricks works to make big data simple through our hosted big data service, Databricks.
blogs  spark 
february 2016 by ianweatherhogg
Haskell meets large scale distributed analytics
Haskell meets large scale distributed analytics: sparkle, a Haskell API for Apache Spark
haskell  spark 
february 2016 by ianweatherhogg
training/ at ampcamp6 · amplab/training
training - Training materials for Strata, AMP Camp, etc
spark  r  helloworld 
february 2016 by ianweatherhogg
Spark and Spark Streaming Unit Testing - Passionate Developer
When you develop a distributed system, it is crucial to make it easy to test.
Execute tests in a controlled environment, ideally from your IDE.
Long …
spark  stream  test  kafka 
february 2016 by ianweatherhogg
GraphX - Spark 1.6.0 Documentation
GraphX graph processing library guide for Spark 1.6.0
spark  graph  x  documentation 
february 2016 by ianweatherhogg
MLlib - Spark 1.6.0 Documentation
MLlib machine learning library overview for Spark 1.6.0
spark  machine  learn  documentation 
february 2016 by ianweatherhogg
Introduction | Databricks Spark Reference Applications
Reference Applications demonstrating Apache Spark - brought to you by Databricks.
book  free  spark  sql  5* 
february 2016 by ianweatherhogg
Scala for machine learning
A blog about Scala programming language and machine learning
blogs  scala  spark  machine  learn  4* 
february 2016 by ianweatherhogg
8. PySpark · killrweather/killrweather Wiki
killrweather - KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka for fast, streaming computations on time series data in asynchronous event-driven environments.
python  spark  wiki 
february 2016 by ianweatherhogg
Audience Modeling With Spark ML Pipelines - Eugene Zhulenev
At Collective we rely heavily on machine learning and predictive modeling to run our digital advertising business. All decisions about what ad to …
scala  spark  machine  learn  pipe  line 
february 2016 by ianweatherhogg
Spark and Kafka Integration Patterns, Part 2 - Passionate Developer
In the world beyond batch,
streaming data processing is the future of big data.
Regardless of the streaming framework used for data processing, tight …
kafka  spark 
february 2016 by ianweatherhogg
Ethan's Tech Blog | Serializing Generic Types with Spray JSON Library
Continuing from the last post, this is how you can take generic base classes and allow the serialization code to convert them to the appropriate subclass.
spark  json 
january 2016 by ianweatherhogg
Kenny Bastani
A blog about innovation, graph theory, computer science, programming, information theory, artificial intelligence, and machine learning.
blogs  spark 
january 2016 by ianweatherhogg
Introduction | Mastering Apache Spark
Loose notes about Apache Spark from my journey into the depths of it (aka towards mastery of Apache Spark)
spark  book  free 
january 2016 by ianweatherhogg
Blog Archive - Random Thoughts on Coding
Blog Archive 2015 Spark and Guava Tables
Oct 09 2015 posted in Hadoop, MapReduce, Scala, Spark Secondary Sorting in Spark
Oct 02 2015 posted in …
blogs  hadoop  spark  kafka  stream  processing  guava 
october 2015 by ianweatherhogg
Spark Streaming - Spark 1.4.1 Documentation
Spark Streaming programming guide and tutorial for Spark 1.4.1
spark  stream  documentation 
september 2015 by ianweatherhogg
Spark On: Let’s Code! (Part 1) | 47 Degrees
Spark and Scala have come to enhance large-scale data processing. In this series of blog posts we will talk about some of our own experiences in Spark.
spark  twitter  stream  akka 
september 2015 by ianweatherhogg
Tutorial: Spark-GPU Cluster Dev in a Notebook - i am trask
ipython  note  book  gpu  cluster  spark 
july 2015 by ianweatherhogg
i am trask
python  neural  network  machine  learn  numpy  spark  gpu  cluster 
july 2015 by ianweatherhogg
Starting with Spark in practice
This post aims to quickly recap basics about Apache Spark and describes exercises (Spark core & streaming, dataframe) to get started with Spark in practice.
spark  helloworld 
july 2015 by ianweatherhogg
Isotonic regression implementation in Apache Spark
Implementation of isotonic regression using parallel pool adjacent violators algorithm in Apache Spark and MLlib.
july 2015 by ianweatherhogg
Intro to Spark - The Lapidary Lemur
Assuming you’ve read the first article on Functional Programming in Scala and Python, you should be ready to sink your teeth into a few …
spark  helloworld 
june 2015 by ianweatherhogg
A Functional Programming Primer for Spark - The Lapidary Lemur
There’s a lot of hype around Spark and Big Data in general, especially around the concepts of Functional Programming. Problem is, Functional …
spark  helloworld 
june 2015 by ianweatherhogg
Blog Archive - Mastering FP and OO with Scala
Blog Archive 2015 Why Docker - Writing Docs Using Jekyll
Aug 17 2015 posted in docker Docker Your Scala Web Application (Play Framework)
Jul 24 2015 …
blogs  scala  docker  spark  play  4* 
may 2015 by ianweatherhogg
Project Tungsten: Bringing Spark Closer to Bare Metal | Databricks blog
Project Tungsten focuses on improving the performance of Spark applications, pushing it closer to the limits of hardware.
spark  big  data  java 
may 2015 by ianweatherhogg