jabley : benchmark   147

[1806.00680] Datacenter RPCs can be General and Fast
It is commonly believed that datacenter networking software must sacrifice generality to attain high performance. The popularity of specialized distributed systems designed specifically for niche technologies such as RDMA, lossless networks, FPGAs, and programmable switches testifies to this belief. In this paper, we show that such specialization is not necessary. eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics. eRPC performs well in three key metrics: message rate for small messages; bandwidth for large messages; and scalability to a large number of nodes and CPU cores. It handles packet loss, congestion, and background request execution. In microbenchmarks, one CPU core can handle up to 10 million small RPCs per second, or send large messages at 75 Gbps. We port a production-grade implementation of Raft state machine replication to eRPC without modifying the core Raft source code. We achieve 5.5 microseconds of replication latency on lossy Ethernet, which is faster than or comparable to specialized replication systems that use programmable switches, FPGAs, or RDMA.
datacenter  networking  performance  benchmark  comp-sci  research  paper 
february 2019 by jabley
Scalability! But at what COST?
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
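A minimal sketch of how the COST definition above could be applied to a set of timings. The function name `cost`, the mapping from core count to runtime, and all of the numbers are assumptions made for illustration; none of them come from the paper.

```python
from typing import Dict, Optional


def cost(single_thread_seconds: float,
         runs: Dict[int, float]) -> Optional[int]:
    """Return the smallest core count whose runtime beats the
    single-threaded baseline, or None if no measured configuration
    does (the COST is then unbounded for these measurements)."""
    winners = [cores for cores, seconds in runs.items()
               if seconds < single_thread_seconds]
    return min(winners) if winners else None


# Hypothetical runtimes (seconds) of a scalable system, keyed by core count.
hypothetical_runs = {1: 900.0, 16: 420.0, 64: 310.0, 128: 250.0}

# Against a hypothetical 300 s single-threaded baseline, only the
# 128-core run is faster, so the COST is 128 cores.
print(cost(300.0, hypothetical_runs))

# Against a hypothetical 100 s baseline, no configuration wins.
print(cost(100.0, hypothetical_runs))
```

With these made-up timings the system scales well from 1 to 128 cores, yet only the largest configuration beats the baseline, which is the situation the metric is meant to expose.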
benchmark  coding  performance  big-data  scalability  paper  filetype:pdf  economics 
april 2018 by jabley