
jabley : performance   826

The Unwritten Contract of Solid State Drives
We perform a detailed vertical analysis of application performance atop a range of modern file systems and SSD FTLs. We formalize the “unwritten contract” that clients of SSDs should follow to obtain high performance, and conduct our analysis to uncover application and file system designs that violate the contract. Our analysis, which utilizes a highly detailed SSD simulation underneath traces taken from real workloads and file systems, provides insight into how to better construct applications, file systems, and FTLs to realize robust and sustainable performance.
ssd  filetype:pdf  paper  comp-sci  disk  performance  research 
29 days ago by jabley
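One rule commonly drawn from the contract is to prefer large, aligned, sequential writes over small random I/O. A minimal Go sketch of that access pattern; the file name, block size, and request size are illustrative choices, not values from the paper:

```go
package main

import "os"

func main() {
	// Follow one contract rule: large, aligned, sequential writes.
	// 4 KiB alignment and a 4 MiB request size are illustrative values.
	const blockSize = 4 << 10           // align requests to the flash page size
	buf := make([]byte, 1024*blockSize) // one large 4 MiB request

	f, err := os.Create("/tmp/ssd-friendly.dat") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Sequential writes: never seek backwards, never issue small random I/O.
	for i := 0; i < 16; i++ {
		if _, err := f.Write(buf); err != nil {
			panic(err)
		}
	}
}
```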
Overcoming the challenges to feedback-directed optimization (Keynote Talk)
Feedback-directed optimization (FDO) is a general term used to describe any technique that alters a program's execution based on tendencies observed in its present or past runs. This paper reviews the current state of affairs in FDO and discusses the challenges inhibiting further acceptance of these techniques. It also argues that current trends in hardware and software technology have resulted in an execution environment where immutable executables and traditional static optimizations are no longer sufficient. It explains how we can improve the effectiveness of our optimizers by increasing our understanding of program behavior, and it provides examples of temporal behavior that we can (or could in the future) exploit during optimization.
paper  comp-sci  compilers  optimisation  performance 
may 2019 by jabley
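Go's profile-guided optimization (Go 1.21+) is a concrete FDO mechanism in the sense the abstract describes: record a profile from a representative run, then feed it back to the compiler via `go build -pgo=auto`. A minimal sketch, with a stand-in workload:

```go
package main

import (
	"os"
	"runtime/pprof"
)

func hot() int {
	// Stand-in for a representative workload; the profile records where
	// time actually goes so the compiler can bias its optimizations.
	sum := 0
	for i := 0; i < 100_000_000; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	// Write the profile where `go build -pgo=auto` looks for it by default.
	f, err := os.Create("default.pgo")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	_ = hot()
}
```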
A fork() in the road
The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
As the designers and implementers of operating systems, we should acknowledge that fork’s continued existence as a first-class OS primitive holds back systems research, and deprecate it. As educators, we should teach fork as a historical artifact, and not the first process creation mechanism students encounter.
filetype:pdf  unix  os  design  fork  memory  safety  performance 
april 2019 by jabley
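For contrast with fork()+exec(), a spawn-style API builds the child's state explicitly and never copies the parent. Go's os/exec has that shape, though on some platforms the runtime still uses fork/exec underneath:

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Spawn-style creation: declare the child's program and arguments,
	// then start it. No intermediate copy of the parent's address space
	// is ever exposed to the programmer.
	out, err := exec.Command("uname", "-a").Output()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s", out)
}
```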
Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems
Multi-Version Concurrency Control (MVCC) is a widely employed concurrency control mechanism, as it allows for execution modes where readers never block writers. However, most systems implement only snapshot isolation (SI) instead of full serializability. Adding serializability guarantees to existing SI implementations tends to be prohibitively expensive. We present a novel MVCC implementation for main-memory database systems that has very little overhead compared to serial execution with single-version concurrency control, even when maintaining serializability guarantees. Updating data in-place and storing versions as before-image deltas in undo buffers not only allows us to retain the high scan performance of single-version systems but also forms the basis of our cheap and fine-grained serializability validation mechanism. The novel idea is based on an adaptation of precision locking and verifies that the (extensional) writes of recently committed transactions do not intersect with the (intensional) read predicate space of a committing transaction. We experimentally show that our MVCC model allows very fast processing of transactions with point accesses as well as read-heavy transactions and that there is little need to prefer SI over full serializability any longer.
comp-sci  database  mvcc  performance  design  architecture  filetype:pdf  paper  serialisability  linearisability 
march 2019 by jabley
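A toy Go sketch of the two ideas the abstract names: in-place updates with before-image deltas kept in an undo buffer, plus a precision-locking-style commit check that recently committed (extensional) writes do not fall inside the committing transaction's (intensional) read predicates. The integer record model and all names are illustrative, not the paper's code:

```go
package main

import "fmt"

type version struct {
	txn    int // transaction that overwrote the value
	before int // before-image delta: the value prior to that write
}

type record struct {
	value int       // current value, updated in place
	undo  []version // before-images, newest first
}

// readAsOf reconstructs the value visible to a transaction that started
// before txn id startTxn by undoing newer writes from the undo buffer.
func (r *record) readAsOf(startTxn int) int {
	v := r.value
	for _, u := range r.undo {
		if u.txn >= startTxn {
			v = u.before
		}
	}
	return v
}

func (r *record) update(txn, newValue int) {
	r.undo = append([]version{{txn: txn, before: r.value}}, r.undo...)
	r.value = newValue
}

// validate is the precision-locking-style check: commit fails if any write
// committed after we started satisfies one of our read predicates.
func validate(readPred func(int) bool, committedWrites []int) bool {
	for _, w := range committedWrites {
		if readPred(w) {
			return false // extensional write intersects intensional reads
		}
	}
	return true
}

func main() {
	r := &record{value: 10}
	r.update(2, 20) // txn 2 updates in place; before-image 10 goes to undo

	fmt.Println(r.readAsOf(1)) // txn 1 still sees 10
	fmt.Println(r.readAsOf(3)) // txn 3 sees 20

	// txn 1 read "values < 15"; txn 2 wrote 20, which misses the predicate,
	// so txn 1 commits serializably.
	fmt.Println(validate(func(v int) bool { return v < 15 }, []int{20}))
}
```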
[1806.00680] Datacenter RPCs can be General and Fast
It is commonly believed that datacenter networking software must sacrifice generality to attain high performance. The popularity of specialized distributed systems designed specifically for niche technologies such as RDMA, lossless networks, FPGAs, and programmable switches testifies to this belief. In this paper, we show that such specialization is not necessary. eRPC is a new general-purpose remote procedure call (RPC) library that offers performance comparable to specialized systems, while running on commodity CPUs in traditional datacenter networks based on either lossy Ethernet or lossless fabrics. eRPC performs well in three key metrics: message rate for small messages; bandwidth for large messages; and scalability to a large number of nodes and CPU cores. It handles packet loss, congestion, and background request execution. In microbenchmarks, one CPU core can handle up to 10 million small RPCs per second, or send large messages at 75 Gbps. We port a production-grade implementation of Raft state machine replication to eRPC without modifying the core Raft source code. We achieve 5.5 microseconds of replication latency on lossy Ethernet, which is faster than or comparable to specialized replication systems that use programmable switches, FPGAs, or RDMA.
datacenter  networking  performance  benchmark  comp-sci  research  paper 
february 2019 by jabley
Golang’s Garbage
Looking at the performance cost of Go's GC trade-offs
slides  presentation  golang  gc  performance  filetype:pdf 
october 2018 by jabley
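The central trade-off in the slides is CPU spent collecting versus heap size. A minimal Go sketch of the knob that moves that trade-off and the runtime counters that expose its cost:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// A higher GC target means fewer collections (less CPU spent marking)
	// in exchange for a larger heap. Same effect as running with GOGC=200.
	debug.SetGCPercent(200)

	// Churn allocations so the collector has work to do: each overwritten
	// buffer becomes garbage.
	live := make([][]byte, 100)
	for i := 0; i < 10_000; i++ {
		live[i%100] = make([]byte, 64<<10)
	}

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("GC cycles: %d, total pause: %v\n",
		ms.NumGC, time.Duration(ms.PauseTotalNs))
}
```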
Putting the “Micro” Back in Microservice
Modern cloud computing environments strive to provide users with fine-grained scheduling and accounting, as well as seamless scalability. The most recent face of this trend is the “serverless” model, in which individual functions, or microservices, are executed on demand. Popular implementations of this model, however, operate at a relatively coarse granularity, occupying resources for minutes at a time and requiring hundreds of milliseconds for a cold launch. In this paper, we describe a novel design for providing “functions as a service” (FaaS) that attempts to be truly micro: cold launch times in microseconds that enable even finer-grained resource accounting and support latency-critical applications. Our proposal is to eschew much of the traditional serverless infrastructure in favor of language-based isolation. The result is microsecond-granularity launch latency, and microsecond-scale preemptive scheduling using high-precision timers.
filetype:pdf  paper  comp-sci  rust  serverless  faas  performance 
september 2018 by jabley
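The design argument is that once functions share a process and are isolated by the language, a “cold launch” is just a lookup and a call. A toy Go sketch of in-process dispatch timing; Go here is a stand-in, not the paper's prototype (the tags suggest a Rust-based system):

```go
package main

import (
	"fmt"
	"time"
)

// registry holds functions compiled into the host process: launching one
// is a map lookup plus a call, with no container or VM to start.
var registry = map[string]func(string) string{
	"echo": func(req string) string { return "echo: " + req },
}

func invoke(name, req string) (string, time.Duration) {
	start := time.Now()
	resp := registry[name](req)
	return resp, time.Since(start)
}

func main() {
	resp, latency := invoke("echo", "hello")
	fmt.Printf("%s (launched and ran in %v)\n", resp, latency)
}
```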
Scalability! But at what COST?
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
benchmark  coding  performance  big-data  scalability  paper  filetype:pdf  economics 
april 2018 by jabley
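The metric is easy to operationalize: time a competent single-threaded baseline, then scale up workers until the parallel version beats it. A toy Go measurement of COST for a trivial aggregation; real platforms add coordination and I/O overheads that this sketch deliberately omits:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func sumSerial(xs []int64) int64 {
	var s int64
	for _, x := range xs {
		s += x
	}
	return s
}

func sumParallel(xs []int64, workers int) int64 {
	parts := make([]int64, workers)
	chunk := (len(xs) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk, (w+1)*chunk
			if lo > len(xs) {
				lo = len(xs)
			}
			if hi > len(xs) {
				hi = len(xs)
			}
			for _, x := range xs[lo:hi] {
				parts[w] += x
			}
		}(w)
	}
	wg.Wait()
	var s int64
	for _, p := range parts {
		s += p
	}
	return s
}

func main() {
	xs := make([]int64, 1<<26)
	for i := range xs {
		xs[i] = int64(i)
	}

	start := time.Now()
	sumSerial(xs)
	baseline := time.Since(start)
	fmt.Printf("single thread: %v\n", baseline)

	// COST: the first worker count that beats the single-threaded baseline.
	for workers := 2; workers <= runtime.NumCPU(); workers *= 2 {
		start = time.Now()
		sumParallel(xs, workers)
		elapsed := time.Since(start)
		fmt.Printf("%2d workers: %v\n", workers, elapsed)
		if elapsed < baseline {
			fmt.Printf("COST = %d workers\n", workers)
			return
		}
	}
	fmt.Println("parallel version never outperformed one thread at this scale")
}
```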