Finch Performance Lessons (Vladimir Kostyukov, 2017-03-01)
<p>With respect to <a href="http://kostyukov.net/posts/finagle-101/">a good tradition</a>, I’m publishing a blog post based on my <a href="http://kostyukov.net/slides/finch-tokyo/">ScalaMatsuri talk</a> covering a few performance lessons I learned working on <a href="https://github.com/finagle/finch">Finch</a>. Even though these lessons come from a foundation library, I tend to believe they can be projected onto any Scala (or even Java) codebase, no matter if it’s an HTTP library or a user-facing application.</p>
<h3 id="10">10%</h3>
<p>When it comes to throughput, <a href="http://kostyukov.net/posts/how-fast-is-finch/">Finch has been doing quite well</a> compared to other Scala HTTP libraries. However, an absolute metric like QPS isn’t necessarily the most interesting one. Historically, Finch has been measuring its performance (the throughput) in terms of the overhead it introduces on top of Finagle, its IO layer.</p>
<p>We’ve been constantly measuring this kind of overhead locally, running a <code class="highlighter-rouge">wrk</code> load test against two different instances: Finch and <a href="https://github.com/circe/circe">Circe</a> vs. Finagle and Jackson. This way we’re not only measuring Finch’s overhead, but also comparing two ends of the spectrum: type-level vs. runtime-level solutions.</p>
<p>Even though we’ve been doing this comparison for a while, no third party has ever tried to reproduce it until this <a href="https://www.techempower.com/benchmarks/#section=data-r13&hw=ph&test=json&l=4ftbsv">very recent round of the TechEmpower benchmark</a>.</p>
<p><img src="/images/finch-performance-lessons/framework-overhead.png" style="width: 800px;" /></p>
<p>According to the “Framework Overhead” comparison table (see above), Finch sustains about 90% of Finagle’s throughput, making its overhead only, or as much as, 10%. Obviously, this overhead will hardly be noticeable for most services, and yet it could easily be a deal breaker for under-provisioned applications. This is why it’s important to understand where these 10% are coming from and see if there is a way to reduce the gap.</p>
<h3 id="why-only-10">Why only 10%?</h3>
<p>In a typical Finch application lots of work happens at compile time (think of Circe’s generic derivation and Shapeless’ machinery empowering endpoints). It turns out it might be a good deal to trade off compilation time for a (usually) safer and cheaper runtime.</p>
<p>As far as the JSON encoding/decoding goes, instead of inspecting each object at runtime to figure out how to properly encode/decode it, Circe’s encoders/decoders are already materialized once the program compiles and hence are ready for use at runtime. This not only makes codecs safer (it’s known at compile time whether they were derived properly) but also cheaper to use.</p>
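<p>To illustrate the idea (a toy sketch, not Circe’s actual machinery; all names here are made up): the compiler resolves an <code class="highlighter-rouge">Encoder</code> instance for a type once, at compile time, so a missing codec is a compile error, and encoding at runtime is a plain method call with no reflection.</p>

```scala
object JsonSketch {
  // A toy typeclass standing in for a compile-time-derived codec.
  trait Encoder[A] {
    def encode(a: A): String
  }

  object Encoder {
    implicit val intEncoder: Encoder[Int] = new Encoder[Int] {
      def encode(i: Int): String = i.toString
    }

    implicit val stringEncoder: Encoder[String] = new Encoder[String] {
      def encode(s: String): String = "\"" + s + "\""
    }

    // A "derived" instance: built by the compiler out of the instances
    // for the pair's components, with no runtime inspection.
    implicit def pairEncoder[A, B](implicit ea: Encoder[A], eb: Encoder[B]): Encoder[(A, B)] =
      new Encoder[(A, B)] {
        def encode(p: (A, B)): String = "[" + ea.encode(p._1) + "," + eb.encode(p._2) + "]"
      }
  }

  // Encoding a value of a type with no Encoder instance fails to compile.
  def toJson[A](a: A)(implicit e: Encoder[A]): String = e.encode(a)
}
```

<p>For example, <code class="highlighter-rouge">JsonSketch.toJson(("foo", 42))</code> compiles because instances for <code class="highlighter-rouge">String</code>, <code class="highlighter-rouge">Int</code>, and pairs exist, while encoding an unsupported type is rejected by the compiler rather than failing at runtime.</p>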
<p>Besides taking advantage of compile-time derived codecs for JSON, Finch always tries to get the most out of the setup it’s currently running on. This includes using the most efficient decoding/encoding strategy depending on which JSON library is wired in and/or which Netty version Finagle is using at the moment.</p>
<h3 id="why-as-much-as-10">Why as much as 10%?</h3>
<p>It’s not a surprise that composition comes with a cost in the form of allocations, and it’s somewhat fundamental. In the object-oriented setting, a new behavior is usually implemented as a class extending some old behavior. This way it’s possible to access the newly introduced logic by just instantiating this new class, with a penalty of a single allocation.</p>
<p>In the functional (or even purely functional) setting, on the other hand, in order to introduce a new behavior to an existing entity, the latter has to be instantiated anyway before it gets composed with some other entity making the changes. This literally doubles the number of allocations in a program promoting composition over inheritance.</p>
<h3 id="allocations-matter">Allocations Matter</h3>
<p>A high allocation rate itself isn’t a big problem on modern JVMs and, in fact, is quite often mitigated by the JIT automatically moving some allocations onto the stack. However, this is only possible for “local” and short-lived allocations that are never tenured. While it’s nearly impossible to tell which allocations will be eliminated (stack-allocated) by just looking at the source code, a good rule of thumb is to always consider the worst outcome and pretend that all allocations will go on the heap and live long enough to be compacted (copied over to prevent heap fragmentation). In other words, it’s generally a good idea to avoid all sorts of allocations as long as it doesn’t regress the throughput. The bottom line is that a lower allocation rate will often pay back with less frequent and shorter GC pauses.</p>
<p>Allocation profiles are even more important for foundation libraries like Finch or Circe: libraries that always sit on an application’s hottest path, transforming a request into an appropriate response. Just a hundred bytes allocated on the request/response path will add up to several hundred megabytes of memory allocated and deallocated every second once a service’s QPS hits reasonably high numbers.</p>
<h3 id="allocating-less">Allocating Less</h3>
<p>Paying attention to the allocation profile is one of the essential skills needed for writing GC-friendly programs. Even though Scala code is quite hard to reason about performance-wise, it’s mostly easy to guesstimate how many bytes a given code structure is going to allocate. It just requires some practice made up of experiments with JMH’s <code class="highlighter-rouge">-prof gc</code> mode.</p>
<p>While this byte counting business may seem too low-level and possibly worthless, the idea of allocating less scales really well from the scope of a single function to the entire program. We’ll see later in this post how both local and global optimizations of the allocation profile, recently made in Finch, help to improve the throughput.</p>
<h3 id="composing-less">Composing Less</h3>
<p>Intuitively, one way of saving allocations is giving up on composition. This doesn’t necessarily mean dropping down to object-oriented concepts in the user-facing API, but rather revisiting the internal structures and making sure they don’t engage composition/allocations. Even though adopting ideas like inheritance, mutable arrays, and <code class="highlighter-rouge">while</code> loops is considered impure, it’s quite a popular trade-off to make in foundation libraries, including those promoting a purely-functional API.</p>
<p>When it comes to modeling an endpoint’s result, Finch has been using a type alias to an option indicating whether or not the endpoint matched a given input.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">type</span> <span class="kt">EndpointResult</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Option</span><span class="o">[(</span><span class="kt">Input</span>, <span class="kt">Future</span><span class="o">[</span><span class="kt">Output</span><span class="o">[</span><span class="kt">A</span><span class="o">]])]</span></code></pre></figure>
<p>This worked perfectly well in the past given how idiomatic and easy to reason about it is. However, it introduces unnecessary overhead coming from an abstraction that’s way more powerful than needed. Like any other abstraction from the standard library, <code class="highlighter-rouge">Option</code> comes with a variety of combinators that promote an idiomatic usage pattern based on either for-comprehensions or <code class="highlighter-rouge">map</code> and <code class="highlighter-rouge">flatMap</code> variants. There is absolutely nothing wrong with those functions except for the cost they come with, which could be a deal breaker for performance-critical abstractions. Most of the time, mapping an option means allocating a closure that, depending on the number of arguments, may be quite expensive.</p>
<p>In addition to that, on the successful path (when an endpoint matches and a result is returned), it requires two allocations to get the result out the door: one for the inner <code class="highlighter-rouge">Tuple2</code> and one for the outer <code class="highlighter-rouge">Option</code>.</p>
<p>An alternative to the <code class="highlighter-rouge">Option</code> solution would be to hand-roll our own abstraction that basically acts as a flattened version of <code class="highlighter-rouge">Option[Tuple2[_, _]]</code> such that it only requires a single allocation to instantiate a successful result.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">sealed</span> <span class="k">abstract</span> <span class="k">class</span> <span class="nc">EndpointResult</span><span class="o">[</span><span class="kt">+A</span><span class="o">]</span>
<span class="k">case</span> <span class="k">object</span> <span class="nc">Skipped</span> <span class="k">extends</span> <span class="nc">EndpointResult</span><span class="o">[</span><span class="kt">Nothing</span><span class="o">]</span>
<span class="k">final</span> <span class="k">case</span> <span class="k">class</span> <span class="nc">Matched</span><span class="o">[</span><span class="kt">A</span><span class="o">](</span><span class="n">rem</span><span class="k">:</span> <span class="kt">Input</span><span class="o">,</span> <span class="n">out</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">Output</span><span class="o">[</span><span class="kt">A</span><span class="o">]])</span> <span class="k">extends</span> <span class="nc">EndpointResult</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span></code></pre></figure>
<p>An important difference between these two approaches is in the signal their APIs send to users. An API that engages composition isn’t always appropriate as a means for an internal abstraction living on a hot execution path. A bare-bones ADT exposing nothing but pattern matching (which costs almost nothing in Scala) as its API can be way more suitable for this kind of business.</p>
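<p>A simplified sketch of how such a result is consumed (types are simplified here; the real <code class="highlighter-rouge">Input</code> and <code class="highlighter-rouge">Future[Output[A]]</code> are omitted): a bare pattern match with two branches, no closures allocated, and a single allocation on the matched path.</p>

```scala
object ResultSketch {
  sealed abstract class EndpointResult[+A]
  case object Skipped extends EndpointResult[Nothing]
  final case class Matched[A](rem: String, out: A) extends EndpointResult[A]

  // Consuming the result is a plain pattern match: no combinators,
  // no intermediate Option/Tuple2 allocations, just two branches
  // known at compile time.
  def handle[A](res: EndpointResult[A], default: A): A = res match {
    case Matched(_, out) => out
    case Skipped         => default
  }
}
```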
<p>As far as the benchmarking goes (see table below), saving around 56 bytes on a single <code class="highlighter-rouge">Endpoint.map</code> call makes for up to a 5% improvement in throughput (see <a href="https://github.com/finagle/finch/pull/707">#707</a>).</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw"> ---------------------------------------------------------------------------------------------
TA: Type Alias | ADT: sealed abstract class | Running Time Mode
---------------------------------------------------------------------------------------------
MapBenchmark.mapAsyncTA avgt 429.113 ± 43.297 ns/op
MapBenchmark.mapAsyncADT avgt 407.126 ± 12.807 ns/op
MapBenchmark.mapOutputAsyncTA avgt 821.786 ± 52.045 ns/op
MapBenchmark.mapOutputAsyncADT avgt 777.654 ± 26.444 ns/op
MapBenchmark.mapAsyncTA:·gc.alloc.rate.norm avgt 776.000 ± 0.001 B/op
MapBenchmark.mapAsyncADT:·gc.alloc.rate.norm avgt 720.000 ± 0.001 B/op
MapBenchmark.mapOutputAsyncTA:·gc.alloc.rate.norm avgt 1376.001 ± 0.001 B/op
MapBenchmark.mapOutputAsyncADT:·gc.alloc.rate.norm avgt 1320.001 ± 0.001 B/op</code></pre></figure>
<p>Introducing a new abstraction/type into a domain is always a trade-off, and yet an easy one to make in this particular case. At the cost of a little maintenance burden, we get a less powerful and more performant abstraction that’s really hard to misuse.</p>
<h3 id="encodingdecoding-less">Encoding/Decoding Less</h3>
<p>Avoiding all sorts of allocations isn’t necessarily a local optimization; it can be applied globally to the scope of the entire program. Consider a typical HTTP application that serves JSON. Certainly, most of the allocations in such an application come from JSON decoding and encoding: instead of using the payload right away (in whatever form it is), we need to convert it into a JSON object first.</p>
<p>Presumably, JSON encoding and decoding aren’t something that can be easily avoided in an HTTP application exposing JSON APIs; this is a rightful workload for this kind of application. However, there are certain stages (involving allocations) within the data-transformation pipelines that might look mandatory and yet can be completely eliminated.</p>
<p>As far as the JSON decoding goes, there are at least two data-transformation stages involved. After getting the bytes off the wire, we typically convert them into a JSON string (a UTF-8 string) instead of shoving them right into a JSON parser. Whereas going from bytes to a string (i.e., <code class="highlighter-rouge">new String(bytes)</code>) may seem pretty cheap, it actually involves quite a lot of allocations along with the CPU time needed for a memory copy. Instead of wrapping a given byte array with a <code class="highlighter-rouge">String</code>, the JVM copies it over into a newly allocated <code class="highlighter-rouge">char</code> array of the same size, thereby doubling the allocations (a <code class="highlighter-rouge">char</code> takes 2 bytes on the JVM).</p>
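<p>The copying behavior is easy to observe: since <code class="highlighter-rouge">new String(bytes)</code> copies the source array rather than wrapping it, mutating the bytes afterwards doesn’t affect the string. A small sketch:</p>

```scala
object CopySketch {
  import java.nio.charset.StandardCharsets.UTF_8

  // new String(bytes) copies the array into a freshly allocated char[]:
  // mutating the source afterwards does not change the string, which
  // shows the string does not share storage with the byte array.
  def copiesOnConstruction(): (String, String) = {
    val bytes = "finch".getBytes(UTF_8)
    val s = new String(bytes, UTF_8)
    bytes(0) = 'm'.toByte          // mutate the source array after construction
    (s, new String(bytes, UTF_8))  // the first string is unaffected
  }
}
```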
<p>All of this sounds pretty frustrating given that, in most cases, the only reason we actually need a string (and not the bytes) is to satisfy the API of the JSON library used. The good news is that lots of modern JSON libraries allow skipping this unnecessary to-string conversion and parsing JSON objects directly from bytes (see below).</p>
<p><img src="/images/finch-performance-lessons/decoding.png" style="width: 800px;" /></p>
<p>As of Finch 0.11 (see <a href="https://github.com/finagle/finch/pull/671">#671</a>), for all the supported JSON libraries, the decoding of inbound payloads doesn’t involve any interim to-string conversions and is done in terms of bytes. In our end-to-end benchmark, this optimization alone accounts for a 13% improvement in throughput.</p>
<p>When it comes to micro-benchmarking, decoding from bytes instead of a string cuts both allocations and running time in half (see below) making it a pretty great deal given how small and simple the change is.</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw"> ---------------------------------------------------------------------------------------------
S: parse string | BA: parse byte array | Running Time Mode
---------------------------------------------------------------------------------------------
JsonBenchmark.decodeS avgt 5950.402 ± 464.246 ns/op
JsonBenchmark.decodeBA avgt 3232.696 ± 171.160 ns/op
JsonBenchmark.decodeS:·gc.alloc.rate.norm avgt 7992.005 ± 12.749 B/op
JsonBenchmark.decodeBA:·gc.alloc.rate.norm avgt 4908.003 ± 6.374 B/op</code></pre></figure>
<p>A similar optimization is also possible on the outbound path. It might be worth trying to skip the unnecessary string representation and print directly into a byte array (see below). By analogy with converting a byte array into a string, converting a string into a byte array also involves a surprising amount of allocations. Because it’s not known beforehand how many bytes a given string is going to occupy, the JVM estimates it as <code class="highlighter-rouge">3 * string.length</code>, where 3 is <a href="https://docs.oracle.com/javase/7/docs/api/java/nio/charset/CharsetEncoder.html#maxBytesPerChar()">the maximum number of bytes needed for a single UTF-8 character</a>.</p>
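<p>The worst-case factor is reported by the JDK’s own encoder, and for ASCII-only payloads the actual encoded size is a third of that estimate, which is where the over-allocation comes from. A quick sketch:</p>

```scala
object EncoderSketch {
  import java.nio.charset.StandardCharsets.UTF_8

  // The worst-case estimate the JDK's UTF-8 encoder reports: up to
  // 3 bytes per char. Encoding may size its destination buffer as
  // 3 * string.length before trimming it to the actual length.
  def utf8MaxBytesPerChar: Float = UTF_8.newEncoder().maxBytesPerChar()

  // For an ASCII-only string, the actual encoded size is 1 byte per
  // char, a third of the worst-case buffer.
  def encodedLength(s: String): Int = s.getBytes(UTF_8).length
}
```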
<p><img src="/images/finch-performance-lessons/encoding.png" style="width: 800px;" /></p>
<p>Printing JSON directly into a byte array isn’t as common as parsing bytes, and only a couple of JSON libraries support it. As of Circe 0.7 (see <a href="https://github.com/circe/circe/pull/537">#537</a>) and Finch 0.12 (see <a href="https://github.com/finagle/finch/pull/717">#717</a>), it is the default JSON encoding mode for applications depending on <code class="highlighter-rouge">finch-circe</code> (including those using <a href="https://github.com/circe/circe-jackson">circe-jackson</a> for printing, see <a href="https://github.com/circe/circe-jackson/pull/11">#11</a>).</p>
<p>The encoding benchmark we run for JSON reports about a 30% drop in both allocations and running time when targeting byte arrays instead of strings (see below).</p>
<figure class="highlight"><pre><code class="language-raw" data-lang="raw"> ---------------------------------------------------------------------------------------------
S: print string | BA: print byte array | Running Time Mode
---------------------------------------------------------------------------------------------
JsonBenchmark.encodeS avgt 16400.327 ± 621.935 ns/op
JsonBenchmark.encodeBA avgt 12645.070 ± 391.591 ns/op
JsonBenchmark.encodeS:·gc.alloc.rate.norm avgt 46900.015 ± 19.123 B/op
JsonBenchmark.encodeBA:·gc.alloc.rate.norm avgt 30360.011 ± 0.001 B/op</code></pre></figure>
<p>The main point here is not that string-to-bytes and bytes-to-string conversions are expensive on the JVM (they are as cheap as they can be), but rather that they aren’t always necessary. The tricky part is that it often requires looking at the problem end-to-end to figure out which data transformations (as well as interim results) don’t add much value to the domain and can be eliminated.</p>
<h3 id="takeaways">Takeaways</h3>
<p>“This is slow” is one of the toughest problems to debug. Paying constant attention to the allocation profile, though, is quite a healthy habit that keeps the number of performance-related problems to a minimum. Despite all the great tools available for chasing down allocations (think of JMH’s <code class="highlighter-rouge">-prof gc</code>), none of them are going to tell us if there are any <a href="https://twitter.com/giltene/status/818258334327382017">shortcuts our application can take</a> to get the final result faster.</p>
Finagle 101 (Vladimir Kostyukov, 2016-05-12)
<p>This post is based on my talk <a href="http://event.scaladays.org/scaladays-nyc-2016#!#schedulePopupExtras-7565">“Finagle: Under the Hood”</a> that <a href="https://twitter.com/vkostyukov/status/730375788646772736">was presented</a>
at Scala Days NYC. I thought I’d publish this for those who prefer reading instead of watching (video
is not yet published anyway). The full <a href="http://vkostyukov.net/slides/finagle-101/">slide deck is available online</a> as well.</p>
<h3 id="what-finagle-is">What is Finagle?</h3>
<p><img src="/images/finagle-101/finagle-logo.png" style="width: 400px;" /></p>
<p>Finagle is an RPC system for JVM developed and used in production at Twitter. It’s written in Scala
but has a Java-compatible API for most of its components.</p>
<p>When it comes to describing what Finagle can do, I really like Alexey’s tweet from the last FinagleCon.</p>
<center>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">The key problem with <a href="https://twitter.com/hashtag/Finagle?src=hash">#Finagle</a> adoption that it solves tons of problems that you know nothing about until it's too late <a href="https://twitter.com/hashtag/FinagleCon?src=hash">#FinagleCon</a></p>— Alexey Kachayev (@kachayev) <a href="https://twitter.com/kachayev/status/631928029258772480">August 13, 2015</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
</center>
<p>The most important part here is: “Finagle solves tons of problems”. And it absolutely does.
There are many different things (that a user doesn’t know about) happening underneath a Finagle client
or server to make sure sessions are reliable enough. Finagle does a great job of tolerating all
kinds of session and transport failures, so its users usually notice neither the failures nor the
actions Finagle takes to tolerate them.</p>
<p>Finagle’s been around for quite a long time: <a href="https://twitter.com/finagle/status/567885703180259328">since 2010</a>. With 4.5k stars, it’s the 7th
most popular Scala project on GitHub. As of today, more than 15 protocols are implemented,
including our own multiplexing protocol <a href="http://twitter.github.io/finagle/guide/Protocols.html#mux">Mux</a>. Mux is a full-duplex, multiplexing protocol that
might be roughly viewed as a subset of HTTP/2, so it has low-level control messages for pings,
interrupts, and many more. We will see later how we utilize those signals within Finagle.</p>
<p>Finagle has quite a specific mission: to make RPC sessions fast, resilient, and easy to set up. Operating
on Layer 5 of the OSI model, Finagle knows almost nothing about the application or even the protocol
it’s used by. That’s why it doesn’t actually answer lots of application-specific questions like “How
do I do logging?”, “How do I do JSON?”, “Where do I define a REST controller?”. To fill that gap and
utilize Finagle’s outstanding scalability and performance, people started building opinionated libraries
and frameworks that are supposed to answer all those questions. Just to mention a couple of such libraries
specifically designed for HTTP: <a href="https://twitter.github.io/finatra/">Finatra</a> and <a href="https://github.com/finagle/finch">Finch</a>.</p>
<p>The <a href="https://twitter.com/vkostyukov/status/613840446725357569">Finagle team at Twitter</a> is called CSL (Core System Libraries). There are ~10 of us maintaining
libraries that power Twitter’s distributed infrastructure. We own <a href="http://twitter.github.io/finagle/">Finagle</a>, <a href="https://github.com/twitter/util">Util</a>,
<a href="https://github.com/twitter/scrooge">Scrooge</a>, <a href="https://twitter.github.io/twitter-server/">TwitterServer</a>, and <a href="https://twitter.github.io/finatra/">Finatra</a>. We’ve got an on-call rotation and an
internal <a href="https://groups.google.com/forum/#!forum/finaglers">finaglers</a> list we use to provide support for teams dealing with production issues
related to Finagle.</p>
<p>Internally, Finagle lives in a monorepo and services depend on its source code. Essentially, every time
we do an OSS release, we actually roll out code that has already been tested internally for months
(by thousands of services serving millions of RPCs every second), which sounds like a pretty decent
deal for external adopters.</p>
<h3 id="the-big-picture">The “Big Picture”</h3>
<p>Finagle is designed with a simple idea in mind: <a href="http://monkey.org/~marius/funsrv.pdf">your server is a function</a>. This means
you can talk to that server by calling this function, and you can implement that server by defining this
function.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">trait</span> <span class="nc">Service</span><span class="o">[</span><span class="kt">Req</span>, <span class="kt">Rep</span><span class="o">]</span> <span class="nc">extends</span> <span class="o">(</span><span class="nc">Req</span> <span class="k">=></span> <span class="nc">Future</span><span class="o">[</span><span class="kt">Rep</span><span class="o">])</span></code></pre></figure>
<p>The most exciting part here is those type params <code class="highlighter-rouge">Req</code> and <code class="highlighter-rouge">Rep</code>, meaning that your service (server or
client) is actually abstracted over the particular protocol (particular request/response types).
This gives us the freedom to build most of the generic features, like retries, in a protocol-agnostic
way, not to mention that types add some safety to your code. For example, it’s impossible to send
an HTTP request to a MySQL server: the compiler will catch that.</p>
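<p>To make the idea concrete, here is a minimal sketch of a service as a function, using <code class="highlighter-rouge">scala.concurrent.Future</code> as a stand-in for Twitter’s <code class="highlighter-rouge">com.twitter.util.Future</code> (the names below are illustrative, not Finagle’s API):</p>

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration

object ServiceSketch {
  // A service is literally a function from a request to a future response.
  trait Service[Req, Rep] extends (Req => Future[Rep])

  // Implementing a "server" means defining the function...
  val length: Service[String, Int] = new Service[String, Int] {
    def apply(req: String): Future[Int] = Future.successful(req.length)
  }

  // ...and talking to it means calling the function.
  def call(req: String): Int = Await.result(length(req), Duration.Inf)
}
```

<p>Because <code class="highlighter-rouge">Req</code> and <code class="highlighter-rouge">Rep</code> are type parameters, a <code class="highlighter-rouge">Service[String, Int]</code> simply cannot be fed anything but a <code class="highlighter-rouge">String</code>; the compiler enforces it.</p>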
<p>This is how Finagle is organized internally. There are three major components it’s built on:
the Finagle <strong>stack</strong> on the left, the <a href="http://netty.io">Netty</a> <strong>pipeline</strong> on the right, and the <strong>transport</strong> in between.</p>
<p><img src="/images/finagle-101/finagle-102.png" style="height: 400px;" /></p>
<p>This is a really interesting combination: we’ve got two completely different worlds here. The
service-oriented, type-safe Finagle world, powered by composition and functional abstractions, on the
left. The event-based, untyped, low-level Netty world on the right. And the transport glues them together.</p>
<p>We will cover the left part here in this post. The Finagle stack is our generic abstraction used to
<em>materialize</em> Finagle clients and servers out of a composition of ordered modules, which may be
anything we know how to compose. And we know how to compose services (since they are just
functions). Technically, we can put those services/functions into a stack and materialize it into a
client or server. That’s pretty much what we do in Finagle today. If you speak functional
programming, we <em>fold</em> a collection of modules into a client or server.</p>
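<p>The folding idea can be sketched in a few lines (this is a toy model, not Finagle’s actual <code class="highlighter-rouge">Stack</code> API): a module wraps a service with new behavior, and materializing a stack means folding the modules into a single service.</p>

```scala
object StackSketch {
  // Toy types: a service is a function, a module transforms a service.
  type Service = Int => Int
  type Module  = Service => Service

  // Two example modules, each wrapping the next service in the stack.
  val doubling: Module  = next => req => next(req * 2)
  val addingOne: Module = next => req => next(req) + 1

  // Folding the stack: foldRight keeps the head of the list as the
  // outermost module, terminating with the endpoint service.
  def materialize(stack: List[Module], endpoint: Service): Service =
    stack.foldRight(endpoint)((module, svc) => module(svc))
}
```

<p>For instance, <code class="highlighter-rouge">materialize(List(doubling, addingOne), identity)</code> yields a service that first doubles the request and then adds one to the response, i.e., <code class="highlighter-rouge">5</code> maps to <code class="highlighter-rouge">11</code>.</p>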
<p>Each module in the stack represents some standalone functionality like retries, load balancing,
circuit breaking, and so on. In fact, this is exactly what a Finagle server looks like: it’s simple
and flat. Clients, on the other hand, are quite tricky.</p>
<p><img src="/images/finagle-101/finagle-stacks1.png" style="height: 400px;" /></p>
<p>There is actually a <strong>tree of stacks</strong> in the client, with two branching points. First, a load balancer
distributes traffic across a number of nodes or endpoints. Second, a connection pool maintains a pool
of connection stacks, which terminate with a transport and a Netty pipeline.</p>
<h3 id="configuration">Configuration</h3>
<p>Before we go deeper into details about client/server modules, let’s discuss what they have in common:
<a href="https://finagle.github.io/blog/2016/02/05/release-notes-6-33">a configuration API</a>. To be fair, that’s one of my favorite topics because I think
it’s a really tricky problem to build a configuration API that’s both easy to use (common things should
be easy to do) and powerful enough (uncommon things should be possible to do). The current version of
the configuration API in Finagle is the third iteration of the idea that <em>configuration is code</em>.</p>
<p>Configuration is always code (not CLI flags, not config files) in Finagle, so it’s type-checked by your
compiler and auto-completed by your IDE. There is a convention on how to find an entry-point API
depending on the protocol you want to work with. Usually, you start by typing something like
<code class="highlighter-rouge">Http.client.with</code> and seeing what’s possible to configure/override on a given client. We separate
commonly-configured params from the ones we think might be dangerous to tweak today. We call those
expert-level and use a slightly different API to override them. The expert-level API is
usually not as friendly and discoverable as the <code class="highlighter-rouge">with</code>-API, which works perfectly as a red flag:
if you’re not having a good time writing configuration, you’re probably doing something wrong or
dangerous.</p>
<h3 id="servers">Servers</h3>
<p>Servers are quite simple in Finagle. They are optimized for high throughput by doing as little as possible
on top of just handling requests. At a minimum, a Finagle server does tracing and metrics, maintains the
request concurrency level, and enforces a very simple request handle-time timeout.</p>
<p>Here is an example of how you can configure the <strong>concurrency limit</strong> on your server. You can say how
many concurrent requests your server can handle at once and how many waiters are allowed. Everything on
top of that will be <em>rejected</em> by the server and hopefully retried by a client talking to it.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">val</span> <span class="n">server</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">server</span>
<span class="o">.</span><span class="n">withAdmissionControl</span><span class="o">.</span><span class="n">concurrencyLimit</span><span class="o">(</span>
<span class="n">maxConcurrentRequests</span> <span class="k">=</span> <span class="mi">10</span><span class="o">,</span>
<span class="n">maxWaiters</span> <span class="k">=</span> <span class="mi">0</span>
<span class="o">)</span></code></pre></figure>
<p>Concurrency limit is one of the forms of <em>admission control</em> we have for servers. Admission control
is a technique that employs some kind of feedback from the underlying system to determine
whether it’s reasonable to handle (for servers) or send (for clients) a given request, or whether it’s
better to reject it. As a canonical example of server-side admission control, we might think of something
that prevents a server from being overwhelmed by rejecting some amount of requests. Essentially, instead of
slowing down 100% of requests, we reject, for example, 25% but keep operating normally (and maybe even
stay within the SLOs).</p>
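<p>A toy sketch of the concurrency-limit idea (this is not Finagle’s implementation; names and the <code class="highlighter-rouge">maxWaiters = 0</code> behavior are illustrative): once the number of in-flight requests reaches the limit, new requests are rejected outright instead of slowing everyone down.</p>

```scala
import java.util.concurrent.atomic.AtomicInteger

// A minimal concurrency limiter with no wait queue (maxWaiters = 0):
// a request either gets a slot immediately or is rejected.
final class ConcurrencyLimit(maxConcurrent: Int) {
  private val inFlight = new AtomicInteger(0)

  def apply[A](handle: => A): Either[String, A] =
    if (inFlight.incrementAndGet() > maxConcurrent) {
      inFlight.decrementAndGet()
      Left("rejected: at capacity") // hopefully retried by the client
    } else {
      try Right(handle)             // handle the request while holding a slot
      finally inFlight.decrementAndGet()
    }
}
```

<p>With a limit of 1, a request arriving while another is in flight gets the rejection rather than queueing behind it.</p>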
<p><strong>Request timeout</strong> is symmetric and might be configured on both servers and clients. It means
the same thing in both cases: time out requests for which responses weren’t sent (when configured on a server)
or received (when configured on a client) in a given amount of time.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.conversions.time._</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">val</span> <span class="n">server</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">server</span>
<span class="o">.</span><span class="n">withRequestTimeout</span><span class="o">(</span><span class="mf">42.</span><span class="n">seconds</span><span class="o">)</span></code></pre></figure>
<p>There is no default value for the request timeout, and it’s disabled for the same reason the concurrency limit
is disabled: Finagle tries really hard not to speculate on any application-specific (or even
protocol-specific) params. You have to be explicit about those.</p>
<h3 id="clients">Clients</h3>
<p>Clients are where things get interesting. Unlike servers, which are optimized for high throughput,
clients maximize success rate and minimize latency by doing as much as possible to make sure
a request will succeed in the least possible time. This makes them much more complicated than servers.
The list of features clients implement is too long to be fully covered in this post, but we
can surely walk through them and see what kinds of problems they solve.</p>
<p>First of all, we need to be able to <strong>retry</strong> failed requests, thereby maximizing the success rate. Retrying
implies a number of quite difficult questions we need to find answers to. How can we tell if a request
failed? Is it safe to retry this request? If we already tried once and it didn’t help, should
we keep retrying or give up? Finagle takes good care of all of these, and we’ll see
later what kinds of abstractions it uses to achieve that.</p>
<p>Next, we need a way to help services locate each other, so there is built-in <strong>service discovery</strong>
support in every Finagle client. By default, it might use either DNS or <a href="https://zookeeper.apache.org/">ZooKeeper</a>, but it’s also
possible to plug in any other library by implementing a couple of simple interfaces. For example,
there is an OSS package that enables <a href="https://github.com/kachayev/finagle-consul">Consul support in Finagle</a>.</p>
<p>We also need tooling around <strong>timeouts</strong> so we can put reasonable bounds on components in our
distributed system. In addition to the request timeout that we’ve already discussed, there is a whole
range of timeouts you can override in Finagle, starting with low-level TCP connect timeouts and
finishing with session timeouts. None of the timeouts are bound by default, since those values are considered
specific to a given application. Use the following example to override timeouts.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.conversions.time._</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">client</span>
<span class="o">.</span><span class="n">withTransport</span><span class="o">.</span><span class="n">connectTimeout</span><span class="o">(</span><span class="mf">1.</span><span class="n">second</span><span class="o">)</span> <span class="c1">// TCP connect
</span> <span class="o">.</span><span class="n">withSession</span><span class="o">.</span><span class="n">acquisitionTimeout</span><span class="o">(</span><span class="mf">42.</span><span class="n">seconds</span><span class="o">)</span>
<span class="o">.</span><span class="n">withSession</span><span class="o">.</span><span class="n">maxLifeTime</span><span class="o">(</span><span class="mf">20.</span><span class="n">seconds</span><span class="o">)</span> <span class="c1">// connection max life time
</span> <span class="o">.</span><span class="n">withSession</span><span class="o">.</span><span class="n">maxIdleTime</span><span class="o">(</span><span class="mf">10.</span><span class="n">seconds</span><span class="o">)</span> <span class="c1">// connection max idle time</span></code></pre></figure>
<p>Of course, we need to distribute traffic across a number of instances where our software is deployed.
Finagle comes with a very rich set of <strong>load balancers</strong>, and I honestly can’t name another system around
that provides such advanced load balancing strategies as Finagle does today. We’ll cover load balancing
in detail later in this post, since it plays a major role in the resiliency of Finagle clients.</p>
<p>Besides picking the right replica to send a request to, a Finagle client also takes care of managing
a <strong>connection pool</strong> as well as maintaining a stack of <strong>circuit breakers</strong> used to exclude unreliable
replicas/sessions from a request path.</p>
<p>Clients also come with <strong>interrupts</strong> support that is primarily used for <strong>request cancellation</strong>
and prevents both servers and clients from doing useless work. You might think, “Why would I cancel
the request? I sent it and I meant it!”. The tricky part here is that interrupts happen implicitly.
Consider the following example. Your client sets a timeout and sends a request to a server. After a given
amount of time the timeout expires and the client interrupts the future associated with that request.
What happens next really depends on the client’s protocol, but long story short, Finagle will do its
best to propagate that cancellation across service boundaries. In the worst case
(i.e., HTTP/1.1) it cuts the connection, but there is so much more we can do in Mux (and perhaps HTTP/2)
by sending a control message to a server saying something close to “Hey, I’m no longer interested in
the result of this request so you should feel free to drop it”.</p>
<p>Quite similarly to propagating interrupts, we might also want to do that for some <strong>request context</strong>
that might contain quite useful (for debugging and monitoring/tracing) information like the request id,
request deadline, upstream/downstream service name and so on. Clients support that today and
take care of serializing/deserializing contexts depending on the protocol used (e.g., request headers
are used in HTTP) and propagating them across service boundaries.</p>
<p>Both interrupts and contexts are used heavily at Twitter, and that’s one of the reasons why we still
like our futures better. Unlike Scala futures, Twitter futures propagate contexts and interrupts
through the chain of transformations, even when its parts are executed by different threads.</p>
<h4 id="response-classification">Response Classification</h4>
<p>As we discussed before, being on Layer 5 means knowing everything about transport and sessions, but
nothing (or almost nothing) about the protocol/application. This implies some unexpected behaviour:
HTTP 500 (Internal Server Error) actually looks like a successful response to a Finagle client.
The following poll proves that this is confusing at the minimum.</p>
<center>
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr">Quick poll by <a href="https://twitter.com/kevino">@kevino</a>: Does Finagle treat HTTP 500 response as a failure or success?</p>— Finagle (@finagle) <a href="https://twitter.com/finagle/status/697184041267662849">February 9, 2016</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
</center>
<p>Why does that happen? Nothing magical. There is definitely nothing wrong with a correctly structured
protocol message (e.g., HTTP 500) from the client’s point of view, so it’s counted as a success. And that’s
quite a big deal. First, we’ve got our metrics messed up: success rate is 100%, but we’re serving
failures. Second, our load balancers go crazy: they think that if a given server responds fast and the
response is a success, it’s a great deal to send it more traffic, while in fact it was failing
fast.</p>
<p>Since 6.33, we’ve got <a href="https://finagle.github.io/blog/2016/02/09/response-classification">response classifiers</a> that you can plug into any client to teach
it how to treat responses. For example, here is how you can tell an HTTP client to see HTTP 503 as a
non-retryable failure so the circuit breaker will kick in on this response.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.finagle.</span><span class="o">{</span><span class="nc">Http</span><span class="o">,</span> <span class="n">http</span><span class="o">}</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.service._</span>
<span class="k">import</span> <span class="nn">com.twitter.util._</span>
<span class="k">val</span> <span class="n">classifier</span><span class="k">:</span> <span class="kt">ResponseClassifier</span> <span class="o">=</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">ReqRep</span><span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="nc">Return</span><span class="o">(</span><span class="n">r</span><span class="k">:</span> <span class="kt">http.Response</span><span class="o">))</span> <span class="k">if</span> <span class="n">r</span><span class="o">.</span><span class="n">statusCode</span> <span class="o">==</span> <span class="mi">503</span> <span class="k">=></span>
<span class="nc">ResponseClass</span><span class="o">.</span><span class="nc">NonRetryableFailure</span>
<span class="o">}</span>
<span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">client</span>
<span class="o">.</span><span class="n">withResponseClassifier</span><span class="o">(</span><span class="n">classifier</span><span class="o">)</span></code></pre></figure>
<h4 id="retries">Retries</h4>
<p>The retries module is placed at the very top of the stack so it can retry failures from the underlying
modules (e.g., circuit breakers, timeouts, load balancers). Finagle will only retry when it’s absolutely
safe to do so: for example, when it’s known that a request wasn’t written to the wire yet, or when a
request was rejected by a server. And this makes a lot of sense: if a load balancer picked a replica
that rejected a request (e.g., due to admission control), it’s totally fine to retry it on a
different replica.</p>
<p>Retries are built on top of <a href="https://finagle.github.io/blog/2016/02/08/retry-budgets/">retry budgets</a>, which behave as leaky token buckets and tie the number
of retries to the total number of requests. Technically, <code class="highlighter-rouge">RetryBudget</code> is responsible for limiting the number
of retries and helps mitigate <strong>retry storms</strong> (i.e., retrying too much).</p>
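<p>To make the leaky-token-bucket intuition concrete, here is a toy sketch of the idea (not Finagle’s actual <code class="highlighter-rouge">RetryBudget</code> implementation; all names here are illustrative, and the per-second reserve is modeled as a one-time initial balance for simplicity): every regular request deposits a fraction of a token, and every retry withdraws a whole one, so retries stay proportional to traffic.</p>

```scala
// Toy retry budget: integer "milli-tokens" sidestep floating-point drift.
final class ToyRetryBudget(percentCanRetry: Double, minRetriesPerSec: Int) {
  private var balance: Long = minRetriesPerSec * 1000L     // fixed reserve
  private val depositAmount: Long = (percentCanRetry * 1000).toLong

  def deposit(): Unit = balance += depositAmount           // a regular request
  def tryWithdraw(): Boolean =                             // a retry attempt
    if (balance >= 1000L) { balance -= 1000L; true } else false
}

val budget = new ToyRetryBudget(percentCanRetry = 0.1, minRetriesPerSec = 5)
(1 to 100).foreach(_ => budget.deposit()) // 100 requests earn 10 retry tokens
val granted = (1 to 100).count(_ => budget.tryWithdraw())
// 5 reserved + 10 earned = 15 retries granted out of 100 attempted
```

<p>With a 10% budget, a hundred requests buy ten retries; the remaining eighty-five retry attempts are simply rejected, which is exactly how a retry storm gets damped.</p>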
<p>Once we’re given permission for a retry by a retry budget, we need to figure out how long to wait
(if at all) between retries. This technique is called <strong>backoff</strong> in Finagle (and almost any other
library) and is represented as <code class="highlighter-rouge">Stream[Duration]</code>, which means you can easily plug in your own thing.</p>
<p>Finagle provides an API for building popular backoffs, including <a href="http://www.awsarchitectureblog.com/2015/03/backoff.html">jittered</a> ones, which are super
useful in clusters built around optimistic locking, whose individual nodes might perform poorly under
high contention. Our goal is to make sure that clients started at the same time are not competing
with each other on retries to a single server. To do that, we add a randomized factor (or <strong>jitter</strong>)
to every duration from a backoff policy, thereby reducing the chances that several clients will be retrying
simultaneously.</p>
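<p>Since a backoff is just a <code class="highlighter-rouge">Stream[Duration]</code>, a fully jittered exponential policy fits in a few lines. This sketch uses <code class="highlighter-rouge">scala.concurrent.duration</code> instead of Twitter’s util-core durations, and the function name is illustrative rather than Finagle’s API: each delay is drawn uniformly from a window whose ceiling doubles until it hits the maximum.</p>

```scala
import scala.concurrent.duration._
import scala.util.Random

// "Full jitter": every delay is random in [0, ceiling), and the ceiling
// grows exponentially from `start` up to `maximum`.
def exponentialJittered(start: FiniteDuration,
                        maximum: FiniteDuration,
                        rng: Random = new Random): Stream[FiniteDuration] = {
  def loop(ceiling: FiniteDuration): Stream[FiniteDuration] = {
    val delay = (rng.nextDouble() * ceiling.toMillis).toLong.millis
    delay #:: loop((ceiling * 2).min(maximum))
  }
  loop(start)
}

val delays = exponentialJittered(2.seconds, 32.seconds).take(6).toList
// ceilings: 2s, 4s, 8s, 16s, 32s, 32s; actual delays are random below them
```

<p>Two clients running this policy will almost never sleep for the same amount of time, so their retries spread out instead of arriving at the struggling server in lockstep.</p>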
<p>For example, here is how we override the retry budget and retry backoff on an HTTP client. The budget allows 10%
of total requests to be requeued, on top of 5 retries per second to accommodate clients with low RPS.
The backoff uses a jittered, randomized function that grows exponentially
(e.g., <code class="highlighter-rouge">random(2s) :: random(4s) :: ... :: random(32s)</code>).</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">import</span> <span class="nn">com.twitter.conversions.time._</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.service.</span><span class="o">{</span><span class="nc">RetryBudget</span><span class="o">,</span> <span class="nc">Backoff</span><span class="o">}</span>
<span class="k">val</span> <span class="n">budget</span> <span class="k">=</span> <span class="nc">RetryBudget</span><span class="o">(</span>
<span class="n">ttl</span> <span class="k">=</span> <span class="mf">10.</span><span class="n">seconds</span><span class="o">,</span>
<span class="n">minRetriesPerSec</span> <span class="k">=</span> <span class="mi">5</span><span class="o">,</span>
<span class="n">percentCanRetry</span> <span class="k">=</span> <span class="mf">0.1</span>
<span class="o">)</span>
<span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">client</span>
<span class="o">.</span><span class="n">withRetryBudget</span><span class="o">(</span><span class="n">budget</span><span class="o">)</span>
<span class="o">.</span><span class="n">withRetryBackoff</span><span class="o">(</span><span class="nc">Backoff</span><span class="o">.</span><span class="n">exponentialJittered</span><span class="o">(</span><span class="mf">2.</span><span class="n">seconds</span><span class="o">,</span> <span class="mf">32.</span><span class="n">seconds</span><span class="o">))</span></code></pre></figure>
<h4 id="load-balancers">Load Balancers</h4>
<p>There are pretty deep-seated assumptions inside of Finagle that service clusters are homogeneous,
that they are equivalent from the point of view of the application. Those are often referred to as a
<strong>replica set</strong>.</p>
<p>In Finagle, load balancers consist of two independent components: <strong>load distributor</strong> and
<strong>load metric</strong>. That said, the load balancing algorithm might be described in terms of those two: we
distribute load across some subset of nodes/replicas and pick the one for which a load metric is
minimal.</p>
<p>In order to better understand the variety of load balancing options available in Finagle today, let’s
have a look at their evolution so we can see why they were introduced in the first place and what
problems they solve.</p>
<p>In the very beginning, we had the <strong>heap</strong> balancer, built on top of a min-heap that maintains the number
of outstanding requests per node. That worked really well, and it was the default choice for a
long time.</p>
<p><img src="/images/finagle-101/heap-lb1.png" style="height: 300px;" /></p>
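<p>The core of the heap balancer can be sketched with the standard library’s priority queue (this illustrates the idea only, not Finagle’s actual <code class="highlighter-rouge">HeapBalancer</code>): the least loaded node is always at the top, but every load change costs a logarithmic re-heapify.</p>

```scala
import scala.collection.mutable

// A node with a mutable count of outstanding requests.
final case class Node(id: Int, var outstanding: Int)

// PriorityQueue is a max-heap, so negate the load to get min-heap behavior.
val heap =
  mutable.PriorityQueue.empty[Node](Ordering.by((n: Node) => -n.outstanding))
(0 until 3).foreach(i => heap.enqueue(Node(i, outstanding = 0)))

// Issuing a request: O(log n) dequeue + O(log n) enqueue.
val least = heap.dequeue() // the least loaded node
least.outstanding += 1     // send the request to it
heap.enqueue(least)        // put it back with its new load
```

<p>Peeking at the minimum is constant time, but the enqueue/dequeue pair on every single request is where the contention and the logarithmic cost come from.</p>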
<p>But at some point, we figured out a number of drawbacks this option had. First of all, the load balancer
state (i.e., the heap) is a highly contended resource (updated on each request), so it has to support
extremely fast updates. Needless to say, the heap is an amazing data structure with constant-time
access to its min element, but every other operation takes logarithmic time. That’s why it’s
tricky to implement a different and perhaps more sophisticated load metric on top of the heap without
sacrificing its performance.</p>
<p>The next step was the <strong>P2C</strong> (power of two choices) load balancer, which solved most of the heap balancer’s
problems. The algorithm employs quite a brilliant idea: take two random nodes from the server
set and pick the least loaded one. If we use that strategy repeatedly, we get a manageable
upper bound on the maximum load of each node.</p>
<p><img src="/images/finagle-101/p2c1.png" style="height: 200px;" /></p>
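<p>The following toy simulation (an illustration, not Finagle’s implementation) shows the effect: with nothing more than two random probes and a least-loaded comparison over a plain array, load stays remarkably balanced and every pick is constant time.</p>

```scala
import scala.util.Random

// P2C over an array of per-node outstanding-request counters: picking and
// updating are both O(1), unlike the heap balancer.
final class ToyP2C(nodes: Int, rng: Random) {
  val load: Array[Int] = new Array[Int](nodes)
  def send(): Unit = {
    val a = rng.nextInt(nodes)
    val b = rng.nextInt(nodes)
    val picked = if (load(a) <= load(b)) a else b // least loaded of the two
    load(picked) += 1
  }
}

val balancer = new ToyP2C(nodes = 10, new Random(7))
(1 to 10000).foreach(_ => balancer.send())
// all 10 counters end up very close to the 1000-per-node average
```

<p>The classic result behind this is that the most loaded node exceeds the average by only O(log log n), which is why two random probes are almost as good as scanning the whole server set.</p>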
<p>Given that we can now update the load balancer state in constant time (by just updating an array),
we can employ more sophisticated load metrics. The EWMA load metric was Finagle’s next attempt in that
direction.</p>
<p>For each node in the server set, <strong>EWMA</strong> (which stands for Exponentially Weighted Moving Average) keeps track
of round-trip latency weighted by the number of outstanding requests. And this is really smart, because
being on Layer 5, we can take advantage of both RPC latency and RPC queue depth. This makes EWMA
quite sensitive to latency spikes, so it reacts much faster to GC pauses and JVM warmups. For example,
if a load balancer happens to pick a replica that just went into a long GC pause, its EWMA metric will
reflect that immediately and its load will be adjusted accordingly.</p>
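<p>The reactivity is easy to see in a stripped-down version of the metric (latency-only; Finagle’s real metric also folds in the number of outstanding requests and a time-based decay):</p>

```scala
// Exponentially weighted moving average of observed latencies (millis).
// A higher alpha weights recent observations more, so spikes show up fast.
final class ToyEwma(alpha: Double) {
  private var estimate = 0.0
  private var seeded = false
  def observe(latencyMs: Double): Unit = {
    estimate = if (!seeded) { seeded = true; latencyMs }
               else alpha * latencyMs + (1 - alpha) * estimate
  }
  def value: Double = estimate
}

val ewma = new ToyEwma(alpha = 0.5)
Seq(10.0, 10.0, 10.0).foreach(ewma.observe) // a healthy replica around 10ms
ewma.observe(1000.0)                        // then a long GC pause hits
// a single spiky observation drags the estimate up to 505ms
```

<p>A single slow response pushes the estimate far above the other replicas’, so the balancer immediately steers traffic away, yet the estimate decays back as healthy samples return.</p>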
<p>There is an <a href="https://blog.buoyant.io/2016/03/16/beyond-round-robin-load-balancing-for-latency/">outstanding post</a> from <a href="https://twitter.com/stevej">@stevej</a> comparing three different load balancing
options in Finagle that quite explicitly shows how EWMA outperforms all other options. EWMA shows
the best result there in mitigating latency spikes caused by GC pauses or JVM warmups.</p>
<p>While EWMA looks very promising already, there is an even more advanced load balancer in Finagle today.
The <strong>aperture</strong> load balancer is designed to solve the problem of large server sets. Depending on the
scale, each Finagle client might be talking to several thousands of servers, which will likely
result in several thousands of open connections and quite low concurrency per node. Why does
the number of connections matter? It’s a waste of resources, and it comes with the cost of long tail latency
because of the high number of connection establishments. Why does concurrency matter? To take advantage
of any load metric we need some numbers to work with: when concurrency is low, our least-loaded metric
is zero for every server.</p>
<p>The aperture load balancer solves that by viewing the huge server set through a small window where
it can apply any existing load balancer.</p>
<p><img src="/images/finagle-101/aperture1.png" style="height: 300px;" /></p>
<p>The advantages of aperture are quite promising. Fewer connections for clients and servers means better
tail latency. Also, by employing a simple feedback controller, the load balancer adjusts the aperture
size to maintain the requested concurrency level thereby keeping replicas in a warm state.</p>
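<p>The window idea itself fits in a few lines. In this toy (an illustration only; the aperture’s position and size are fixed here rather than driven by a feedback controller), P2C is applied inside a small window over a large server set:</p>

```scala
import scala.util.Random

// P2C restricted to the first `aperture` nodes of a large server set.
final class ToyAperture(nodes: Int, aperture: Int, rng: Random) {
  val load: Array[Int] = new Array[Int](nodes)
  def send(): Unit = {
    val a = rng.nextInt(aperture) // probe only inside the window
    val b = rng.nextInt(aperture)
    val picked = if (load(a) <= load(b)) a else b
    load(picked) += 1
  }
}

val balancer = new ToyAperture(nodes = 1000, aperture = 10, new Random(1))
(1 to 5000).foreach(_ => balancer.send())
// only 10 of 1000 nodes receive traffic; each sees ~500 requests, enough
// concurrency for a least-loaded metric to be meaningful
```

<p>Concentrating traffic on ten nodes instead of a thousand is exactly what keeps connection counts low and replicas warm; the real balancer then grows or shrinks that window to hold per-node concurrency inside the configured load band.</p>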
<p>We will likely make aperture the default balancing option quite soon, but right now you need to enable
it manually. Here we build the aperture load balancer with an initial size of 10 and load bounds between
1 and 2. Basically, this means that we want to make sure all replicas in the aperture will constantly be
getting between 1 and 2 concurrent requests, and the aperture will be resized dynamically to satisfy
that requirement.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.conversions.time._</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.loadbalancer.Balancers</span>
<span class="k">val</span> <span class="n">balancer</span> <span class="k">=</span> <span class="nc">Balancers</span><span class="o">.</span><span class="n">aperture</span><span class="o">(</span>
<span class="n">lowLoad</span> <span class="k">=</span> <span class="mf">1.0</span><span class="o">,</span> <span class="n">highLoad</span> <span class="k">=</span> <span class="mf">2.0</span><span class="o">,</span> <span class="c1">// the load band adjusting an aperture
</span> <span class="n">minAperture</span> <span class="k">=</span> <span class="mi">10</span> <span class="c1">// min aperture size
</span><span class="o">)</span>
<span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">withLoadBalancer</span><span class="o">(</span><span class="n">balancer</span><span class="o">)</span></code></pre></figure>
<h4 id="circuit-breakers">Circuit Breakers</h4>
<p>Now that we know how load balancers distribute load, the question is: how do they avoid nodes that are likely
to fail or have already failed? This is done by a layer of circuit breakers placed under the load balancers, so
that when they mark a replica unavailable, it will be avoided by a load balancer.</p>
<p>As of today, there are three circuit breakers in Finagle.</p>
<ul>
<li><strong>Fail Fast</strong> - prematurely disables the session that failed TCP connect.</li>
<li><strong>Failure Accrual</strong> - performs liveness detection on a request basis.</li>
<li><strong>Threshold Failure Detector</strong> - a ping-based failure detector that periodically measures the RTT latency
of a ping-pong exchange between nodes; if the latency doesn’t look good, it excludes that session from a
request path. This is a pretty powerful tool, but it requires the underlying protocol to support liveness
detection control signals. For now, we have that implemented for Mux and will likely do it for HTTP/2
as well.</li>
</ul>
<p>Failure Accrual is our most advanced circuit breaker in that it supports a pluggable policy
used to determine when to mark a session unavailable. Today it’s possible to either configure it to
maintain a required success rate (if that goes below the requested value, a session is marked dead) or
to say after how many consecutive failures a session is considered unavailable.</p>
<p>By default, it marks a session dead after 5 failures in a row and goes into a jittered backoff before
re-enabling that session/replica again. It’s quite easy, though, to override that to be success-rate
based and disable the session once its success rate drops below 95% over the most recent hundred
requests.</p>
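<p>The default consecutive-failures policy is easy to sketch (this illustrates the behavior only, not Finagle’s <code class="highlighter-rouge">FailureAccrualFactory</code>; the real one revives sessions on a jittered, growing backoff rather than a fixed dead time):</p>

```scala
// Mark a session dead after `threshold` consecutive failures, and revive it
// after `markDeadForMs` milliseconds. Time is passed in explicitly.
final class ToyFailureAccrual(threshold: Int, markDeadForMs: Long) {
  private var failuresInARow = 0
  private var deadUntil = 0L

  def isAvailable(nowMs: Long): Boolean = nowMs >= deadUntil
  def recordSuccess(): Unit = failuresInARow = 0
  def recordFailure(nowMs: Long): Unit = {
    failuresInARow += 1
    if (failuresInARow >= threshold) {
      deadUntil = nowMs + markDeadForMs
      failuresInARow = 0
    }
  }
}

val accrual = new ToyFailureAccrual(threshold = 5, markDeadForMs = 10000L)
(1 to 5).foreach(_ => accrual.recordFailure(nowMs = 0L))
// dead for the next 10 seconds, then back in the balancer's rotation
```

<p>While <code class="highlighter-rouge">isAvailable</code> returns false, the load balancer above simply skips this session, which is the whole point of stacking circuit breakers under the balancer.</p>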
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">com.twitter.conversions.time._</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.Http</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.service.Backoff</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.service.FailureAccrualFactory.Param</span>
<span class="k">import</span> <span class="nn">com.twitter.finagle.service.exp.FailureAccrualPolicy</span>
<span class="k">val</span> <span class="n">twitter</span> <span class="k">=</span> <span class="nc">Http</span><span class="o">.</span><span class="n">client</span>
<span class="o">.</span><span class="n">configured</span><span class="o">(</span><span class="nc">Param</span><span class="o">(()</span> <span class="k">=></span> <span class="nc">FailureAccrualPolicy</span><span class="o">.</span><span class="n">successRate</span><span class="o">(</span>
<span class="n">requiredSuccessRate</span> <span class="k">=</span> <span class="mf">0.95</span><span class="o">,</span>
<span class="n">window</span> <span class="k">=</span> <span class="mi">100</span><span class="o">,</span>
<span class="n">markDeadFor</span> <span class="k">=</span> <span class="nc">Backoff</span><span class="o">.</span><span class="n">const</span><span class="o">(</span><span class="mf">10.</span><span class="n">seconds</span><span class="o">)</span>
<span class="o">)))</span></code></pre></figure>
<h3 id="whats-next">What’s next?</h3>
<p>There are quite exciting times ahead. The Netty 4 migration is on track and should happen really soon. I
know we’ve been promising this Netty 4 utopia for about two years now, but we’re finally getting there.</p>
<p>What does Netty 4 mean for Finagle? First, we expect better performance and fewer allocations. Second, support
for new protocols (like HTTP/2). Third, simplification of Finagle internals due to the simpler and safer
threading model in Netty 4. Finally, it’s better to stay aligned with the state-of-the-art IO library for
the JVM to get the most out of it.</p>
<p>We’ll also continue working on resiliency and admission control in Finagle to make sure we’re doing as
much as possible to make your RPC sessions even more reliable and easy to configure.</p>

<p><em>How Fast is Finch? (2016-02-29) · <a href="http://kostyukov.net/posts/how-fast-is-finch">kostyukov.net/posts/how-fast-is-finch</a></em></p>

<p>Turns out I’ve never mentioned anything about <a href="https://github.com/finagle/finch">Finch</a>, a library I’m working on most of my
free time, in my personal blog. So I decided to finally fix that and write a small note on what I
think about Finch’s performance as an HTTP library/server. To be more precise, I want to comment on
<a href="https://www.techempower.com/benchmarks/#section=data-r12&hw=peak&test=json&l=6bk">the most recent results of the TechEmpower benchmark</a> and perhaps give some
insights on why Finch is ranked so high there.</p>
<p>A couple of days ago, results from the most recent run of the TechEmpower benchmark were published.
While I was expecting Finch to perform well there, I didn’t expect it to be the second fastest HTTP
library written in Scala.</p>
<blockquote align="center" class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Impressive results by <a href="https://twitter.com/hashtag/Finch?src=hash">#Finch</a> (now <a href="https://twitter.com/hashtag/Scala?src=hash">#Scala</a> 2nd fastest HTTP library) running <a href="https://twitter.com/techempower">@techempower</a> benchmark (430k QPS peak): <a href="https://t.co/YeBMnJeQ5W">pic.twitter.com/YeBMnJeQ5W</a></p>— Vladimir Kostyukov (@vkostyukov) <a href="https://twitter.com/vkostyukov/status/703374308056309760">February 27, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>With that said, I’ll go ahead and answer my own question in the title of this post, “How Fast is
Finch?”. Looking at the chart, it’s obvious that <strong>Finch is fast enough</strong> to perform
really well on 99.99% of your business problems. At least, in comparison with other Scala libraries.</p>
<p>The most interesting part of this discussion is trying to understand why it performs so well. Why
doesn’t the insane level of indirection Finch involves on top of Finagle services add much
overhead? The quick answer would be: Finch owes most of its high performance to the <em>fast</em> and
<em>battle-proven</em> libraries it depends on (Finagle, Circe, Cats and Shapeless). The secret recipe is
quite simple: take <strong>fast components</strong> (no matter whether functional or imperative) and glue
them together using <strong>rock-solid (pure) functional abstractions</strong> that are easy to
test and reason about.</p>
<p>Thus, Finch is fast because …</p>
<h4 id="finagle-is-fast">Finagle is fast</h4>
<p>Finch was designed with one goal in mind: provide an easy-to-use API (i.e., a combinators API) on top
of one that’s easy to implement (Finagle services). Obviously, it should involve some overhead on top of
bare-metal Finagle. And it does: by our latest measurements it adds 10% of allocations and 5% of
running time on top of Finagle, which is not so dramatic and, I’d say, pretty good for a pre-1.0
library.</p>
<p>When it comes to <a href="https://github.com/twitter/finagle">Finagle</a>, there is no doubt about its performance. The Finagle team
at Twitter (which I’m luckily a part of) puts a lot of effort into making sure that Finagle’s
performance is constantly improving. There is a number of micro-benchmarks we run on each commit to
critical components. There are integration tests we write using the internal framework called Integ
to load test different Finagle topologies. Finally, there is <a href="https://twitter.com">https://twitter.com</a> that stress
tests Finagle 24/7, doing millions of queries per second (DC-wise).</p>
<p>Finagle itself is built on the same principles as Finch. It reuses the industry’s best practices and
runs its IO layer on <a href="http://netty.io/">Netty</a>, which is well known as the best thing that happened to the JVM in
years. <a href="http://netty.io/wiki/adopters.html">Netty is everywhere</a>: I’d be surprised if Netty code doesn’t handle at least 10%
of your everyday traffic. You send a tweet, Finagle and Netty take care of it. You upload your
photos to iCloud or talk to Siri, <a href="https://speakerdeck.com/normanmaurer/connectivity">Netty handles it</a>. This list is almost limitless;
I’m not sure I know any JVM shop around that doesn’t use Netty in some way.</p>
<h4 id="circe-is-fast">Circe is fast</h4>
<p>While Finch is designed to be agnostic to a concrete serialization library, there is one that plays
really nicely with Finch. <a href="https://github.com/travisbrown/circe">Circe</a> is a relatively new JSON library that started as a fork of
<a href="http://argonaut.io/">Argonaut</a> but ended up as a completely standalone and mature project. It promotes
type-full programming and provides compile-time mechanisms for deriving JSON codecs for sealed
traits and case classes.</p>
<p>Even though Circe is young, it’s already one of <a href="https://github.com/travisbrown/circe#performance">the fastest Scala JSON libraries</a>
around, which is quite mind-blowing given how nice, thoughtful and boilerplate-less its API is. Part
of Circe’s great performance comes from the library it uses to parse JSON strings into JSON
ASTs. This library is called <a href="https://github.com/non/jawn">Jawn</a> and it’s one of the fastest (if not the fastest) ways to
parse JSON on the JVM.</p>
<h4 id="shapeless-is-fast">Shapeless is fast</h4>
<p><a href="https://github.com/milessabin/shapeless">Shapeless</a> is a generic programming library used by many Scala projects (including
<a href="https://github.com/travisbrown/circe">Circe</a>, <a href="http://spray.io/">Spray</a>, <a href="https://github.com/scodec/scodec">scodec</a>, etc) to implement generic API (e.g., abstract over tuple arity) in a boilerplate-less manner.</p>
<p>While it might seem like Shapeless does a lot of work and adds a lot of overhead, that’s not
really the case. Most of the Shapeless-related work happens at compile time and does not affect
program running time. It shouldn’t be a surprise that Shapeless-powered code does increase compilation
time, but it almost never increases running time. Finch benchmarks of derived vs. custom-written
endpoints only confirm that: the performance is literally the same.</p>
<h4 id="finch-is-fast">Finch is fast</h4>
<p>The bottom line is that Finch is in good company. There is a team or a person behind every single
library Finch uses, dedicated to its future and performance. I’m confident in those people and I’m
confident in the libraries they maintain. Finch takes a lot from the OSS community and tries to pay it
back with good performance.</p>
<p>The fact that Finch performs so well gives me hope that the abstractions we chose in the
beginning are not completely broken performance-wise. And this makes me confident in Finch’s future
performance. We haven’t stopped yet. In fact, we haven’t even started: the actual performance work
is only planned as a post 1.0 activity.</p>

<p><em>Designing a Purely Functional Data Structure (2015-04-04) · <a href="http://kostyukov.net/posts/designing-a-pfds">kostyukov.net/posts/designing-a-pfds</a></em></p>

<p>Functional programming nicely leverages constraints on <em>how</em> programs are written, thereby promoting a clean and easy-to-reason-about coding style. <em>Purely functional data structures</em> are (surprisingly) built out of those constraints. They are <strong>persistent</strong> (FP implies that both the <em>old</em> and <em>new</em> versions of an updated object are available) and backed by <strong>immutable</strong> objects (FP doesn’t support <em>destructive updates</em>). Needless to say, it’s a challenge to design a purely functional data structure that meets the performance requirements of its imperative sibling. Fortunately, it’s quite possible in most cases, even for those data structures whose reference implementations are backed by mutable arrays. This post describes the process of designing a purely functional implementation technique for <a href="http://en.wikipedia.org/wiki/Binary_heap">Standard Binary Heaps</a>, with the same asymptotic bounds as in an imperative setting.</p>
<h4 id="immutability-and-persistence">Immutability and Persistence</h4>
<p><em>Immutability</em> and <em>persistence</em> are quite similar terms, which often substitute for each other. We say <a href="http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.Vector">immutable vector</a> (in Scala) but mean <a href="http://clojuredocs.org/clojure_core/clojure.core/vector">persistent vector</a> (in Clojure): both implementations are based on the same abstract data structure, the <a href="http://en.wikipedia.org/wiki/Hash_array_mapped_trie">Bit-Mapped Vector Trie</a>, but are named differently. Still, there is a slight difference between immutability and persistence as they apply to data structures.</p>
<ul>
<li>Persistent data structures support <strong>multiple versions</strong></li>
<li>Immutable data structures <strong>aren’t changeable</strong></li>
</ul>
<p>The difference between immutable and persistent data structures lies in how they handle updates. A persistent data structure handles updates in a <em>smart</em> and memory-efficient way in order to keep its previous version unchanged, while an immutable data structure simply <em>doesn’t care</em> about updates at all (for example, Guava’s <a href="https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableList.java">ImmutableList</a> doesn’t even support updates), since any “update” would mean building a whole new copy.</p>
<p>The following example demonstrates the difference between Guava’s <code class="highlighter-rouge">ImmutableList</code> and Scala’s persistent <code class="highlighter-rouge">List</code> in terms of memory footprint (smart updates vs. dumb updates).</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// xs takes O(n) memory
</span><span class="k">val</span> <span class="n">xs</span> <span class="k">=</span> <span class="nc">ImmutableList</span><span class="o">.</span><span class="n">of</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">)</span>
<span class="c1">// yx takes O(n) memory
</span><span class="k">val</span> <span class="n">ys</span> <span class="k">=</span> <span class="mi">1</span> <span class="o">::</span> <span class="mi">2</span> <span class="o">::</span> <span class="mi">3</span> <span class="o">::</span> <span class="nc">Nil</span>
<span class="c1">// dumb update: xxs takes O(n) memory (full copying)
</span><span class="k">val</span> <span class="n">xxs</span> <span class="k">=</span> <span class="nc">ImmutableList</span><span class="o">.</span><span class="n">builder</span><span class="o">.</span><span class="n">add</span><span class="o">(</span><span class="mi">0</span><span class="o">).</span><span class="n">addAll</span><span class="o">(</span><span class="n">xs</span><span class="o">).</span><span class="n">build</span><span class="o">()</span>
<span class="c1">// smart update: yys takes O(1) memory (structural sharing)
</span><span class="k">val</span> <span class="n">yys</span> <span class="k">=</span> <span class="mi">0</span> <span class="o">::</span> <span class="n">ys</span></code></pre></figure>
<h4 id="purely-functional-data-structures">Purely Functional Data Structures</h4>
<p>Purely functional data structures are <strong>always persistent</strong>, which means they handle updates in a memory-efficient way. This is achieved by an implementation technique called <em>structural sharing</em>. A persistent data structure <em>shares</em> its internal <em>structure</em> between its versions, which is completely safe to do, since none of the versions can ever be changed or destroyed.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">xs</span> <span class="k">=</span> <span class="mi">1</span> <span class="o">::</span> <span class="mi">2</span> <span class="o">::</span> <span class="mi">3</span> <span class="o">::</span> <span class="nc">Nil</span>
<span class="k">val</span> <span class="n">xxs</span> <span class="k">=</span> <span class="mi">0</span> <span class="o">::</span> <span class="n">xs</span> <span class="o">//</span> <span class="n">shares</span> <span class="o">(</span><span class="n">not</span> <span class="n">copies</span><span class="o">)</span> <span class="n">the</span> <span class="n">tail</span> <span class="k">with</span> <span class="n">xs</span></code></pre></figure>
<p>Another heavily used implementation technique is <em>path copying</em>. Modifying a persistent data structure (i.e., inserting, deleting or updating an element) often requires making <em>deep</em> changes. To do so, we simply <em>copy</em> its nested structures (persistent data structures are often backed by <a href="http://en.wikipedia.org/wiki/Algebraic_data_type">ADTs</a>) along the <em>path</em> to the element being modified. Both path copying and structural sharing aim to minimize the cost of modifying a persistent data structure: everything that <em>can’t be shared</em> (via structural sharing) <em>is copied</em> (via path copying).</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">concat</span><span class="o">[</span><span class="kt">A</span><span class="o">](</span><span class="n">xs</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">],</span> <span class="n">ys</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">])</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">xs</span><span class="o">.</span><span class="n">isEmpty</span><span class="o">)</span> <span class="n">ys</span>
<span class="k">else</span> <span class="n">xs</span><span class="o">.</span><span class="n">head</span> <span class="o">::</span> <span class="n">concat</span><span class="o">(</span><span class="n">xs</span><span class="o">.</span><span class="n">tail</span><span class="o">,</span> <span class="n">ys</span><span class="o">)</span> <span class="o">//</span> <span class="n">copies</span> <span class="n">the</span> <span class="n">path</span> <span class="n">to</span> <span class="n">ys</span></code></pre></figure>
<p>Path copying is a quite lightweight operation that usually takes less than <code class="highlighter-rouge">O(n)</code> time to perform. There are, however, plenty of <em>specialized</em> data structures highly optimized for a concrete operation, performing it in amortized constant time (with no path copying). For example, <a href="http://ittc.ku.edu/~andygill/papers/IntMap98.pdf">Fast Mergeable Integer Maps</a> and <a href="http://www.math.tau.ac.il/~haimk/adv-ds-2000/okasaki-kaplan-tarjan-sicomp.ps">Persistent Catenable Lists</a> support constant-time <code class="highlighter-rouge">merge</code> and <code class="highlighter-rouge">concat</code> operations respectively.</p>
<h4 id="purely-functional-heaps">Purely Functional Heaps</h4>
<p>Tree-based data structures (i.e., trees, heaps and tries) are considered low-hanging fruit in a functional setting, since they map directly to <a href="http://en.wikipedia.org/wiki/Algebraic_data_type">Algebraic Data Types</a>. To a first approximation, a typical functional implementation of a persistent tree looks as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">sealed</span> <span class="k">trait</span> <span class="nc">Tree</span><span class="o">[</span><span class="kt">+A</span><span class="o">]</span> <span class="o">{</span> <span class="k">def</span> <span class="n">value</span><span class="k">:</span> <span class="kt">A</span> <span class="o">}</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Branch</span><span class="o">[</span><span class="kt">+A</span><span class="o">](</span><span class="n">value</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span> <span class="n">left</span><span class="k">:</span> <span class="kt">Tree</span><span class="o">[</span><span class="kt">A</span><span class="o">],</span> <span class="n">right</span><span class="k">:</span> <span class="kt">Tree</span><span class="o">[</span><span class="kt">A</span><span class="o">])</span> <span class="k">extends</span> <span class="nc">Tree</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span>
<span class="k">case</span> <span class="k">object</span> <span class="nc">Leaf</span> <span class="k">extends</span> <span class="nc">Tree</span><span class="o">[</span><span class="kt">Nothing</span><span class="o">]</span></code></pre></figure>
<p>There are several purely functional implementations of heaps such as <a href="https://github.com/vkostyukov/scalacaster/blob/master/src/heap/LeftistHeap.scala">Leftist Heap</a>, <a href="https://github.com/vkostyukov/scalacaster/blob/master/src/heap/SkewHeap.scala">Skew Heap</a> and <a href="https://github.com/vkostyukov/scalacaster/blob/master/src/heap/PairingHeap.scala">Pairing Heap</a> with good asymptotic bounds. However, there are other heaps without proper functional implementations. The simplest of them are <a href="http://en.wikipedia.org/wiki/Binary_heap">Standard Binary Heaps</a>, which do not fit well into a functional environment since their reference implementation is backed by mutable arrays. Luckily, it’s quite possible to bring them into a purely functional world.</p>
<h4 id="standard-binary-heap">Standard Binary Heap</h4>
<p>A <em>binary heap</em> (Williams, 1964) is a data structure that implements a priority queue interface and guarantees logarithmic running time for the <code class="highlighter-rouge">insert</code> and <code class="highlighter-rouge">delete</code> operations and constant-time access to the <code class="highlighter-rouge">minimum</code>/<code class="highlighter-rouge">maximum</code> element. Binary heaps are commonly viewed as binary trees which satisfy two invariants:</p>
<ol>
<li>The <em>shape</em> invariant: the tree is a complete binary tree.</li>
<li>The <em>min-heap</em> invariant: each node is less than or equal to each of its
children.</li>
</ol>
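<p>Both invariants are easy to state as executable predicates. The sketch below is not from the post itself; it re-derives the two checks over a bare binary-tree ADT, using the “a perfect tree of height <code class="highlighter-rouge">h</code> has <code class="highlighter-rouge">2^h − 1</code> nodes” characterization that the insertion section relies on later.</p>

```scala
// A bare binary-tree ADT, just enough to express the two heap invariants.
sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
case class Branch[+A](value: A, left: Tree[A] = Leaf, right: Tree[A] = Leaf) extends Tree[A]

def size[A](t: Tree[A]): Int = t match {
  case Leaf            => 0
  case Branch(_, l, r) => size(l) + size(r) + 1
}

def height[A](t: Tree[A]): Int = t match {
  case Leaf            => 0
  case Branch(_, l, r) => math.max(height(l), height(r)) + 1
}

// A perfect tree of height h contains exactly 2^h - 1 nodes.
def isPerfect[A](t: Tree[A]): Boolean = size(t) == (1 << height(t)) - 1

// Shape invariant: "complete" means filled level by level, left to right.
def isComplete[A](t: Tree[A]): Boolean = t match {
  case Leaf => true
  case Branch(_, l, r) if height(l) == height(r)     => isPerfect(l) && isComplete(r)
  case Branch(_, l, r) if height(l) == height(r) + 1 => isComplete(l) && isPerfect(r)
  case _ => false
}

// Min-heap invariant: each node is less than or equal to each of its children.
def isMinHeap[A](t: Tree[A])(implicit ord: Ordering[A]): Boolean = t match {
  case Leaf => true
  case Branch(v, l, r) =>
    def rootOk(c: Tree[A]): Boolean = c match {
      case Leaf             => true
      case Branch(cv, _, _) => ord.lteq(v, cv)
    }
    rootOk(l) && rootOk(r) && isMinHeap(l) && isMinHeap(r)
}
```

<p>These predicates are only meant for property checks in tests; the heap implementation itself never needs to scan the whole tree, thanks to cached <code class="highlighter-rouge">size</code> and <code class="highlighter-rouge">height</code> fields.</p>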
<p>In Scala, a binary min-heap might be represented as an abstract <code class="highlighter-rouge">Heap</code> type with two variants: <code class="highlighter-rouge">Branch</code> and <code class="highlighter-rouge">Leaf</code>.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">sealed</span> <span class="k">trait</span> <span class="nc">Heap</span><span class="o">[</span><span class="kt">+A</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">min</span><span class="k">:</span> <span class="kt">A</span>
<span class="k">def</span> <span class="n">left</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span>
<span class="k">def</span> <span class="n">right</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span>
<span class="k">def</span> <span class="n">isEmpty</span><span class="k">:</span> <span class="kt">Boolean</span>
<span class="c1">// Both 'size' and 'height' are stored in each node.
</span> <span class="k">val</span> <span class="n">size</span><span class="k">:</span> <span class="kt">Int</span>
<span class="k">val</span> <span class="n">height</span><span class="k">:</span> <span class="kt">Int</span>
<span class="o">}</span>
<span class="k">case</span> <span class="k">object</span> <span class="nc">Leaf</span> <span class="k">extends</span> <span class="nc">Heap</span><span class="o">[</span><span class="kt">Nothing</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">min</span><span class="k">:</span> <span class="kt">Nothing</span> <span class="o">=</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">NoSuchElementException</span><span class="o">(</span><span class="s">"An empty heap."</span><span class="o">)</span>
<span class="k">def</span> <span class="n">left</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">Nothing</span><span class="o">]</span> <span class="o">=</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">NoSuchElementException</span><span class="o">(</span><span class="s">"An empty heap."</span><span class="o">)</span>
<span class="k">def</span> <span class="n">right</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">Nothing</span><span class="o">]</span> <span class="o">=</span> <span class="k">throw</span> <span class="k">new</span> <span class="nc">NoSuchElementException</span><span class="o">(</span><span class="s">"An empty heap."</span><span class="o">)</span>
<span class="k">val</span> <span class="n">size</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">val</span> <span class="n">height</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">def</span> <span class="n">isEmpty</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">true</span>
<span class="o">}</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Branch</span><span class="o">[</span><span class="kt">+A</span><span class="o">](</span><span class="n">min</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span> <span class="n">left</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Leaf</span><span class="o">,</span> <span class="n">right</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Leaf</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">Heap</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">isEmpty</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">false</span>
<span class="k">val</span> <span class="n">size</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="n">left</span><span class="o">.</span><span class="n">size</span> <span class="o">+</span> <span class="n">right</span><span class="o">.</span><span class="n">size</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">val</span> <span class="n">height</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">max</span><span class="o">(</span><span class="n">left</span><span class="o">.</span><span class="n">height</span><span class="o">,</span> <span class="n">right</span><span class="o">.</span><span class="n">height</span><span class="o">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="o">}</span></code></pre></figure>
<p>Note that the height of a heap is defined as the max height of its children plus one, while the size of a heap is defined as the sum of its children’s sizes plus one; both are calculated only once, in the heap’s constructor. Also, to simplify calculations, suppose that a singleton heap’s height is <code class="highlighter-rouge">1</code>.</p>
<p>Except for <code class="highlighter-rouge">height</code> and <code class="highlighter-rouge">size</code> operations, this signature looks like a classic functional implementation of a <a href="http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504">Binary Search Tree</a>. The two new operations are actually accessors to new fields in a heap - its height and size. These additional data should be accessible in constant time to define an efficient and simple <em>search criterion</em> for <code class="highlighter-rouge">insert</code> and <code class="highlighter-rouge">remove</code> operations.</p>
<h4 id="insertion-in-olog-n">Insertion in O(log n)</h4>
<p>Insertion into a functional binary heap must not violate either of its invariants - neither the shape invariant nor the min-heap invariant. For this purpose two problems should be solved. First, to maintain the shape invariant a new node should be inserted in the first empty spot at the last level of the heap. Second, to maintain the min-heap invariant the inserted node should be <em>bubbled up</em> to the heap root until it becomes greater than its parent.</p>
<p><img src="/images/designing-a-pfds/figure-1.png" alt="Figure 1" /></p>
<center><small>Figure 1: Eliminating min-heap invariant violations.</small></center>
<p>Bubbling up is quite a simple transformation that can be done at each level in constant time. There are two cases depending on whether the violation is at the left or right child (see “Figure 1” above). In either case the violation should be fixed by <em>swapping</em> two nodes - the root node and the child that violates the min-heap invariant. There is also a third case, in which nothing is violated. In that case, the heap should simply be rebuilt with the given parameters. In other words, all affected nodes should be copied in order to maintain data structure persistence. More precisely, <code class="highlighter-rouge">bubbleUp</code> and <code class="highlighter-rouge">insert</code> operations might be defined as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">bubbleUp</span><span class="o">[</span><span class="kt">B</span> <span class="kt">:</span> <span class="kt">Ordering</span><span class="o">](</span><span class="n">x</span><span class="k">:</span> <span class="kt">B</span><span class="o">,</span> <span class="n">l</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">B</span><span class="o">],</span> <span class="n">r</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">B</span><span class="o">])</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">B</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">ordering</span> <span class="k">=</span> <span class="nc">Ordering</span><span class="o">[</span><span class="kt">B</span><span class="o">];</span> <span class="k">import</span> <span class="nn">ordering._</span>
<span class="o">(</span><span class="n">l</span><span class="o">,</span> <span class="n">r</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="o">(</span><span class="nc">Branch</span><span class="o">(</span><span class="n">y</span><span class="o">,</span> <span class="n">lt</span><span class="o">,</span> <span class="n">rt</span><span class="o">),</span> <span class="k">_</span><span class="o">)</span> <span class="k">if</span> <span class="o">(</span><span class="n">x</span> <span class="o">></span> <span class="n">y</span><span class="o">)</span> <span class="k">=></span>
<span class="nc">Branch</span><span class="o">(</span><span class="n">y</span><span class="o">,</span> <span class="nc">Branch</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">lt</span><span class="o">,</span> <span class="n">rt</span><span class="o">),</span> <span class="n">r</span><span class="o">)</span>
<span class="k">case</span> <span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="nc">Branch</span><span class="o">(</span><span class="n">z</span><span class="o">,</span> <span class="n">lt</span><span class="o">,</span> <span class="n">rt</span><span class="o">))</span> <span class="k">if</span> <span class="o">(</span><span class="n">x</span> <span class="o">></span> <span class="n">z</span><span class="o">)</span> <span class="k">=></span>
<span class="nc">Branch</span><span class="o">(</span><span class="n">z</span><span class="o">,</span> <span class="n">l</span><span class="o">,</span> <span class="nc">Branch</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">lt</span><span class="o">,</span> <span class="n">rt</span><span class="o">))</span>
<span class="k">case</span> <span class="o">(</span><span class="k">_</span><span class="o">,</span> <span class="k">_</span><span class="o">)</span> <span class="k">=></span>
<span class="nc">Branch</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">l</span><span class="o">,</span> <span class="n">r</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">def</span> <span class="n">insert</span><span class="o">[</span><span class="kt">B</span> <span class="k">>:</span> <span class="kt">A</span> <span class="kt">:</span> <span class="kt">Ordering</span><span class="o">](</span><span class="n">x</span><span class="k">:</span> <span class="kt">B</span><span class="o">)</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">B</span><span class="o">]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">isEmpty</span><span class="o">)</span> <span class="nc">Branch</span><span class="o">(</span><span class="n">x</span><span class="o">)</span>
<span class="k">else</span> <span class="k">if</span> <span class="o">(???)</span> <span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">,</span> <span class="n">right</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">))</span>
<span class="k">else</span> <span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">),</span> <span class="n">right</span><span class="o">)</span></code></pre></figure>
<p>The last thing to discuss is how to find a proper spot for a new node. The algorithm is based on the simple idea that a binary heap will always be a <em>complete</em> tree if it tends to be a <em>perfect</em> tree each time it’s modified. There are two definitions of perfect trees: <em>mathematical</em> and <em>recursive</em>. Mathematical definition: a perfect binary tree contains <code class="highlighter-rouge">2^h − 1</code> nodes, where <code class="highlighter-rouge">h</code> is the height of the tree (recall that a singleton heap’s height is <code class="highlighter-rouge">1</code> here). Recursive definition: a tree is perfect if its children are perfect trees of the same height. Combining these facts together, one can define search criteria which allow filling a heap level by level from left to right, thereby maintaining the shape invariant. In other words, new nodes should be inserted in such a way as to make the heap a perfect tree. This can be achieved by following the requirements of the recursive definition, using the mathematical definition as an efficient test of tree perfectness. Thus, the search criteria for insertion consist of four cases depending on whether the children are perfect trees and whether their heights are equal.</p>
<p><img src="/images/designing-a-pfds/figure-2.png" alt="Figure 2" /></p>
<center><small>Figure 2: Searching for the first empty spot in a heap.</small></center>
<p>The straightforward implementation of this idea (see “Figure 2” above) with four cases looks as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">insert</span><span class="o">[</span><span class="kt">B</span> <span class="k">>:</span> <span class="kt">A</span> <span class="kt">:</span> <span class="kt">Ordering</span><span class="o">](</span><span class="n">x</span><span class="k">:</span> <span class="kt">B</span><span class="o">)</span><span class="k">:</span> <span class="kt">Heap</span><span class="o">[</span><span class="kt">B</span><span class="o">]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">isEmpty</span><span class="o">)</span> <span class="nc">Branch</span><span class="o">(</span><span class="n">x</span><span class="o">)</span>
<span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">left</span><span class="o">.</span><span class="n">size</span> <span class="o"><</span> <span class="n">math</span><span class="o">.</span><span class="n">pow</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="n">left</span><span class="o">.</span><span class="n">height</span><span class="o">)</span> <span class="o">-</span> <span class="mi">1</span><span class="o">)</span>
<span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">),</span> <span class="n">right</span><span class="o">)</span>
<span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">right</span><span class="o">.</span><span class="n">size</span> <span class="o"><</span> <span class="n">math</span><span class="o">.</span><span class="n">pow</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="n">right</span><span class="o">.</span><span class="n">height</span><span class="o">)</span> <span class="o">-</span> <span class="mi">1</span><span class="o">)</span>
<span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">,</span> <span class="n">right</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">))</span>
<span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">right</span><span class="o">.</span><span class="n">height</span> <span class="o"><</span> <span class="n">left</span><span class="o">.</span><span class="n">height</span><span class="o">)</span>
<span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">,</span> <span class="n">right</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">))</span>
<span class="k">else</span> <span class="n">bubbleUp</span><span class="o">(</span><span class="n">min</span><span class="o">,</span> <span class="n">left</span><span class="o">.</span><span class="n">insert</span><span class="o">(</span><span class="n">x</span><span class="o">),</span> <span class="n">right</span><span class="o">)</span></code></pre></figure>
<p>The <code class="highlighter-rouge">insert</code> operation performs <em>two</em> traversals along the search path of a heap. First, in a top-down manner it searches for the first empty spot in a heap thereby maintaining the shape invariant. Second, it <em>rebuilds</em> the affected nodes of a heap in a bottom-up manner thereby maintaining the min-heap invariant. Both traversals take no more than <code class="highlighter-rouge">O(log n)</code>, since the longest possible path in a complete tree is <code class="highlighter-rouge">log n</code>. Thus, the time complexity of insertion is <code class="highlighter-rouge">O(log n)</code>.</p>
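<p>To see insertion working end to end, here is a condensed, runnable assembly of the pieces above. The error-throwing accessors on <code class="highlighter-rouge">Leaf</code> and the placement of <code class="highlighter-rouge">bubbleUp</code> as a private method are implementation choices of this sketch, not spelled out in the post.</p>

```scala
sealed trait Heap[+A] {
  def min: A
  def left: Heap[A]
  def right: Heap[A]
  def isEmpty: Boolean
  val size: Int
  val height: Int

  // The search criterion: keep each subtree as close to a perfect tree as possible.
  def insert[B >: A : Ordering](x: B): Heap[B] =
    if (isEmpty) Branch(x)
    else if (left.size < math.pow(2, left.height) - 1)
      bubbleUp(min, left.insert(x), right)
    else if (right.size < math.pow(2, right.height) - 1)
      bubbleUp(min, left, right.insert(x))
    else if (right.height < left.height)
      bubbleUp(min, left, right.insert(x))
    else bubbleUp(min, left.insert(x), right)

  // Restores the min-heap invariant by swapping a node with an offending child.
  private def bubbleUp[B : Ordering](x: B, l: Heap[B], r: Heap[B]): Heap[B] = {
    val ordering = Ordering[B]; import ordering._
    (l, r) match {
      case (Branch(y, lt, rt), _) if x > y => Branch(y, Branch(x, lt, rt), r)
      case (_, Branch(z, lt, rt)) if x > z => Branch(z, l, Branch(x, lt, rt))
      case _                               => Branch(x, l, r)
    }
  }
}

case object Leaf extends Heap[Nothing] {
  val size: Int = 0
  val height: Int = 0
  def isEmpty: Boolean = true
  def min: Nothing = throw new NoSuchElementException("An empty heap.")
  def left: Heap[Nothing] = throw new NoSuchElementException("An empty heap.")
  def right: Heap[Nothing] = throw new NoSuchElementException("An empty heap.")
}

case class Branch[+A](min: A, left: Heap[A] = Leaf, right: Heap[A] = Leaf) extends Heap[A] {
  def isEmpty: Boolean = false
  val size: Int = left.size + right.size + 1
  val height: Int = math.max(left.height, right.height) + 1
}

// Folding a list into a heap performs n logarithmic insertions.
val h = List(5, 3, 8, 1, 9, 2).foldLeft(Leaf: Heap[Int])(_ insert _)
// h.min == 1, h.size == 6, h.height == 3
```

<p>Folding a list of <code class="highlighter-rouge">n</code> elements through <code class="highlighter-rouge">insert</code> builds a complete min-heap in <code class="highlighter-rouge">O(n log n)</code> time, and every intermediate heap remains a valid, fully usable version.</p>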
<h4 id="conclusion">Conclusion</h4>
<p>The most exciting thing about purely functional data structures is that there is always room for new ideas and techniques. Even today, this direction still attracts researchers and enthusiasts of functional programming. It’s been 15 years since <a href="http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504">Okasaki’s book</a>, and the field is <a href="http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki">still developing</a>: modern languages like Scala require modern and efficient data structures with optimal purely functional implementations.</p>
<p>The heap implementation in this post is based on the paper <a href="http://arxiv.org/pdf/1312.4666v1.pdf">A Functional Approach for Standard Binary Heaps, 2013</a>. The full source code (including the <code class="highlighter-rouge">remove</code> and <code class="highlighter-rouge">heapify</code> operations) is available <a href="https://github.com/vkostyukov/scalacaster/blob/master/src/heap/StandardHeap.scala">on GitHub</a>.</p>Vladimir KostyukovFunctional programming nicely leverages constraints on how programs are written thereby promoting a clean and easy to reason about coding style. Purely functional data structures are (surprisingly) built out of those constraints. They are persistent (FP implies that both old and new versions of an updated object are available) and backed by immutable objects (FP doesn’t support destructive updates). Needless to say, it’s a challenge to design a purely functional data structure that meets performance requirements of its imperative sibling. Fortunately, it’s quite possible in most of the cases, even for those data structures whose reference implementations are backed by mutable arrays. This post precisely describes a process of designing a purely functional implementation technique for Standard Binary Heaps, with the same asymptotic bounds as in an imperative setting.Combinatorial Algorithms in Scala2014-04-01T10:00:00+00:002014-04-01T10:00:00+00:00http://kostyukov.net/posts/combinatorial-algorithms-in-scala<p><a href="http://en.wikipedia.org/wiki/Combinatorics">Combinatorics</a> is a branch of mathematics that mostly focuses on problems of counting structures of a given size and kind. The most famous examples of such problems are often asked as job interview questions. 
This blog post presents four generation problems (<a href="http://en.wikipedia.org/wiki/Combination">combinations</a>, <a href="http://en.wikipedia.org/wiki/Subset">subsets</a>, <a href="http://en.wikipedia.org/wiki/Permutations">permutations</a> and <a href="http://en.wikipedia.org/wiki/Combination">variations</a>) along with their <em>purely functional</em> implementations in Scala.</p>
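<p>Before implementing anything, it helps to pin down what each operation should produce. Scala’s standard library already ships <code class="highlighter-rouge">combinations</code> and <code class="highlighter-rouge">permutations</code> on sequences, so they can serve as a reference point (subsets and variations are derivable from them); the <code class="highlighter-rouge">x</code>-prefixed methods developed in this post reimplement these from scratch.</p>

```scala
val xs = List(1, 2, 3)

// Combinations: order-insensitive selections of a given size.
val combs = xs.combinations(2).toList
// List(List(1, 2), List(1, 3), List(2, 3))

// Subsets: combinations of every size, from 0 to n.
val subs = (0 to xs.length).flatMap(xs.combinations).toList // 2^3 = 8 subsets

// Permutations: all orderings of the whole list (3! = 6 of them).
val perms = xs.permutations.toList

// Variations: order-sensitive selections of a given size,
// i.e. the permutations of every combination (3 * 2 = 6 of them).
val vars = xs.combinations(2).flatMap(_.permutations).toList
```

<p>Keeping these small examples around makes it easy to sanity-check each hand-rolled implementation against the standard library’s output.</p>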
<h4 id="implicit-classes">Implicit Classes</h4>
<p>Scala’s <a href="http://docs.scala-lang.org/sips/completed/implicit-classes.html"><em>implicit classes</em></a> provide a simple and composable way of extending the API of third-party classes. For example, the following implicit class extends the default <code class="highlighter-rouge">Int</code> class with a new method <code class="highlighter-rouge">times(fn: => Unit): Unit</code> that executes a given block <code class="highlighter-rouge">fn</code> n times.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">IntOps</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">ExtendedInt</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">times</span><span class="o">(</span><span class="n">fn</span><span class="k">:</span> <span class="o">=></span> <span class="kt">Unit</span><span class="o">)</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span>
<span class="o">(</span><span class="mi">0</span> <span class="n">until</span> <span class="n">n</span><span class="o">).</span><span class="n">foreach</span><span class="o">(</span><span class="k">_</span> <span class="o">=></span> <span class="n">fn</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>This gives us a very neat usage pattern. All one needs to do is import the implicit class into the current namespace and let the magic happen.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">IntOps._</span>
<span class="mf">5.</span><span class="n">times</span> <span class="o">{</span>
<span class="n">println</span><span class="o">(</span><span class="s">"Hello, World!"</span><span class="o">)</span>
<span class="o">}</span></code></pre></figure>
<p>We’ll use this approach in order to extend Scala’s <code class="highlighter-rouge">List</code> with four new methods that implement our combinatorial algorithms. The only restriction we have to satisfy here is that the new functions’ names shouldn’t conflict with the existing API. Thus, we’ll use the prefix <code class="highlighter-rouge">x</code> (from <em>eXtended</em>) for the new functions. The following listing represents a skeleton of the class we’re going to implement.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">CombinatorialOps</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">class</span> <span class="nc">CombinatorialList</span><span class="o">[</span><span class="kt">A</span><span class="o">](</span><span class="n">l</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">])</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">xcombinations</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="o">???</span>
<span class="k">def</span> <span class="n">xsubsets</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="o">???</span>
<span class="k">def</span> <span class="n">xvariations</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="o">???</span>
<span class="k">def</span> <span class="n">xpermutations</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="o">???</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>This tiny class might be used as follows (exactly as <code class="highlighter-rouge">IntOps</code> was used above).</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">CombinatorialOps._</span>
<span class="k">val</span> <span class="n">c</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">).</span><span class="n">xcombinations</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span></code></pre></figure>
<h4 id="optimistic-programming">Optimistic Programming</h4>
<p><em>Optimistic Programming</em> is an implementation technique for recursive programs in which it is assumed that a recursive function works correctly on a smaller input (a sub-problem), so that its result may be used to solve the full-size problem. In other words, the body of a recursive function may be implemented in terms of the following ideas: (a) when called recursively, it gives the right answer for any sub-problem, and (b) some additional work is done to merge these sub-problem solutions into a single solution to the entire problem. Doesn’t that sound <em>optimistic</em>? The recursive function is presumed to be correctly implemented <em>before</em> its body is actually written.</p>
<p>Optimistic programming lies between the <a href="http://en.wikipedia.org/wiki/Divide_and_conquer_algorithms">Divide and Conquer</a> and <a href="http://en.wikipedia.org/wiki/Dynamic_programming">Dynamic Programming</a> techniques. Rather than focusing on how the sub-problems are split (whether or not they overlap), optimistic programming focuses on the nature of recursive programs and provides a simple tool that makes programming complex problems much easier.</p>
<p>We’ll use optimistic programming to solve combinatorial problems in a functional setting.</p>
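<p>As a toy illustration of this style (not from the original post), consider finding the maximum of a non-empty array: we simply believe that the recursive call returns the correct maximum of the tail, and only write the merge step.</p>

```java
// A hypothetical illustration of optimistic programming: the maximum of a
// non-empty array. We assume the recursive call is already correct for the
// smaller input (the tail) and only write the code that merges its result
// with the head element.
class OptimisticMax {
    static int max(int[] a, int from) {
        if (from == a.length - 1) return a[from];  // base case: one element
        int maxOfTail = max(a, from + 1);          // "believed" to be correct
        return Math.max(a[from], maxOfTail);       // merge step
    }
}
```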
<h4 id="combinations">Combinations</h4>
<p>Imagine you’re given a standard deck of fifty-two cards and asked to select any two of them. The pair of cards you select is called a <em>combination</em> (i.e., a <em>2-combination</em>). There are 1326 such 2-card combinations that can be selected from a standard deck. More formally, a <a href="http://en.wikipedia.org/wiki/Binomial_coefficient">binomial coefficient</a> defines the number of <em>k-combinations</em> of a set of <code class="highlighter-rouge">n</code> distinct elements.</p>
<p>The order of a combination’s elements <em>doesn’t</em> matter. So, <code class="highlighter-rouge">[a, b]</code> and <code class="highlighter-rouge">[b, a]</code> are the same combination.</p>
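<p>The count of 1326 above can be sanity-checked with a small binomial coefficient routine (a hypothetical helper, not part of the post’s code):</p>

```java
// Computes the binomial coefficient C(n, k) = n! / (k! * (n - k)!)
// iteratively; the running product stays integral at every step.
class Binomial {
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;  // result == C(n - k + i, i)
        }
        return result;
    }
}
```

For instance, <code class="highlighter-rouge">Binomial.choose(52, 2)</code> yields 1326, the number of 2-card hands in a standard deck.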
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">></span> <span class="nc">List</span><span class="o">(</span><span class="s">"a"</span><span class="o">,</span> <span class="s">"b"</span><span class="o">,</span> <span class="s">"c"</span><span class="o">).</span><span class="n">xcombinations</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">))</span></code></pre></figure>
<p>It’s time to use the power of optimistic programming to solve the problem of generating <em>k-combinations</em>. Optimistic programming guarantees that a recursive function called on a <em>sub-problem</em> produces a correct answer. A sub-problem of generating k-combinations is generating (k-1)-combinations. Only one question is left: how do we then solve the entire problem? This is where things become interesting. Obviously, there should be <em>an extra element</em> in the set which, added to a (k-1)-combination, upgrades it to a <em>full-size</em> k-combination. A set’s <em>extra</em> element is no different from a <em>regular</em> element. And the set <code class="highlighter-rouge">S</code> itself is a <em>recursive object</em>: without one element it is still a set <code class="highlighter-rouge">S'</code> and may be processed recursively. Thus, the final solution contains both <code class="highlighter-rouge">S'</code>’s k-combinations and <code class="highlighter-rouge">S'</code>’s (k-1)-combinations with the extra element appended.</p>
<p>There are also two corner cases that we have to handle separately. There is nothing to do when <code class="highlighter-rouge">k > n</code> (the combination’s size is greater than the entire set’s size). And no further grouping is required when generating 1-combinations.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/**
* Generates the combinations of this list with given length 'n'. The order
* doesn't matter.
*
* The total number of k-combinations on n-length set might be calculated
* as follows:
*
* C_k,n = n!/k!(n - k)!
*
* Time - O(C_k,n)
* Space - O(C_k,n)
*/</span>
<span class="k">def</span> <span class="n">xcombinations</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">n</span> <span class="o">></span> <span class="n">xsize</span><span class="o">)</span> <span class="nc">Nil</span>
<span class="k">else</span> <span class="n">l</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="k">_</span> <span class="o">::</span> <span class="k">_</span> <span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">=></span>
<span class="n">l</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="k">_</span><span class="o">))</span>
<span class="k">case</span> <span class="n">hd</span> <span class="o">::</span> <span class="n">tl</span> <span class="k">=></span>
<span class="n">tl</span><span class="o">.</span><span class="n">xcombinations</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">hd</span> <span class="o">::</span> <span class="k">_</span><span class="o">)</span> <span class="o">:::</span> <span class="n">tl</span><span class="o">.</span><span class="n">xcombinations</span><span class="o">(</span><span class="n">n</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="nc">Nil</span>
<span class="o">}</span></code></pre></figure>
<h4 id="subsets">Subsets</h4>
<p>A set’s k-combination may also be referenced as a <em>subset</em>. The other combinatorial problem is generating all the subsets (all k-combinations, where <code class="highlighter-rouge">k = 1..n</code>) of a given set.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">></span> <span class="nc">List</span><span class="o">(</span><span class="s">"a"</span><span class="o">,</span> <span class="s">"b"</span><span class="o">,</span> <span class="s">"c"</span><span class="o">).</span><span class="n">xsubsets</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">c</span><span class="o">))</span></code></pre></figure>
<p>The implementation is straightforward: combinations of all possible sizes should be merged together, which may be done with List’s <code class="highlighter-rouge">foldLeft</code> operation.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/**
* Generates all the subsets of this list. The order doesn't matter.
*
* The total number of subsets might be obtained from variations formula:
*
 * S_n = sum(i=1..n) {C_i,n} = 2 ** n - 1
*
* Time - O(S_n)
* Space - O(S_n)
*/</span>
<span class="k">def</span> <span class="n">xsubsets</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span>
<span class="o">(</span><span class="mi">2</span> <span class="n">to</span> <span class="n">xsize</span><span class="o">).</span><span class="n">foldLeft</span><span class="o">(</span><span class="n">l</span><span class="o">.</span><span class="n">xcombinations</span><span class="o">(</span><span class="mi">1</span><span class="o">))((</span><span class="n">a</span><span class="o">,</span> <span class="n">i</span><span class="o">)</span> <span class="k">=></span> <span class="n">l</span><span class="o">.</span><span class="n">xcombinations</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">:::</span> <span class="n">a</span><span class="o">)</span></code></pre></figure>
<p>There are <code class="highlighter-rouge">2^n</code> subsets of an n-element set (<code class="highlighter-rouge">2^n - 1</code> if the empty subset is excluded, as it is here). It’s a choice of two: every element is either included in a particular subset or not.</p>
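<p>The two counts agree: summing binomial coefficients over the sizes 1..n, as <code class="highlighter-rouge">xsubsets</code> does, gives one less than <code class="highlighter-rouge">2^n</code>, since the empty subset is skipped. A small sanity check (hypothetical helper names):</p>

```java
// Counts the non-empty subsets of an n-element set by summing the
// binomial coefficients: C(n, 1) + ... + C(n, n) = 2^n - 1.
class SubsetCount {
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) result = result * (n - k + i) / i;
        return result;
    }

    static long countNonEmptySubsets(int n) {
        long total = 0;
        for (int k = 1; k <= n; k++) total += choose(n, k);
        return total;
    }
}
```

<code class="highlighter-rouge">countNonEmptySubsets(3)</code> gives 7, matching the seven subsets in the listing above.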
<h4 id="variations">Variations</h4>
<p>Unlike combinations, the order of elements inside a variation <em>does</em> matter. Thus, the tuples <code class="highlighter-rouge">[a, b]</code> and <code class="highlighter-rouge">[b, a]</code> are different <em>variations</em> (i.e., <em>2-variations</em>). In general, variations are also known as <em>partial permutations</em> or <em>k-permutations</em>, where <code class="highlighter-rouge">0 < k <= n</code>.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">></span> <span class="nc">List</span><span class="o">(</span><span class="s">"a"</span><span class="o">,</span> <span class="s">"b"</span><span class="o">,</span> <span class="s">"c"</span><span class="o">).</span><span class="n">xvariations</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">a</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">c</span><span class="o">,</span> <span class="n">a</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">c</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">))</span></code></pre></figure>
<p>The number of <em>k-permutations</em> of <code class="highlighter-rouge">n</code> is the following product: <code class="highlighter-rouge">n * (n-1) * ... * (n-k+1)</code>. That’s a bit different from a <em>binomial coefficient</em> (the number of <em>k-combinations</em> of <code class="highlighter-rouge">n</code>): there is no <code class="highlighter-rouge">k!</code> in the denominator, since we count <em>all</em> the possible k-permutations rather than treating them as equal.</p>
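<p>This product is easy to compute directly (a hypothetical helper, not part of <code class="highlighter-rouge">CombinatorialOps</code>):</p>

```java
// Counts the k-permutations (variations) of an n-element set:
// V(k, n) = n * (n - 1) * ... * (n - k + 1) = n! / (n - k)!
class Variations {
    static long count(int k, int n) {
        long result = 1;
        for (int i = 0; i < k; i++) result *= (n - i);
        return result;
    }
}
```

<code class="highlighter-rouge">Variations.count(2, 3)</code> gives 6, matching the six 2-variations listed above.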
<p>The same ideas of <em>optimistic programming</em> may be used to generate the variations (k-permutations) of a given set. The corner cases are the same: there’s nothing to do when <code class="highlighter-rouge">k > n</code> or <code class="highlighter-rouge">k = 1</code>. Just like in combinations, these two cases should be handled separately. More interesting is the regular case: upgrading a recursively generated (k-1)-permutation to a <em>full-size</em> one. Getting <em>an extra element</em> from the set is no longer the problem; rather, the <em>upgrade</em> itself is no longer a simple merge.</p>
<p>Since the order does matter, the extra element should be <em>inserted</em> into every possible position of a permutation rather than just appended to it. So, instead of the one-to-one mapping between unfinished and finished combinations, we get a one-to-k mapping for permutations: there are <code class="highlighter-rouge">k</code> positions in a (k-1)-permutation where the extra element may be inserted.</p>
<p>Ultimately, by analogy with k-combinations, the k-permutations of <code class="highlighter-rouge">S</code> also contain all the k-permutations of <code class="highlighter-rouge">S'</code>, where <code class="highlighter-rouge">S'</code> is <code class="highlighter-rouge">S</code> without one element.</p>
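<p>The insertion step can be sketched on its own: inserting an extra element into every possible position of a (k-1)-permutation yields exactly k new k-permutations. A hypothetical Java rendering of this one-to-k mapping:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Inserts x into every possible position of perm: a (k-1)-element list
// yields k new k-element lists (the one-to-k mapping described above).
class InsertEverywhere {
    static List<List<String>> insertEverywhere(String x, List<String> perm) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i <= perm.size(); i++) {
            List<String> upgraded = new ArrayList<>(perm.subList(0, i));
            upgraded.add(x);                               // x at position i
            upgraded.addAll(perm.subList(i, perm.size())); // rest of perm
            result.add(upgraded);
        }
        return result;
    }
}
```

For example, inserting <code class="highlighter-rouge">"c"</code> into <code class="highlighter-rouge">["a", "b"]</code> yields the three lists <code class="highlighter-rouge">[c, a, b]</code>, <code class="highlighter-rouge">[a, c, b]</code> and <code class="highlighter-rouge">[a, b, c]</code>.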
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/**
* Generates the variations of this list with given length 'n'. The order
* does matter.
*
* The total number of variations might be calculated as follows:
*
* V_k,n = n!/(n - k)!
*
* Time - O(V_k,n)
* Space - O(V_k,n)
*/</span>
<span class="k">def</span> <span class="n">xvariations</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">mixmany</span><span class="o">(</span><span class="n">x</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span> <span class="n">ll</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]])</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="n">ll</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">hd</span> <span class="o">::</span> <span class="n">tl</span> <span class="k">=></span> <span class="n">foldone</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">hd</span><span class="o">)</span> <span class="o">:::</span> <span class="n">mixmany</span><span class="o">(</span><span class="n">x</span><span class="o">,</span> <span class="n">tl</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="nc">Nil</span>
<span class="o">}</span>
<span class="k">def</span> <span class="n">foldone</span><span class="o">(</span><span class="n">x</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span> <span class="n">ll</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">])</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span>
<span class="o">(</span><span class="mi">1</span> <span class="n">to</span> <span class="n">ll</span><span class="o">.</span><span class="n">length</span><span class="o">).</span><span class="n">foldLeft</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">ll</span><span class="o">))((</span><span class="n">a</span><span class="o">,</span> <span class="n">i</span><span class="o">)</span> <span class="k">=></span> <span class="o">(</span><span class="n">mixone</span><span class="o">(</span><span class="n">i</span><span class="o">,</span> <span class="n">x</span><span class="o">,</span> <span class="n">ll</span><span class="o">))</span> <span class="o">::</span> <span class="n">a</span><span class="o">)</span>
<span class="k">def</span> <span class="n">mixone</span><span class="o">(</span><span class="n">i</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">x</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span> <span class="n">ll</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">])</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span> <span class="k">=</span>
<span class="n">ll</span><span class="o">.</span><span class="n">slice</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="n">i</span><span class="o">)</span> <span class="o">:::</span> <span class="o">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">ll</span><span class="o">.</span><span class="n">slice</span><span class="o">(</span><span class="n">i</span><span class="o">,</span> <span class="n">ll</span><span class="o">.</span><span class="n">length</span><span class="o">))</span>
<span class="k">if</span> <span class="o">(</span><span class="n">n</span> <span class="o">></span> <span class="n">xsize</span><span class="o">)</span> <span class="nc">Nil</span>
<span class="k">else</span> <span class="n">l</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="k">_</span> <span class="o">::</span> <span class="k">_</span> <span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">=></span> <span class="n">l</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="k">_</span><span class="o">))</span>
<span class="k">case</span> <span class="n">hd</span> <span class="o">::</span> <span class="n">tl</span> <span class="k">=></span> <span class="n">mixmany</span><span class="o">(</span><span class="n">hd</span><span class="o">,</span> <span class="n">tl</span><span class="o">.</span><span class="n">xvariations</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="o">))</span> <span class="o">:::</span> <span class="n">tl</span><span class="o">.</span><span class="n">xvariations</span><span class="o">(</span><span class="n">n</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="nc">Nil</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<h4 id="permutations">Permutations</h4>
<p><em>Permutations</em> are just full-size variations, i.e., k-permutations with <code class="highlighter-rouge">k = n</code>. A permutation may also be viewed as the result of a <em>shuffle</em> operation on a set. In other words, every shuffle of a deck of cards gives a new permutation. Permutations are counted by the product <code class="highlighter-rouge">n * (n-1) * ... * 1</code>, which is <code class="highlighter-rouge">n!</code>.</p>
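<p>Again, the count is easy to check numerically (hypothetical helper):</p>

```java
// Counts the permutations of an n-element set: P(n) = n!
class Permutations {
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) result *= i;
        return result;
    }
}
```

<code class="highlighter-rouge">Permutations.factorial(3)</code> gives 6, matching the six permutations in the listing below.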
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">></span> <span class="nc">List</span><span class="o">(</span><span class="s">"a"</span><span class="o">,</span> <span class="s">"b"</span><span class="o">,</span> <span class="s">"c"</span><span class="o">).</span><span class="n">xpermutations</span>
<span class="n">res1</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">String</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">List</span><span class="o">(</span><span class="nc">List</span><span class="o">(</span><span class="n">c</span><span class="o">,</span> <span class="n">b</span><span class="o">,</span> <span class="n">a</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">c</span><span class="o">,</span> <span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">c</span><span class="o">,</span> <span class="n">b</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">,</span> <span class="n">a</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">b</span><span class="o">,</span> <span class="n">a</span><span class="o">,</span> <span class="n">c</span><span class="o">),</span>
<span class="nc">List</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">,</span> <span class="n">c</span><span class="o">))</span></code></pre></figure>
<p>A purely functional algorithm for generating permutations is quite simple to implement in terms of variations.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/**
* Generates all permutations of this list. The order does matter.
*
* The total number of permutations might be calculated as follows:
*
* P_n = V_n,n = n!
*
* Time - O(n!)
* Space - O(n!)
*/</span>
<span class="k">def</span> <span class="n">xpermutations</span><span class="k">:</span> <span class="kt">List</span><span class="o">[</span><span class="kt">List</span><span class="o">[</span><span class="kt">A</span><span class="o">]]</span> <span class="k">=</span> <span class="n">xvariations</span><span class="o">(</span><span class="n">xsize</span><span class="o">)</span></code></pre></figure>
<h4 id="further-improvements">Further Improvements</h4>
<p>The full version of the <code class="highlighter-rouge">CombinatorialOps</code> class might be found <a href="https://gist.github.com/vkostyukov/9015987">at GitHub</a>. In order to reduce the memory footprint, a bit of laziness may be introduced by (a) replacing the output data type <code class="highlighter-rouge">List[List[A]]</code> with <code class="highlighter-rouge">Iterable[List[A]]</code> and (b) generating each piece of data <em>on demand</em>.</p>Vladimir KostyukovCombinatorics is a branch of mathematics that mostly focuses on problems of counting structures of a given size and kind. The most famous and well-known examples of such problems are often asked as job interview questions. This blog post presents four generation problems (combinations, subsets, permutations and variations) along with their purely functional implementations in Scala.Dual-Pivot Binary Search2014-02-06T10:00:00+00:002014-02-06T10:00:00+00:00http://kostyukov.net/posts/dual-pivot-binary-search<p>In 2009, Vladimir Yaroslavski introduced the <a href="http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf">Dual-Pivot QuickSort</a> algorithm, which is currently the default sorting algorithm for primitive types in Java 8. The idea behind this algorithm is both simple and awesome. Instead of using a single pivot element, it uses two pivots that divide an input array into three intervals (against two intervals in the original <a href="http://en.wikipedia.org/wiki/Quicksort">QuickSort</a>). This allowed the height of the <a href="http://www.cs.cornell.edu/courses/cs3110/2012sp/lectures/lec20-master/lec20.html">recursion tree</a> to be decreased, as well as reducing the number of comparisons. This post describes a similar dual-pivot approach, but for the <a href="http://en.wikipedia.org/wiki/Binary_search_algorithm">Binary Search</a> algorithm; thus, our modified binary search algorithm gets the <em>Dual-Pivot</em> prefix.</p>
<p>First of all, consider a standard variation of a <em>binary search</em> algorithm.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">int</span> <span class="nf">binarysearch</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">[],</span> <span class="kt">int</span> <span class="n">k</span><span class="o">,</span> <span class="kt">int</span> <span class="n">lo</span><span class="o">,</span> <span class="kt">int</span> <span class="n">hi</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">lo</span> <span class="o">==</span> <span class="n">hi</span><span class="o">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
<span class="kt">int</span> <span class="n">p</span> <span class="o">=</span> <span class="n">lo</span> <span class="o">+</span> <span class="o">(</span><span class="n">hi</span> <span class="o">-</span> <span class="n">lo</span><span class="o">)</span> <span class="o">/</span> <span class="mi">2</span><span class="o">;</span>
<span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o"><</span> <span class="n">a</span><span class="o">[</span><span class="n">p</span><span class="o">])</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">binarysearch</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">k</span><span class="o">,</span> <span class="n">lo</span><span class="o">,</span> <span class="n">p</span><span class="o">);</span>
<span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o">></span> <span class="n">a</span><span class="o">[</span><span class="n">p</span><span class="o">])</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">binarysearch</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">k</span><span class="o">,</span> <span class="n">p</span> <span class="o">+</span> <span class="mi">1</span><span class="o">,</span> <span class="n">hi</span><span class="o">);</span>
<span class="o">}</span>
<span class="k">return</span> <span class="n">p</span><span class="o">;</span>
<span class="o">}</span></code></pre></figure>
<p>We’ll use the <a href="http://en.wikipedia.org/wiki/Master_theorem">Master Method</a> in order to understand its time complexity in terms of <a href="http://en.wikipedia.org/wiki/Big_O_notation">Big-Oh</a> notation. The idea behind the master method is to express the algorithm’s running time in terms of the following recurrence relation.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">)</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">/</span><span class="n">b</span><span class="o">)</span> <span class="o">+</span> <span class="n">O</span><span class="o">(</span><span class="n">n</span><span class="o">^</span><span class="n">c</span><span class="o">)</span></code></pre></figure>
<p>The exact meaning of this relation is the following: the running time <code class="highlighter-rouge">T(n)</code> of the algorithm on input <code class="highlighter-rouge">n</code> is equal to the sum of the running times of each recursive call <code class="highlighter-rouge">T(n/b)</code> plus some extra work <code class="highlighter-rouge">O(n^c)</code> at each level of recursion. Note that the master method only works for <a href="http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm">Divide and Conquer</a> algorithms.</p>
<p>The binary search algorithm does</p>
<ul>
<li>split the input into two equal intervals: <code class="highlighter-rouge">b = 2</code></li>
<li>perform only one recursive call, depending on whether the key is less than or greater than the pivot element: <code class="highlighter-rouge">a = 1</code></li>
<li>compare the key with the pivot element at each level of recursion, which takes a constant time: <code class="highlighter-rouge">c = 0</code></li>
</ul>
<p>Thus, the following recurrence relation describes the standard binary search algorithm, where <code class="highlighter-rouge">a = 1</code> (the number of recursive calls), <code class="highlighter-rouge">b = 2</code> (how many pieces we split the data into at each level of recursion) and <code class="highlighter-rouge">c = 0</code> (we also do some constant work at each recursive call).</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">)</span> <span class="o">=</span> <span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">/</span><span class="mi">2</span><span class="o">)</span> <span class="o">+</span> <span class="n">O</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span></code></pre></figure>
<p>The relation <code class="highlighter-rouge">a == b ^ c</code> or <code class="highlighter-rouge">1 == 2 ^ 0</code> gives us the first case in a master method, which results in the running time <code class="highlighter-rouge">O(n^c * log_b n)</code> or <code class="highlighter-rouge">O(log_2 n)</code> in particular.</p>
<p>It’s time to use a dual-pivot element instead of a single-pivot one. This gives us three intervals and a couple of additional comparisons.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">int</span> <span class="nf">dualPivotBinarysearch</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">[],</span> <span class="kt">int</span> <span class="n">k</span><span class="o">,</span> <span class="kt">int</span> <span class="n">lo</span><span class="o">,</span> <span class="kt">int</span> <span class="n">hi</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">lo</span> <span class="o">==</span> <span class="n">hi</span><span class="o">)</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
<span class="kt">int</span> <span class="n">p</span> <span class="o">=</span> <span class="n">lo</span> <span class="o">+</span> <span class="o">(</span><span class="n">hi</span> <span class="o">-</span> <span class="n">lo</span><span class="o">)</span> <span class="o">/</span> <span class="mi">3</span><span class="o">;</span>
<span class="kt">int</span> <span class="n">q</span> <span class="o">=</span> <span class="n">lo</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="o">(</span><span class="n">hi</span> <span class="o">-</span> <span class="n">lo</span><span class="o">)</span> <span class="o">/</span> <span class="mi">3</span><span class="o">;</span>
<span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o"><</span> <span class="n">a</span><span class="o">[</span><span class="n">p</span><span class="o">])</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">dualPivotBinarysearch</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">k</span><span class="o">,</span> <span class="n">lo</span><span class="o">,</span> <span class="n">p</span><span class="o">);</span>
<span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o">></span> <span class="n">a</span><span class="o">[</span><span class="n">p</span><span class="o">]</span> <span class="o">&&</span> <span class="n">k</span> <span class="o"><</span> <span class="n">a</span><span class="o">[</span><span class="n">q</span><span class="o">])</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">dualPivotBinarysearch</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">k</span><span class="o">,</span> <span class="n">p</span> <span class="o">+</span> <span class="mi">1</span><span class="o">,</span> <span class="n">q</span><span class="o">);</span>
<span class="o">}</span> <span class="k">else</span> <span class="k">if</span> <span class="o">(</span><span class="n">k</span> <span class="o">></span> <span class="n">a</span><span class="o">[</span><span class="n">q</span><span class="o">])</span> <span class="o">{</span>
<span class="k">return</span> <span class="nf">dualPivotBinarysearch</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">k</span><span class="o">,</span> <span class="n">q</span> <span class="o">+</span> <span class="mi">1</span><span class="o">,</span> <span class="n">hi</span><span class="o">);</span>
<span class="o">}</span>
<span class="k">return</span> <span class="o">(</span><span class="n">k</span> <span class="o">==</span> <span class="n">a</span><span class="o">[</span><span class="n">p</span><span class="o">])</span> <span class="o">?</span> <span class="n">p</span> <span class="o">:</span> <span class="n">q</span><span class="o">;</span>
<span class="o">}</span></code></pre></figure>
<p>It should be clear now that the recurrence relation for a <em>dual-pivot</em> binary search looks as follows.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">)</span> <span class="o">=</span> <span class="n">T</span><span class="o">(</span><span class="n">n</span><span class="o">/</span><span class="mi">3</span><span class="o">)</span> <span class="o">+</span> <span class="n">O</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span></code></pre></figure>
<p>The only difference is the data <em>split factor</em>, which is <code class="highlighter-rouge">3</code> against <code class="highlighter-rouge">2</code> in the original relation. Thus, three intervals give us a new time complexity: <code class="highlighter-rouge">O(log_3 n)</code>. The careful reader may notice that a logarithm’s base is a constant factor, which is redundant and might be eliminated according to the Big-Oh definition. So, both algorithms have the same time bound: <code class="highlighter-rouge">O(log n)</code>, and it doesn’t really matter what the base of the <code class="highlighter-rouge">log</code> is.</p>
<p>That is only partially true. We usually don’t care about constant factors in asymptotic bounds, since they don’t affect the algorithm’s scalability. The only thing we care about is whether the algorithm is able to process a bigger (much bigger) input in a reasonable time or not. But when it comes to a deeper analysis of a particular algorithm implementation, it may be useful to keep the constant factors in mind as well.</p>
<p>The new time complexity gives us a shorter recursive tree: <code class="highlighter-rouge">log_3 n</code> against <code class="highlighter-rouge">log_2 n</code> levels. In other words, it gives us a shorter stack trace as well as a smaller memory footprint (due to the reduced number of allocated stack frames). For example, for n = 2 147 483 647 (the maximum length of an array that can be allocated on the JVM) we’ll have a roughly 40% shorter recursive tree. Isn’t it awesome? Not really. To be honest, that 40% is the difference between 31 and 19 levels (base <code class="highlighter-rouge">2</code> against base <code class="highlighter-rouge">3</code> in a logarithmic function). A logarithm is an awesome function: it takes a number and makes it small. I wish all algorithms had a logarithm in their asymptotic bounds.</p>
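A short sketch (hypothetical helper, not from the post) can check those tree heights: rounding the logarithms up gives 31 levels for the base-2 tree against about 20 for the base-3 one on the largest JVM array.

```java
public class TreeHeight {
    // Worst-case number of levels in a recursion tree that splits
    // an n-element interval into `base` parts per call: ceil(log_base n).
    static int levels(long n, int base) {
        return (int) Math.ceil(Math.log(n) / Math.log(base));
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE;          // 2 147 483 647, the largest JVM array
        int h2 = levels(n, 2);               // single-pivot tree height
        int h3 = levels(n, 3);               // dual-pivot tree height
        System.out.println(h2 + " vs " + h3); // 31 vs 20
    }
}
```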
<p>Well, what does it cost to make 31 recursive calls on a modern JVM (and a modern CPU)? I bet <em>nothing</em>. And this might be a strong reason why we didn’t study a dual-pivot binary search algorithm in a university course. Another reason is optimizing compilers (e.g., the <a href="http://en.wikipedia.org/wiki/HotSpot">HotSpot JIT compiler</a>) that can easily eliminate a <a href="http://en.wikipedia.org/wiki/Tail_call">tail call</a> by replacing it with a simple iterative loop. Therefore, all the hypothetical benefits of using a dual-pivot binary search might be completely lost.</p>
<p>Anyway, there is still an interesting part of the dual-pivot approach that we haven’t discussed yet: the <em>number of comparisons</em>. Using a dual-pivot element introduces a different number of comparisons per recursive call: <code class="highlighter-rouge">4</code> (against <code class="highlighter-rouge">2</code> in the classic scheme), which doesn’t sound promising but should still be investigated. And the easiest way to check whether it’s worth using a dual-pivot scheme is to look at a graphical representation of both functions: <code class="highlighter-rouge">2 log_2 n</code> and <code class="highlighter-rouge">4 log_3 n</code>.</p>
<p><img src="http://kostyukov.net/assets/images/chart.png" alt="A chart" /></p>
<p>The chart above shows that the dual-pivot scheme uses a bit more comparisons than the single-pivot one on the same input. More precisely, it uses about 26% more comparisons than the original algorithm (<code class="highlighter-rouge">4 / log_2 3 ≈ 2.52</code> against <code class="highlighter-rouge">2</code> comparisons per level of the base-2 tree). On the one hand, the dual-pivot approach gives a shorter recursive tree (fewer recursive calls), but on the other hand, a higher number of comparisons.</p>
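The gap between the two curves is easy to check numerically; a small sketch (hypothetical class name) shows the ratio <code class="highlighter-rouge">4 log_3 n / (2 log_2 n)</code> is a constant, about 1.26, regardless of the input size.

```java
public class ComparisonCount {
    // Total comparisons: 2 per level of a base-2 tree vs 4 per level of a base-3 tree.
    static double singlePivot(double n) { return 2 * Math.log(n) / Math.log(2); }
    static double dualPivot(double n)   { return 4 * Math.log(n) / Math.log(3); }

    public static void main(String[] args) {
        double n = Integer.MAX_VALUE;
        System.out.printf("%.1f vs %.1f comparisons%n", singlePivot(n), dualPivot(n));
        // The n-dependent factor cancels: the ratio is 2*ln(2)/ln(3) ~ 1.26.
        System.out.printf("ratio = %.2f%n", dualPivot(n) / singlePivot(n));
    }
}
```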
<p>Let me do some math in order to find a reasonable answer as to whether (and when) it’s worth using a dual-pivot binary search algorithm. A couple of new variables should be introduced: <code class="highlighter-rouge">p</code> - the latency of a recursive (or plain) call, and <code class="highlighter-rouge">q</code> - the latency of a compare operation (i.e., an integer compare). Now we can define the <em>total</em> running time of a binary search algorithm as follows.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">t<span class="o">(</span>binary search<span class="o">)</span> <span class="o">=</span> p <span class="k">*</span> log_2 n + 2q <span class="k">*</span> log_2 n </code></pre></figure>
<p>It’s straightforward: we spend <code class="highlighter-rouge">p * log_2 n</code> time doing recursive calls plus <code class="highlighter-rouge">2q * log_2 n</code> doing comparisons. A similar formula might be defined for a dual-pivot binary search algorithm.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">t<span class="o">(</span>dual-pivot binary search<span class="o">)</span> <span class="o">=</span> p <span class="k">*</span> log_3 n + 4q <span class="k">*</span> log_3 n </code></pre></figure>
<p>And we want to find a relation between <code class="highlighter-rouge">q</code> and <code class="highlighter-rouge">p</code> for which the following holds (we’re looking for the constraints under which the dual-pivot scheme takes less time).</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">t<span class="o">(</span>binary search<span class="o">)</span> <span class="o">></span> t<span class="o">(</span>dual-pivot binary search<span class="o">)</span>
.. or ..
<span class="o">(</span>p <span class="k">*</span> log_2 n<span class="o">)</span> + <span class="o">(</span>2q <span class="k">*</span> log_2 n<span class="o">)</span> <span class="o">></span> <span class="o">(</span>p <span class="k">*</span> log_3 n<span class="o">)</span> + <span class="o">(</span>4q <span class="k">*</span> log_3 n<span class="o">)</span></code></pre></figure>
<p>Solving this inequality gives us a concrete answer: <code class="highlighter-rouge">p > ~1.42 q</code> (roughly <code class="highlighter-rouge">1.5 q</code>). In other words, it makes sense to use the dual-pivot approach on a platform where making a function call costs at least about 1.5x as much as a compare operation.</p>
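Setting the two running times equal and solving for <code class="highlighter-rouge">p/q</code> makes the break-even point explicit; the sketch below (hypothetical class name) just evaluates the closed form that falls out of the algebra, since the <code class="highlighter-rouge">log n</code> factor cancels on both sides.

```java
public class BreakEven {
    // Solve p*log_2(n) + 2q*log_2(n) = p*log_3(n) + 4q*log_3(n) for p/q.
    // Dividing by ln(n) and rearranging gives:
    //   p * (ln3 - ln2) = q * (4*ln2 - 2*ln3)
    // so the break-even ratio is a pure constant, independent of n.
    static double breakEvenRatio() {
        double ln2 = Math.log(2), ln3 = Math.log(3);
        return (4 * ln2 - 2 * ln3) / (ln3 - ln2);
    }

    public static void main(String[] args) {
        // ~1.42: dual-pivot wins once a call is ~1.4x pricier than a compare.
        System.out.printf("dual-pivot wins when p > %.2f q%n", breakEvenRatio());
    }
}
```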
<p>That’s nice to know, but can we find a more concrete answer? Well, it’s not that easy. It really depends on the hardware platform (ISA, micro-architecture) as well as on the software platform (compiler, runtime). Suppose we use a compiler without tail-call optimization on a modern Intel CPU (like Haswell). Agner Fog’s <a href="http://www.agner.org/optimize/instruction_tables.pdf">optimization manual</a> says that it takes 1 or 2 clock ticks for both the <code class="highlighter-rouge">CMP</code> and <code class="highlighter-rouge">FUCOMI</code>/<code class="highlighter-rouge">FUCOMIP</code> instructions. Needless to say, <code class="highlighter-rouge">JMP</code>, <code class="highlighter-rouge">SUB</code> and <code class="highlighter-rouge">MOV</code> cost almost nothing: ~3-4 clock ticks in total. Why do we need these three instructions? Well, a usual <em>calling convention</em> does</p>
<ul>
<li>perform a jump to a function - <code class="highlighter-rouge">JMP</code></li>
<li>save the current stack pointer - <code class="highlighter-rouge">MOV</code></li>
<li>reserve a stack for locals - <code class="highlighter-rouge">SUB</code></li>
</ul>
<p>So, it roughly takes 3-4 clock ticks to make a function call on a modern x86 chip. And this is almost what we were looking for. We can say that it might be a good idea to use a dual-pivot binary search instead of the classic one. But the benefits we get in this case are so imperceptible that we won’t even see the difference. Only micro-benchmarking will help us find the truth.</p>
<p>The full source code of a <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a>-based benchmark is available <a href="https://gist.github.com/vkostyukov/6201007">at GitHub</a>. The results look as follows.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">Benchmark Mode Samples Mean Mean error Units
d.DPBS.benchmarkBS avgt 5 81.665 8.000 ns/op
d.DPBS.benchmarkDPBS avgt 5 69.563 8.410 ns/op</code></pre></figure>
<p>These performance results (70 nanoseconds vs. 80 nanoseconds on the largest array I managed to allocate on my MacBook Pro) sum up to a very robust conclusion: the classic binary search algorithm is fast as hell. Seriously, it’s one of the fastest algorithms around. Just think about it: we spent 80 nanoseconds (read it again - <em>nanoseconds</em>) to search through a 2G-element array. That’s crazy fast, and the 10 ns difference (read it again - <em>nanoseconds</em>) is just a sort of quantum side effect. So, if you have a bunch of ordered numbers and you want to perform a search on them - relax and use <code class="highlighter-rouge">Arrays.binarySearch()</code> or even <a href="http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10-percent/">write your own implementation</a> for a <a href="https://github.com/vkostyukov/la4j/blob/master/src/main/java/org/la4j/matrix/sparse/CRSMatrix.java#L407">particular case</a>.</p>
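Since the standard library route is the recommended one, a short usage sketch may help: <code class="highlighter-rouge">Arrays.binarySearch</code> returns the key’s index on a hit, and the encoded insertion point <code class="highlighter-rouge">-(insertionPoint) - 1</code> on a miss.

```java
import java.util.Arrays;

public class BuiltInSearch {
    public static void main(String[] args) {
        int[] a = {2, 3, 5, 8, 13, 21, 34};      // input must be sorted
        int hit  = Arrays.binarySearch(a, 13);   // index of the key
        int miss = Arrays.binarySearch(a, 7);    // -(insertion point) - 1
        System.out.println(hit);                 // 4
        System.out.println(miss);                // -4: 7 would be inserted at index 3
    }
}
```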
<p>The point is that we don’t need a dual-pivot approach, since it gives you almost nothing on modern platforms. The aim of this post was to find a reasonable answer to the question of why there’s still no dual-pivot binary search around. I didn’t want to get a <em>faster</em> version of the original binary search, which surely can be done by rewriting the tail recursion as iteration (but it’s not even necessary - just think of 31 recursive calls in the worst case). I just wanted to show how to use complexity analysis along with math and knowledge about your platform in order to dig into an interesting question and have fun.</p>Vladimir KostyukovIn 2009, Vladimir Yaroslavski introduced the Dual-Pivot QuickSort algorithm, which is currently the default sorting algorithm for primitive types in Java 8. The idea behind this algorithm is both simple and awesome. Instead of using a single pivot element, it uses two pivots that divide an input array into three intervals (against two intervals in the original QuickSort). This allows us to decrease the height of the recursion tree as well as reduce the number of comparisons. This post describes a similar dual-pivot approach, but for a binary search algorithm. Thus, our modified binary search algorithm has the prefix Dual-Pivot.Finagle Your Fibonacci Calculation2014-02-01T10:00:00+00:002014-02-01T10:00:00+00:00http://kostyukov.net/posts/finagle-your-fibonacci-calculation<p><a href="http://twitter.github.io/finagle/">Finagle</a> is an RPC library for the JVM that allows you to develop service-based applications in a protocol-agnostic way. Formally, the Finagle library provides both an asynchronous runtime via <a href="http://twitter.github.io/finagle/guide/Futures.html">futures</a> and protocol independence via <a href="http://twitter.github.io/finagle/guide/ServerAnatomy.html">codecs</a>. 
In this post I will try to build a Finagle-powered distributed <a href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci Numbers</a> calculator that scales up to thousands of nodes.</p>
<h3 id="topology-design">Topology Design</h3>
<p>Let’s start with the requirements. At first glance, we might want our system to be both <a href="http://en.wikipedia.org/wiki/Fault_tolerance">fault-tolerant</a> and <a href="http://en.wikipedia.org/wiki/Scalability">scalable</a>. These are typical requirements for any kind of distributed system. And the good news is that Finagle provides a corresponding set of building blocks and mechanisms (such as load balancing, retrying, monitoring, etc.) that allows the developer to easily write <em>reusable</em>, scalable, and fault-tolerant code without particular knowledge of concrete protocols.</p>
<p>Anyway, things like scalability should be addressed at a different level than the framework or library level. Systems should be scalable by design, not because of some fancy tool. Thus, we must keep this in mind at every stage of the application’s life cycle.</p>
<p>In order to design a scalable system, we have to understand the problem we’re trying to solve. The classic Fibonacci calculation algorithm builds a recursive tree with height <code class="highlighter-rouge">O(n)</code> and branching factor <code class="highlighter-rouge">2</code>. Thus, the most natural and suitable <a href="http://www.openp2p.com/pub/a/p2p/2002/01/08/p2p_topologies_pt2.html">service topology</a> here is a hierarchical one. A hierarchical or tree-based topology satisfies both the scalability and fault-tolerance requirements. So, the distributed Fibonacci calculator might be viewed as follows.</p>
<p><img src="http://kostyukov.net/assets/images/fibonacci-design.png" alt="High-Level Design" /></p>
<p>In other words, we simply map every node of the recursive tree (the algorithm’s abstraction) to a physical/distributed node. The proposed topology tree has two kinds of nodes: <em>leaf</em> nodes labeled <code class="highlighter-rouge">W</code> (workers) and <em>branch</em> nodes labeled <code class="highlighter-rouge">F</code> (<a href="http://en.wikipedia.org/wiki/Fan-out">fanouts</a>). The worker node is our workhorse that does all the magic, while the fanout node doesn’t really perform the calculation but implements a <em>map-reduce</em> approach by delegating the sub-problems to its child nodes. The number of nodes in such a tree is unlimited, but it doesn’t really make sense to have more workers than the number of logical cores in your CPU. For example, a suitable configuration for a typical <a href="http://en.wikipedia.org/wiki/Haswell_(microarchitecture)">Haswell</a> laptop with four logical cores looks exactly like the picture above.</p>
<h3 id="finagle-power">Finagle Power</h3>
<p>Finagle’s API provides three robust building blocks: <a href="http://twitter.github.io/finagle/guide/Futures.html">futures</a>, and <a href="http://twitter.github.io/finagle/guide/ServicesAndFilters.html">filters and services</a>. All the building blocks are designed to be composable in a very neat way. Thus, keeping in mind that futures are single-element <em>immutable containers</em> while services and filters are <em>just functions</em>, it’s really simple to reason about Finagle-powered code.</p>
<p>Finagle is a <em>service-oriented</em> platform, so all the interactions between servers and clients are built around services. Servers implement their behavior via services, while clients interact with servers via services. Finally, a service is just a function that takes a value of type <code class="highlighter-rouge">A</code> and returns a future of type <code class="highlighter-rouge">B</code>.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">trait</span> <span class="nc">Service</span><span class="o">[</span><span class="kt">A</span>, <span class="kt">B</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">apply</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">A</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">B</span><span class="o">]</span>
<span class="o">}</span></code></pre></figure>
<p>The <code class="highlighter-rouge">Future</code> type represents a placeholder for a response being sent from a server. Programming with futures is an asynchronous programming discipline that relies on transforming values rather than reasoning about a sequence of events and callbacks.</p>
<p>The last but not least thing to discuss is Finagle’s filters, which are essentially <a href="http://en.wikipedia.org/wiki/Decorator_pattern"><em>decorators</em></a> for services. Filters allow us to change the behavior of services at runtime, as well as to change their types and get some help from Scala’s type checker at compile time.</p>
<h3 id="abstractions">Abstractions</h3>
<p>Let’s start with the cornerstone abstraction: a Fibonacci calculator that takes the index of a Fibonacci number as a <code class="highlighter-rouge">BigInt</code> and returns a future of its value. It’s also a good idea to predefine some useful <code class="highlighter-rouge">BigInt</code> values in the same trait.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">trait</span> <span class="nc">FibonacciCalculator</span> <span class="o">{</span>
<span class="k">val</span> <span class="nc">Zero</span> <span class="k">=</span> <span class="nc">BigInt</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
<span class="k">val</span> <span class="nc">One</span> <span class="k">=</span> <span class="nc">BigInt</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span>
<span class="k">val</span> <span class="nc">Two</span> <span class="k">=</span> <span class="nc">BigInt</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
<span class="k">def</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">BigInt</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">BigInt</span><span class="o">]</span>
<span class="o">}</span></code></pre></figure>
<p>Now we can define a worker node implementation that uses a <a href="http://stackoverflow.com/questions/19045936/scalas-for-comprehension-with-futures">for-comprehension</a> for future pipelining (<em>sequential composition</em>). The straightforward implementation looks exactly like the classic recursive algorithm.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">LocalFibonacciCalculator</span> <span class="k">extends</span> <span class="nc">FibonacciCalculator</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">BigInt</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">BigInt</span><span class="o">]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">n</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="nc">Zero</span><span class="o">)</span> <span class="o">||</span> <span class="n">n</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="nc">One</span><span class="o">))</span> <span class="nc">Future</span><span class="o">.</span><span class="n">value</span><span class="o">(</span><span class="n">n</span><span class="o">)</span>
<span class="k">else</span> <span class="k">for</span> <span class="o">{</span> <span class="n">a</span> <span class="k"><-</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="nc">One</span><span class="o">)</span>
<span class="n">b</span> <span class="k"><-</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="nc">Two</span><span class="o">)</span> <span class="o">}</span> <span class="k">yield</span> <span class="o">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span>
<span class="o">}</span></code></pre></figure>
<p>Thus, the fanout node implementation might be defined as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">FanoutFibonacciCalculator</span><span class="o">(</span>
<span class="n">left</span><span class="k">:</span> <span class="kt">FibonacciCalculator</span><span class="o">,</span>
<span class="n">right</span><span class="k">:</span> <span class="kt">FibonacciCalculator</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">FibonacciCalculator</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">BigInt</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">BigInt</span><span class="o">]</span> <span class="k">=</span>
<span class="k">if</span> <span class="o">(</span><span class="n">n</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="nc">Zero</span><span class="o">)</span> <span class="o">||</span> <span class="n">n</span><span class="o">.</span><span class="n">equals</span><span class="o">(</span><span class="nc">One</span><span class="o">))</span> <span class="nc">Future</span><span class="o">.</span><span class="n">value</span><span class="o">(</span><span class="n">n</span><span class="o">)</span>
<span class="k">else</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">seq</span> <span class="k">=</span> <span class="nc">Seq</span><span class="o">(</span><span class="n">left</span><span class="o">.</span><span class="n">calculate</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="nc">One</span><span class="o">),</span> <span class="n">right</span><span class="o">.</span><span class="n">calculate</span><span class="o">(</span><span class="n">n</span> <span class="o">-</span> <span class="nc">Two</span><span class="o">))</span>
<span class="nc">Future</span><span class="o">.</span><span class="n">collect</span><span class="o">(</span><span class="n">seq</span><span class="o">)</span> <span class="n">map</span> <span class="o">{</span> <span class="k">_</span><span class="o">.</span><span class="n">sum</span> <span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The fanout calculator uses the <em>concurrent compositor</em> <code class="highlighter-rouge">Future.collect()</code> (which takes a sequence of futures and returns a future of a sequence) in order to process the left and right sub-trees in parallel. The last future transformation performed by the fanout calculator is summing up the sequence.</p>
<p>In our system, we will use the String-based transport layer provided by Finagle’s <a href="http://twitter.github.io/finagle/guide/ServerAnatomy.html">Echo Server</a> example, which means we need to provide a suitable <em>adapter</em> implementation that adapts the String-based service <code class="highlighter-rouge">Service[String, String]</code> to the <code class="highlighter-rouge">FibonacciCalculator</code> interface. This will allow us to use remote workers as a fanout node’s children.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">RemoteFibonacciCalculator</span><span class="o">(</span><span class="n">remote</span><span class="k">:</span> <span class="kt">Service</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">])</span>
<span class="k">extends</span> <span class="nc">FibonacciCalculator</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">calculate</span><span class="o">(</span><span class="n">n</span><span class="k">:</span> <span class="kt">BigInt</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">BigInt</span><span class="o">]</span> <span class="k">=</span>
<span class="n">remote</span><span class="o">(</span><span class="n">n</span><span class="o">.</span><span class="n">toString</span><span class="o">)</span> <span class="n">map</span> <span class="o">{</span> <span class="nc">BigInt</span><span class="o">(</span><span class="k">_</span><span class="o">)</span> <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The good news here is that <code class="highlighter-rouge">BigInt</code> can be converted to a <code class="highlighter-rouge">String</code> (and vice versa) out of the box, so we can easily perform the conversion in one line.</p>
<p>Now we’re ready to set up our service, which takes a Fibonacci calculator and delegates the clients’ requests to it. A bit of type conversion should also be done here. The <code class="highlighter-rouge">FibonacciService</code> can be treated as an <em>adapter</em> from the <code class="highlighter-rouge">FibonacciCalculator</code> to the <code class="highlighter-rouge">Service</code> interface.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">FibonacciService</span><span class="o">(</span><span class="n">calculator</span><span class="k">:</span> <span class="kt">FibonacciCalculator</span><span class="o">)</span>
<span class="k">extends</span> <span class="nc">Service</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">apply</span><span class="o">(</span><span class="n">req</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span>
<span class="n">calculator</span><span class="o">.</span><span class="n">calculate</span><span class="o">(</span><span class="nc">BigInt</span><span class="o">(</span><span class="n">req</span><span class="o">))</span> <span class="n">map</span> <span class="o">{</span> <span class="k">_</span><span class="o">.</span><span class="n">toString</span> <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<h3 id="server-and-client-configurations">Server and Client Configurations</h3>
<p>Finally, we can define a server that serves our Fibonacci service. The launcher should allow the user to run either a worker (leaf) node or a fanout node by specifying the corresponding command-line options. The complete implementation looks as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">FibonacciServerLauncher</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="o">.</span><span class="n">toSeq</span><span class="o">)</span>
<span class="k">def</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="n">args</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Seq</span><span class="o">(</span><span class="s">"leaf"</span><span class="o">,</span> <span class="n">port</span><span class="o">)</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">service</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">FibonacciService</span><span class="o">(</span><span class="nc">LocalFibonacciCalculator</span><span class="o">)</span>
<span class="nc">Await</span><span class="o">.</span><span class="n">ready</span><span class="o">(</span><span class="nc">FibonacciServer</span><span class="o">.</span><span class="n">serve</span><span class="o">(</span><span class="s">":"</span> <span class="o">+</span> <span class="n">port</span><span class="o">,</span> <span class="n">service</span><span class="o">))</span>
<span class="k">case</span> <span class="nc">Seq</span><span class="o">(</span><span class="s">"node"</span><span class="o">,</span> <span class="n">port</span><span class="o">,</span> <span class="n">left</span><span class="o">,</span> <span class="n">right</span><span class="o">)</span> <span class="k">=></span>
<span class="c1">// remote services
</span> <span class="k">val</span> <span class="n">ls</span> <span class="k">=</span> <span class="nc">FibonacciClient</span><span class="o">.</span><span class="n">newService</span><span class="o">(</span><span class="s">"localhost:"</span> <span class="o">+</span> <span class="n">left</span><span class="o">)</span>
<span class="k">val</span> <span class="n">rs</span> <span class="k">=</span> <span class="nc">FibonacciClient</span><span class="o">.</span><span class="n">newService</span><span class="o">(</span><span class="s">"localhost:"</span> <span class="o">+</span> <span class="n">right</span><span class="o">)</span>
<span class="c1">// remote calculators
</span> <span class="k">val</span> <span class="n">lc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">RemoteFibonacciCalculator</span><span class="o">(</span><span class="n">ls</span><span class="o">)</span>
<span class="k">val</span> <span class="n">rc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">RemoteFibonacciCalculator</span><span class="o">(</span><span class="n">rs</span><span class="o">)</span>
<span class="c1">// a fanout
</span> <span class="k">val</span> <span class="n">service</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">FibonacciService</span><span class="o">(</span><span class="k">new</span> <span class="nc">FanoutFibonacciCalculator</span><span class="o">(</span><span class="n">lc</span><span class="o">,</span> <span class="n">rc</span><span class="o">))</span>
<span class="nc">Await</span><span class="o">.</span><span class="n">ready</span><span class="o">(</span><span class="nc">FibonacciServer</span><span class="o">.</span><span class="n">serve</span><span class="o">(</span><span class="s">":"</span> <span class="o">+</span> <span class="n">port</span><span class="o">,</span> <span class="n">service</span><span class="o">))</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="s">"Bad arguments!"</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The client launcher, on the other hand, looks much simpler.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">FibonacciClientLauncher</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="o">.</span><span class="n">toSeq</span><span class="o">)</span>
<span class="k">def</span> <span class="n">main</span><span class="o">(</span><span class="n">args</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="n">args</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">Seq</span><span class="o">(</span><span class="n">port</span><span class="o">,</span> <span class="n">req</span><span class="o">)</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">FibonacciClient</span><span class="o">.</span><span class="n">newService</span><span class="o">(</span><span class="s">"localhost:"</span> <span class="o">+</span> <span class="n">port</span><span class="o">)</span>
<span class="k">val</span> <span class="n">rep</span> <span class="k">=</span> <span class="nc">Await</span><span class="o">.</span><span class="n">result</span><span class="o">(</span><span class="n">client</span><span class="o">(</span><span class="n">req</span><span class="o">))</span>
<span class="n">printf</span><span class="o">(</span><span class="s">"Fibonacci(%s) is %s\n"</span><span class="o">,</span> <span class="n">req</span><span class="o">,</span> <span class="n">rep</span><span class="o">)</span>
<span class="k">case</span> <span class="k">_</span> <span class="k">=></span> <span class="n">println</span><span class="o">(</span><span class="s">"Bad arguments!"</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The complete source code of both the client and the server is available <a href="https://github.com/vkostyukov/finagle-fibonacci">on GitHub</a>.</p>
<p>Now it’s time to build the topology from the first picture (the binary tree with seven nodes). The following script builds the tree in a <em>bottom-up</em> manner by launching seven instances on the same machine.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>sbt <span class="s2">"run-main FibonacciServerLauncher leaf 2001"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher leaf 2002"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher node 2003 2002 2001"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher leaf 2004"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher leaf 2005"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher node 2006 2005 2004"</span> <span class="o">&</span> <span class="se">\</span>
sbt <span class="s2">"run-main FibonacciServerLauncher node 2007 2006 2003"</span></code></pre></figure>
<p>From the client side, using the system is pretty simple: the client interacts with the root node of the topology tree, in our case the instance on port <code class="highlighter-rouge">2007</code>.</p>
<p><img src="http://kostyukov.net/assets/images/fibonacci-usage.png" alt="System Usage" /></p>
<h3 id="filters-as-services-decorators">Filters as Services’ Decorators</h3>
<p>Filters provide a natural and clean way of changing a service’s behavior by chaining its requests through a stack of nested filters. Since filters are <em>protocol-independent</em>, the same filters can be used on both the server and the client side.</p>
<p>Let’s consider an example: a filter that simply logs a service’s requests to the console.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">LogStringFilter</span> <span class="k">extends</span> <span class="nc">Filter</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span>, <span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">apply</span><span class="o">(</span><span class="n">req</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">srv</span><span class="k">:</span> <span class="kt">Service</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">Future</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="n">println</span><span class="o">(</span><span class="s">"Got a request: "</span> <span class="o">+</span> <span class="n">req</span><span class="o">)</span>
<span class="n">srv</span><span class="o">(</span><span class="n">req</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>A filter can be applied to a service with the <code class="highlighter-rouge">andThen</code> operator. To make workers log their requests, we can change the launcher configuration as follows.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">service</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">FibonacciService</span><span class="o">(</span><span class="nc">LocalFibonacciCalculator</span><span class="o">)</span>
<span class="nc">Await</span><span class="o">.</span><span class="n">ready</span><span class="o">(</span><span class="nc">FibonacciServer</span><span class="o">.</span><span class="n">serve</span><span class="o">(</span><span class="s">":"</span> <span class="o">+</span> <span class="n">port</span><span class="o">,</span> <span class="nc">LogStringFilter</span> <span class="n">andThen</span> <span class="n">service</span><span class="o">))</span></code></pre></figure>
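<p>To see why filters compose so naturally, here is a self-contained sketch that models services and filters as plain synchronous functions (purely illustrative; real Finagle filters are <code class="highlighter-rouge">Future</code>-based, and <code class="highlighter-rouge">andThen</code> performs the composition shown here):</p>

```scala
// A toy model: a Service is Req => Rep, a Filter maps a Service to a Service.
type Svc = String => String
type Flt = Svc => Svc

// Analogous to LogStringFilter above: log the request, then delegate.
val log: Flt = svc => req => {
  println("Got a request: " + req)
  svc(req)
}

// A second filter, just to show that filters stack.
val upper: Flt = svc => req => svc(req.toUpperCase)

val echo: Svc = req => "echo: " + req

// Plain function nesting plays the role of andThen:
// log is outermost, then upper, then the service itself.
val decorated: Svc = log(upper(echo))
println(decorated("hello")) // logs the request, then prints "echo: HELLO"
```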
<h3 id="is-it-scalable-and-fault-tolerant">Is it Scalable and Fault-Tolerant?</h3>
<p>The suggested tree-based topology can be scaled in a <em>bottom-up</em> manner by adding new levels of fanout nodes. However, it’s not easy to configure such a system with only the shell commands described above; a specialized coordination tool (like <a href="http://zookeeper.apache.org">ZooKeeper</a>, which Finagle supports) should be used instead.</p>
<p>To make the system fault-tolerant, we can use Finagle’s built-in <em>load balancers</em> as well as custom filters that implement <a href="https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/service/RetryingFilter.scala">retries</a> and <a href="https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/service/TimeoutFilter.scala">timeouts</a>.</p>
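<p>As a rough, self-contained sketch of the retry idea (not Finagle’s actual API: the real <code class="highlighter-rouge">RetryingFilter</code> works on <code class="highlighter-rouge">Future</code>-based services and is driven by a <code class="highlighter-rouge">RetryPolicy</code>), a retrying decorator over a plain synchronous service might look like this:</p>

```scala
import scala.util.{Failure, Success, Try}

// Retry a synchronous service up to `times` attempts, rethrowing the
// last failure if every attempt fails.
def retry[Req, Rep](times: Int)(service: Req => Rep): Req => Rep = req => {
  def attempt(left: Int): Rep = Try(service(req)) match {
    case Success(rep)           => rep
    case Failure(_) if left > 1 => attempt(left - 1)
    case Failure(e)             => throw e
  }
  attempt(times)
}

// A flaky service that fails on its first two calls.
var calls = 0
val flaky: String => String = req => {
  calls += 1
  if (calls < 3) throw new RuntimeException("boom") else "ok: " + req
}

println(retry(5)(flaky)("ping")) // prints "ok: ping" on the third attempt
```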
<p>For example, the following client will balance its requests between the two nodes <code class="highlighter-rouge">localhost:2001</code> and <code class="highlighter-rouge">localhost:2002</code>:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">client</span> <span class="k">=</span> <span class="nc">FibonacciClient</span><span class="o">.</span><span class="n">newService</span><span class="o">(</span><span class="s">"localhost:2001,localhost:2002"</span><span class="o">)</span></code></pre></figure>
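<p>Under the hood, the balancer spreads requests across the resolved hosts. A toy round-robin version over plain functions gives the flavor (purely illustrative; Finagle’s actual default balancer is load-aware rather than round-robin):</p>

```scala
// A toy round-robin "balancer": cycles through a list of backends.
def roundRobin[Req, Rep](backends: Seq[Req => Rep]): Req => Rep = {
  var i = -1
  req => {
    i = (i + 1) % backends.size
    backends(i)(req)
  }
}

// Two stand-in backends that tag their replies.
val nodeA: String => String = req => "a:" + req
val nodeB: String => String = req => "b:" + req

val balanced = roundRobin(Seq(nodeA, nodeB))
println(balanced("x")) // prints "a:x"
println(balanced("x")) // prints "b:x"
```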
<h3 id="further-improvements">Further Improvements</h3>
<p>It might be a good idea to replace the String-based transport layer with a BigInt-based one. A suitable example of the corresponding pipeline configuration, with BigInt decoders and encoders, can be found <a href="https://github.com/netty/netty/tree/master/example/src/main/java/io/netty/example/factorial">in Netty’s examples directory</a>.</p>Vladimir KostyukovFinagle is an RPC library for the JVM that allows you to develop service-based applications in a protocol-agnostic way. Formally, the Finagle library provides both an asynchronous runtime via futures and protocol independence via codecs. In this post I will try to build a Finagle-powered distributed Fibonacci numbers calculator that scales up to thousands of nodes.In The Beginning2014-01-19T10:00:00+00:002014-01-19T10:00:00+00:00http://kostyukov.net/posts/in-the-beginning<p>This is my first post on this blog. Needless to say, I’m really excited about this. This is my first attempt at writing blog posts in English. I was previously posting in Russian at <a href="http://vkostyukov.livejournal.com">LJ</a> and <a href="http://vkostyukov.tumblr.com">Tumblr</a>, but it wasn’t about technical things. Here, I will try to post only about CS. I have huge plans for posting about Scala and its application to research on purely functional data structures. I should probably post about the algorithms and data structures that I’ve already implemented in <a href="https://github.com/vkostyukov/scalacaster">Scalacaster</a>. There are loads of awesome pieces of Scala code that I want to write about. One of my favorites is <a href="https://github.com/vkostyukov/scalacaster/blob/master/src/search/SelectionSearch.scala">QuickSelect</a> in a purely functional setting.</p>
<p>Anyway, I’m really looking forward to writing here, and I’m currently in the middle of writing my first useful post. I’m going to describe a new, purely functional implementation technique for the <a href="http://en.wikipedia.org/wiki/Disjoint-set_data_structure">Union-Find</a> data structure. I’ve almost committed the implementation, but it still requires a bit of improvement.</p>
<p>To make sure you don’t miss updates to this blog, I’d recommend following me on Twitter: <a href="https://twitter.com/vkostyukov">@vkostyukov</a>. Announcements will be posted there.</p>Vladimir KostyukovThis is my first post on this blog. Needless to say, I’m really excited about this. This is my first attempt at writing blog posts in English. I was previously posting in Russian at LJ and Tumblr, but it wasn’t about technical things. Here, I will try to post only about CS. I have huge plans for posting about Scala and its application to research on purely functional data structures. I should probably post about the algorithms and data structures that I’ve already implemented in Scalacaster. There are loads of awesome pieces of Scala code that I want to write about. One of my favorites is QuickSelect in a purely functional setting.