Warp-speed Servers

The toughest part of writing high-performance servers is minimizing context switches. Traditionally, servers like Apache handled each request in a single thread, or even a single process. Under heavy load, there could be many threads in wildly different states. One might be waiting for a file write to complete; another might be waiting for data from a network connection; yet another would be merrily computing if only it were given a CPU. A generic scheduler has a hard time juggling all these threads, because it has no idea what each thread is up to.

Well, actually, schedulers are capable enough to know which threads are blocked on I/O for example, and such threads will be ignored until they are unblocked. Even so, among other woes, we still suffer from frequent trips from user-space to kernel-space.

The solution is to roll our own scheduler. If a thread wishes to write to a file, our scheduler kicks off an asynchronous write call and asks to be notified when it’s finished. In the meantime, it can give control to another thread, who will later cede control at a convenient moment. Knowing what each thread is doing means our scheduler knows the best points to switch between threads, and also knows exactly what must be preserved so we can later continue where we left off. Typically, it is good to switch when there is little to preserve. Thus we reduce context switches, and the context switches that do remain are cheaper.

Threads managed by this smarter scheduler are called green threads (or lightweight threads or user threads) to distinguish them from the native threads supported by the operating system.

This solution is also termed event-driven architecture: our homebrew scheduler can be viewed as a loop that serially processes a stream of events, where each event is something like "there is a user request coming in" or "I’ve finished writing to the file". (In contrast, a generic scheduler gets less helpful information like "I’m doing something".)

From brief browsing, my impression is we say "event-driven architecture" when the scheduler is built into the server monolithically, and "green threads" when split off into a library or as part of the compiler. For example, see this USENIX paper.

GHC threads

Whatever we call it, a good scheduler is tedious to implement from scratch. It is highly specific to the target system: for example, on Linux, one must know the epoll API, while on FreeBSD, one must know kqueue. One must learn numerous asynchronous APIs. Thankfully, for several languages, we can stand on the shoulders of giants.

How do we get green threads in Haskell? By doing nothing! The runtime system (RTS) of GHC includes an efficient tunable user-level thread scheduler and plain Haskell code automatically runs in green threads.

So we can have our cake and eat it. We can write Apache-style code that handles one request per thread and like the Warp library, count on GHC to minimize context switches to build high-performance servers.

A Minimal Web Server

For now, we’ll be even lazier and rely on the Warp library for almost everything. We obtain a fully functional web server with:

-- http://stackoverflow.com/questions/22620294/minimal-warp-webserver-example
{-# LANGUAGE OverloadedStrings #-}
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import Network.HTTP.Types (status200)
import Network.HTTP.Types.Header (hContentType)

main = run 3000 $ \_ f -> f $
    responseLBS status200 [(hContentType, "text/plain")] "Hello, world!\n"

Embedded Static Content

With a little work we arrive at a webserver that:

  • At compile time, recursively embeds every file in the "www" subdirectory into the binary.

  • Serves the contents of these files when requested (relative to "www").

  • Implicitly appends "index.html" if the request path ends with "/".

  • Redirects unknown paths to "/".

  • Logs the remote host and requested path on standard output.

In other words, this code produces a single binary that can power an entire static web site. This binary never needs to open files, and contains no buffer overflow exploits or memory allocation bugs (assuming, of course, the underlying libraries and compiler are flawless).

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
import Blaze.ByteString.Builder
import Data.ByteString.Char8 (pack)
import Data.FileEmbed
import Data.List (isSuffixOf)
import Data.Map (fromList, (!), member)
import Network.Wai
import Network.Wai.Handler.Warp (runEnv)
import Network.HTTP.Types
import Network.HTTP.Types.Header

m = let
  m0 = map (\(k, v) -> (('/':k), v)) $(embedDir "www")
  m1 = [(take (length k - 10) k , v)
    | (k, v) <- m0, "/index.html" `isSuffixOf` k]
  in fromList $ map (\(k, v) -> (pack k, v)) (m0 ++ m1)

main = do
  runEnv 3000 $ \req f ->
    case requestMethod req of
    "GET" -> do
      putStrLn $ show (remoteHost req) ++ " " ++ show (rawPathInfo req)
      let k = rawPathInfo req in f $
        if k `member` m then
          responseBuilder status200 [] $ fromByteString (m!k)
          responseBuilder status301 [(hLocation, "/")] $
            fromByteString "Redirect to /"

We could speed up this program by generating hard-wired request routing code rather than a Data.Map. Ideally, it should also send an appropriate Content-Type header based on file extensions.

Ben Lynn blynn@cs.stanford.edu 💡