Delving the depths of computing,
hoping not to get eaten by a wumpus

By Timm Murray

Nginx direct cachefile hosting, or I found a hammer, get me a nail

2018-02-01


Let’s say you have an API that serves JSON. Some of the responses don’t change very often. Perhaps something like user data:

{
  "uid": 12345,
  "addresses": [
    {
      "state": "SC",
      "city": "Mayberry",
      "street": "1234 Lalaberry Lane",
      "zip": 12345
    }
  ],
  "lname": "Smith",
  "fname": "John"
}

It’s an obvious case where caching could help. Perhaps you stick the data in memcached and write it out directly from your app.
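
Something like this, say (a sketch using Cache::Memcached::Fast; the key scheme and fetch_user_from_db are made up for illustration):

#!perl
use v5.20;
use warnings;
use Cache::Memcached::Fast;
use Cpanel::JSON::XS 'encode_json';

my $memd = Cache::Memcached::Fast->new({
    servers => [ 'localhost:11211' ],
});

# Stand-in for the real (slow) lookup -- hypothetical
sub fetch_user_from_db {
    my ($uid) = @_;
    return { uid => $uid, fname => 'John', lname => 'Smith' };
}

# Check the cache first; on a miss, build the JSON and keep it for 5 minutes
sub user_json {
    my ($uid) = @_;
    my $key  = "user_json:$uid";
    my $json = $memd->get( $key );
    return $json if defined $json;

    $json = encode_json( fetch_user_from_db( $uid ) );
    $memd->set( $key, $json, 300 );
    return $json;
}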

This means you’re still hitting application code. Wouldn’t it be nice if you could have nginx write the cached data back to the client as if it were a static file? This is possible using a ramdisk and nginx’s try_files.

Start with this little Mojolicious app:

#!perl
use v5.20;
use warnings;
use Mojolicious::Lite;
use File::Spec::Functions 'catfile';
use Cpanel::JSON::XS 'encode_json';

use constant CACHE_FILE_PATH => 'html';
use constant CACHE_DATASTRUCTURE => {
    uid => 12345,
    fname => 'John',
    lname => 'Smith',
    addresses => [{
        street => '1234 Lalaberry Lane',
        city => 'Mayberry',
        state => 'SC',
        zip => 12345,
    }],
};
use constant CACHE_JSON => encode_json( CACHE_DATASTRUCTURE );

get '/ramdisk/*' => sub {
    my ($c) = @_;

    # Simulate a slow, expensive first request
    sleep 5;

    my $url_path = $c->req->url->path;

    my $path = catfile( CACHE_FILE_PATH, $url_path );
    # TODO SECURITY ensure $path is actually under the absolute path to 
    # CACHE_FILE_PATH, cleaning up any '..' or other path miscreants
    # Write the response where nginx's try_files will find it next time
    open( my $out, '>', $path )
        or die "Can't write to $path: $!\n";
    print $out CACHE_JSON;
    close $out;

    $c->render(
        data => CACHE_JSON,
        format => 'json',
    );
};

get '/direct/*' => sub {
    my ($c) = @_;
    $c->render(
        data => CACHE_JSON,
        format => 'json',
    );
};


app->start;

This provides two paths to the same JSON. The first one, /ramdisk/*, writes the JSON to a path we specify under our nginx root. It has a deliberate sleep 5 call, which simulates the first request being very slow. The second, /direct/*, is for benchmarking. It dumps some pre-encoded JSON back to the client, which gives us an upper limit on how fast we could go if we pulled that data out of memcached or something.

(If you use this code for anything, do note the security warning. The code as written here could allow an attacker to overwrite arbitrary files. You need to ensure the place you’re writing is underneath the subdirectory you expect. I didn’t want to clutter up the example with too much detail, so the full fix is left as an exercise for the reader.)
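
For a starting point, though, the check could resolve everything to absolute paths and compare prefixes. A minimal sketch, assuming the target’s directory already exists (it does here, since html/ramdisk is mounted up front):

use Cwd 'realpath';
use File::Basename 'dirname';

# Refuse to write anywhere that isn't under the cache root. realpath
# collapses '..' and symlinks, and returns undef if the path doesn't exist.
sub assert_path_is_safe {
    my ($cache_root, $path) = @_;
    my $root = realpath( $cache_root );
    my $dir  = realpath( dirname( $path ) );
    die "Refusing to write outside $root\n"
        unless defined $dir && index( $dir . '/', $root . '/' ) == 0;
    return $path;
}

In the app above, you’d call assert_path_is_safe( CACHE_FILE_PATH, $path ) before the open.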

Save it as mojo.pl in a directory like this:

$ ls
html
mojo.pl

The html dir will be the place where nginx serves its static files. Create html/ramdisk and then mount a ramdisk there:
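
$ mkdir -p html/ramdisk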

$ sudo mount -t tmpfs -o size=10M,mode=0777 tmpfs html/ramdisk/

This will give you a 10MB ramdisk writable by all users. When the mojo app above is called with /ramdisk/foo, it will write the JSON to this ramdisk and return it.
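
A mount done by hand like this won’t survive a reboot. If that matters, an /etc/fstab entry along these lines does the same thing (the path here is hypothetical; use wherever your html/ramdisk actually lives):

tmpfs  /var/www/app/html/ramdisk  tmpfs  size=10M,mode=0777  0  0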

Now for the nginx config. Using try_files, we first check if the URI is directly available. If so, nginx will return it verbatim. If not, we have it proxy to our mojo app.

worker_processes  10;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile        on;
    tcp_nopush      on;

    keepalive_timeout  65;

    server {
        listen       8001;
        server_name  localhost;

        root html;

        location /ramdisk {
            default_type application/json;
            try_files $uri $uri/ @cache_build;
        }

        location @cache_build {
            proxy_pass http://localhost:8002;
        }
    }
}

Start this up and call http://localhost:8001/ramdisk/foo. If the file hasn’t been created yet, the sleep from earlier will force the response to take about 5 seconds. Once the file is created, the response should be nearly instant.
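
Concretely, with nginx running on this config and the app listening on port 8002 to match the proxy_pass:

$ perl mojo.pl daemon -l 'http://*:8002' &
$ time curl http://localhost:8001/ramdisk/foo   # miss: proxied to mojo, ~5 seconds
$ time curl http://localhost:8001/ramdisk/foo   # hit: nginx serves the file directly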

How “instant”? Very instant. Here’s the result from ab of calling this 100,000 times, with 100 concurrent requests (all on localhost).
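
That works out to an ab invocation along these lines:

$ ab -n 100000 -c 100 http://localhost:8001/ramdisk/foo

Here’s what it reported: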

Concurrency Level:      100
Time taken for tests:   4.629 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      37300000 bytes
HTML transferred:       13400000 bytes
Requests per second:    21604.02 [#/sec] (mean)
Time per request:       4.629 [ms] (mean)
Time per request:       0.046 [ms] (mean, across all concurrent requests)
Transfer rate:          7869.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   0.5      2       6
Processing:     1    3   0.6      3      10
Waiting:        0    2   0.6      2      10
Total:          2    5   0.6      5      13

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      5
  75%      5
  80%      5
  90%      5
  95%      5
  98%      6
  99%      7
 100%     13 (longest request)

And the results from calling the mojo app with /direct/foo:

Concurrency Level:      100
Time taken for tests:   87.616 seconds
Complete requests:      100000
Failed requests:        0
Total transferred:      28500000 bytes
HTML transferred:       13400000 bytes
Requests per second:    1141.34 [#/sec] (mean)
Time per request:       87.616 [ms] (mean)
Time per request:       0.876 [ms] (mean, across all concurrent requests)
Transfer rate:          317.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0      17
Processing:     8   88  32.7     88     174
Waiting:        8   87  32.7     88     174
Total:          8   88  32.7     88     174

Percentage of the requests served within a certain time (ms)
  50%     88
  66%    101
  75%    111
  80%    117
  90%    132
  95%    142
  98%    152
  99%    157
 100%    174 (longest request)

We took the median response time from 88ms down to just 5ms. This is on an Intel Core i7-6500U @ 2.5GHz.

If you’re wondering, I didn’t see any benefit to using sendfile or tcp_nopush in nginx. This may be because I’m doing everything over localhost.

What I like even more is that you don’t need any special tools to manipulate the cache. Unix provides everything you need. Want to see the contents? cat [file]. Want to clear a cache file? rm [file]. Want to set a local override? EDITOR-OF-CHOICE [file].
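
For instance, with the sample path from earlier:

$ cat html/ramdisk/foo    # inspect the cached response
$ rm html/ramdisk/foo     # invalidate it; the next request rebuilds it
$ vi html/ramdisk/foo     # drop in a local override by hand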

Now to go find a use for this.



Copyright © 2024 Timm Murray
CC BY-NC

Opinions expressed are solely my own and do not express the views or opinions of my employer.