Hopefully everyone who builds stuff for the web knows about gzip compression available in HTTP. Here's a quick intro if you don't: https://developers.google.com/speed/articles/gzip
I use connect or express when building web servers in node, and you can use the compress middleware to have your content gzip'd (or deflate'd), like in this little snippet.
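Something along these lines (a sketch assuming Express 3.x, where the compress middleware is exposed as express.compress(); with plain connect it's connect.compress(). The public/ directory and the port are just for illustration):

var express = require("express")

var app = express()

// the compress middleware has to be installed before whatever
// generates the responses you want gzip'd (or deflate'd)
app.use(express.compress())

// e.g. serve some static files
app.use(express.static(__dirname + "/public"))

app.listen(3000)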
However ...
Let's think about what's going on here. When you use the compress middleware like this, every response that sends compressible content gets compressed. Every time. Of course, for your "static" resources, the result of that compression is the same every time, so for those resources it's really kind of pointless to run the compression on each request. You should do it once, and then reuse that compressed result for future requests.
Here are some tests using the play Waste, by Harley Granville-Barker. I pulled the HTML version of the file, and then also gzip'd the file manually from the command line for one of the tests.
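The gzip'd copy is just the output of something like this, keeping the .html name so the file still gets served with a text/html Content-Type; the gz/ destination matches the layout described below.

$ gzip -c waste.html > gz/waste.html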
The HTML file is ~300 KB. The gzip'd version is ~90 KB.
And here's a server I built to serve the files:
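(A minimal sketch of it, again assuming Express 3.x; waste.html and the gz/ directory are assumed to sit right next to the script.)

var fs      = require("fs")
var path    = require("path")
var express = require("express")

// assumed layout: waste.html and gz/waste.html live next to this script
var dir = __dirname

// port 4000: static files, no compression at all
express()
  .use(express.static(dir))
  .listen(4000)

// port 4001: every response gets run through the compress middleware
express()
  .use(express.compress())
  .use(express.static(dir))
  .listen(4001)

// port 4002: if a pre-gzip'd copy of the requested file exists under gz/,
// rewrite request.url to point at it, set the gzip headers ourselves, and
// let the static middleware serve it as usual
var hasGzVersion = {}   // cache of the fs lookups: url -> true/false

express()
  .use(function (request, response, next) {
    var url = request.url

    if (!(url in hasGzVersion)) {
      hasGzVersion[url] = fs.existsSync(path.join(dir, "gz", url))
    }

    var clientAcceptsGzip = /\bgzip\b/.test(request.headers["accept-encoding"] || "")

    if (hasGzVersion[url] && clientAcceptsGzip) {
      request.url = "/gz" + url
      response.setHeader("Content-Encoding", "gzip")
      response.setHeader("Vary", "Accept-Encoding")
    }

    next()
  })
  .use(express.static(dir))
  .listen(4002)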
The server runs on 3 different HTTP ports, each one serving the file in a different way.

Port 4000 serves the HTML file with no compression.

Port 4001 serves the HTML file with the compress middleware.

Port 4002 serves the pre-gzip'd version of the file that I stored in a separate directory; the original file was waste.html, but the gzip'd version is in gz/waste.html. It checks the incoming request to see if a gzip'd version of the file exists (caching that result), internally redirects the server to that file by resetting request.url, and sets the appropriate Content-Encoding, etc. headers.

What a hack! Not quite sure that "fixing" request.url is kosher, but it worked great for this test.
Here are some curl invocations.
$ curl --compressed --output /dev/null --dump-header - \
--write-out "%{size_download} bytes" http://localhost:4000/waste.html
X-Powered-By: Express
Accept-Ranges: bytes
ETag: "305826-1384296482000"
Date: Wed, 13 Nov 2013 01:21:13 GMT
Cache-Control: public, max-age=0
Last-Modified: Tue, 12 Nov 2013 22:48:02 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 305826
Connection: keep-alive
305826 bytes
Looks normal.
$ curl --compressed --output /dev/null --dump-header - \
--write-out "%{size_download} bytes" http://localhost:4001/waste.html
X-Powered-By: Express
Accept-Ranges: bytes
ETag: "305826-1384296482000"
Date: Wed, 13 Nov 2013 01:21:13 GMT
Cache-Control: public, max-age=0
Last-Modified: Tue, 12 Nov 2013 22:48:02 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding
Content-Encoding: gzip
Connection: keep-alive
Transfer-Encoding: chunked
91071 bytes
Nice seeing the Content-Encoding and Vary headers, along with the reduced
download size. But look ma, no Content-Length header; instead the content
comes down chunked, as you would expect with a server-processed output stream.
$ curl --compressed --output /dev/null --dump-header - \
--write-out "%{size_download} bytes" http://localhost:4002/waste.html
X-Powered-By: Express
Content-Encoding: gzip
Vary: Accept-Encoding
Accept-Ranges: bytes
ETag: "90711-1384297654000"
Date: Wed, 13 Nov 2013 01:21:13 GMT
Cache-Control: public, max-age=0
Last-Modified: Tue, 12 Nov 2013 23:07:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 90711
Connection: keep-alive
90711 bytes
Like the gzip'd version above, but this one has a Content-Length!
Here are some contrived, useless benches using wrk that confirm my fears.
$ wrk --connections 100 --duration 10s --threads 10 \
--header "Accept-Encoding: gzip" http://localhost:4000/waste.html
Running 10s test @ http://localhost:4000/waste.html
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 71.72ms 15.64ms 101.74ms 69.82%
Req/Sec 139.91 10.52 187.00 87.24%
13810 requests in 10.00s, 3.95GB read
Requests/sec: 1380.67
Transfer/sec: 404.87MB
$ wrk --connections 100 --duration 10s --threads 10 \
--header "Accept-Encoding: gzip" http://localhost:4001/waste.html
Running 10s test @ http://localhost:4001/waste.html
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 431.76ms 20.27ms 493.16ms 63.89%
Req/Sec 22.47 3.80 30.00 80.00%
2248 requests in 10.00s, 199.27MB read
Requests/sec: 224.70
Transfer/sec: 19.92MB
$ wrk --connections 100 --duration 10s --threads 10 \
--header "Accept-Encoding: gzip" http://localhost:4002/waste.html
Running 10s test @ http://localhost:4002/waste.html
10 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 48.11ms 10.66ms 72.33ms 67.39%
Req/Sec 209.46 24.30 264.00 81.07%
20795 requests in 10.01s, 1.76GB read
Requests/sec: 2078.08
Transfer/sec: 180.47MB
Funny to note that the server using the compress middleware actually handles fewer requests/sec than the one that doesn't compress at all. But this is a localhost test, so the network bandwidth/throughput isn't realistic. Still, makes ya think.