pmuellr is Patrick Mueller, an IBMer doing dev advocate stuff for node.js on BlueMix.

other pmuellr thangs: muellerware.org, twitter.com/pmuellr

Tuesday, December 14, 2010

tech books on the Kindle

Three images from David Flanagan's jQuery Pocket Reference, as they look on my 3rd generation (latest) 6" (not the big one) Kindle.

The first image below is from the PDF version of the book.

The font size is a little too small for me and, I would assume, for most people. It also wastes a lot of space: on the left and right margins in general, on the indentation of the code sample, and on the footer information.

Note that PDFs are not "resizable" in the same way a web page is; you can't make the font larger. What you see is what you get.

The next image below is from the Mobi version of the book. Mobi is a publishing format that has been used by various reader software over the years, and is the "native" format for the Kindle.

What I'm showing here is the text at the smallest font size supported. I typically read at one font size up from that. There are 8 font sizes available when you're reading Mobi files, along with being able to tweak the line spacing a bit, etc.

I'm showing the smallest font size because it shows one of the biggest problems with using Mobi for tech documents - the code sample is too big for this font size (the smallest!) and it wraps. It just looks terrible.
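
To find out ahead of time which code samples are likely to wrap, you could scan the book's HTML for long lines inside <pre> elements. Here's a minimal sketch, using only the Python standard library. The names `PreCollector` and `long_code_lines` are made up for illustration, and the default `max_cols=50` is a guess, not a measured Kindle line width; adjust it for your font size.

```python
from html.parser import HTMLParser


class PreCollector(HTMLParser):
    """Collect the text inside each <pre> element of an HTML document."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a <pre>
        self.blocks = []    # one string per top-level <pre> block

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.depth += 1
            if self.depth == 1:
                self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that sits inside a <pre>.
        if self.depth > 0:
            self.blocks[-1] += data


def long_code_lines(html, max_cols=50):
    """Return the lines in <pre> blocks longer than max_cols characters.

    max_cols is an assumed column budget, not a Kindle specification.
    """
    parser = PreCollector()
    parser.feed(html)
    parser.close()
    return [line
            for block in parser.blocks
            for line in block.splitlines()
            if len(line) > max_cols]
```

Calling `long_code_lines(open("ch01.html").read())` on a chapter file (the file name is just an example) lists the code lines that would not fit in the assumed width.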

I have some experience with the Mobi format, converting HTML documents to Mobi using Calibre. It appears the Mobi format is pretty limited. I've tried all sorts of styling of <pre> elements, and the best you can do is force some vertical whitespace (not the default!) in front of your code samples. It's just awful.

What's a boy to do? Turns out O'Reilly also offers an ePub version of their eBooks. You can easily turn your ePub into a Mobi with Calibre, but you're not going to be able to do much better than the image above.

So I did something different:

  • unzip the ePub - ePub files are zip files with an .epub file extension

  • concatenate all the .html files in the ePub together, into one file - the main text content of an ePub is stored as XHTML files

    cat ch*.html > combined.html
    
  • remove the <head> sections with a find/replace regex pattern, etc

  • add a simple <style> section:

        h1.title {
            page-break-before:  always;
        }
    
        body {
            font-size:          200%;
            font-family:        Georgia;
        }
    
        pre {
            font-size:          70%;
            background-color:   #EEE;
            padding:            0.5em;
            overflow:           hidden;
            border-radius:      0.5em;
            -moz-border-radius: 0.5em;
        }
    
        .sidebar {
            padding:            0.0em 0.5em;
            overflow:           hidden;
            border-radius:      0.5em;
            -moz-border-radius: 0.5em;
            background-color:   #EEE;
        }
    
        .sidebar p.title {
            font-weight:        bold;
        }
    
  • bring the resulting file up in Firefox

  • print, but save to PDF instead of actually printing (I'm on a Mac)

  • copy the PDF to the Kindle

Here's the result. Not perfect, but much better than the other two files. The font is larger, and the code sample is readable and well separated from the other text.

Note: Firefox is not my day-to-day web browser, but it is the only browser that produces usable PDF files for files like this. PDFs produced from both Safari and Chrome only last about 30 pages or so before the text disappears, when viewed on the Kindle. Who knows.
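
The first four steps in the list above (unzip, concatenate, strip the <head> sections, add a <style> section) could be scripted along these lines. This is a minimal sketch using only the Python standard library, not the exact procedure used for the book above. The function names `strip_head` and `combine_epub` are made up for illustration, the `ch*.html` pattern is an assumption that will not match every publisher's file names, and the stylesheet is a trimmed copy of the one above. Opening the result in Firefox and saving to PDF is still a manual step.

```python
import re
import zipfile
from pathlib import Path

# Trimmed version of the stylesheet from the post; tweak to taste.
STYLE = """<style>
h1.title { page-break-before: always; }
body     { font-size: 200%; font-family: Georgia; }
pre      { font-size: 70%; background-color: #EEE; padding: 0.5em;
           overflow: hidden; border-radius: 0.5em; }
</style>"""

# Matches a whole <head>...</head> section, across lines, in any case.
HEAD_RE = re.compile(r"<head\b.*?</head>", re.IGNORECASE | re.DOTALL)


def strip_head(html):
    """Remove every <head> section from a chunk of (X)HTML."""
    return HEAD_RE.sub("", html)


def combine_epub(epub_path, out_path, pattern="ch*.html"):
    """Unzip an ePub, join its chapter files, and prepend the style block.

    `pattern` is an assumption: chapter file names vary between publishers.
    Returns the number of chapter files that were combined.
    """
    work_dir = Path(out_path).with_suffix(".unzipped")
    with zipfile.ZipFile(epub_path) as epub:  # an ePub is a zip archive
        epub.extractall(work_dir)

    # Sorted so chapters come out in name order, as with `cat ch*.html`.
    chapters = sorted(work_dir.rglob(pattern))
    body = "\n".join(strip_head(p.read_text(encoding="utf-8"))
                     for p in chapters)

    Path(out_path).write_text(STYLE + "\n" + body, encoding="utf-8")
    return len(chapters)
```

For example, `combine_epub("book.epub", "combined.html")` (both file names are placeholders) leaves the unzipped files in `combined.unzipped` and writes the styled, concatenated text to `combined.html`.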

Update on 2010/12/14 at 1:20pm - the images weren't working from Flickr, so I put them on Dropbox

9 comments:

David Flanagan said...

So in an ideal world, the Kindle would support ePub, or O'Reilly would produce better .mobi files or the O'Reilly production department would produce a PDF with a page size that matches the Kindle aspect ratio...

The PDF is an exact copy of the print book, and they do a lot of work getting line breaks and page breaks just right. So I doubt they'd ship it in different sizes.

I'll pass this post on to the appropriate folks at O'Reilly.

Patrick Mueller said...

I think the good news is that the ePub source already renders well enough for a Kindle form-factor. I just don't want to do it by hand, the way I'm doing it.

PDF is obviously not a good choice, since it can't be resized (easily). It's a really bad choice, for the Kindle, right now, since it doesn't support internal links, a TOC, etc.

But I got lemons (a Kindle), making lemonade.

Patrick Mueller said...

Should also mention that if you really want to experiment with turning your beautiful HTML into queasiness-inducing mobi, the KindleGen program is even easier to use than Calibre. More info here:

https://dtp.amazon.com/mn/signin

The extensions to HTML for KindleGen are described here:

http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen

Ken Walker said...

So this is the same reference, around the same spot as your sample, on an iPad. There are several fonts and many sizes; I show a couple to give an idea of what it's like.

http://gallery.me.com/bitterboy#100396

Patrick Mueller said...

I have now automated this a bit. There's an .html file available here - https://gist.github.com/743312 - which you can drop in an .epub file folder which you've unzipped, and it will attempt to render a Kindle-izable version.

Worked fine for O'Reilly books, not so good for others, but simple enough to hack.

Adam Witwer said...

I manage the team at O'Reilly that is responsible for formatting the ebooks and thought a few comments might be helpful.

But first, let me say: very cool hack, Patrick. I love that readers are experimenting with our ebooks, and I see it as yet another reason why having no DRM was the best decision we ever made.

The way we present the code blocks in the re-flowable ebook formats (EPUB/Mobi) is a carefully considered trade-off, and I don't think we can ever come up with a solution that works perfectly for everyone. The EPUB and Mobi have CSS that specifies that the lines wrap:

    pre {
        white-space: pre-wrap;
        font-family: "DejaVu Sans Mono", monospace;
        font-size: 85%;
        margin-left: 1.5em;
        margin-bottom: 10px;
    }

When we first released these formats, we took the opposite approach; we had the code lines not wrapping. As a result, we got a lot of customer complaints about the code "going off the edge of the screen." So we ended up with the handling that you see now.

Also, you're right that the Mobi format has very limited styling options (supporting just a handful of CSS selectors). And of course, you would have never noticed if you were using something like a Kindle DX, which has a larger screen.

With so many formats and reading systems, we have to pick what works best for most people. That said, we are always scrutinizing our decisions, tweaking settings, regenerating, and releasing new versions. So keep the feedback coming!

Patrick Mueller said...

I have to agree it's awesome having the DRM-free books from O'Reilly, and especially the .epub files, which are essentially HTML files, and thus very flexible w/r/t final layout.

Patrick Mueller said...

I've made some Kindle-ized PDFs of some liberally licensed tech books available, here: http://dl.dropbox.com/u/2192156/kindle-friendly-pdfs/index.html.

Kiran said...

Patrick - the SICP PDF was useful, primarily because of the page size and font size; ended up reading it on my laptop. Thanks for sharing.