My eBook build process and some PDF, EPUB and MOBI tips (patshaughnessy.net)
111 points by ejpastorino on Nov 27, 2012 | 35 comments



It takes a significant amount of effort to do all of this yourself, which is very commendable. I self-publish and work for a company that helps others self-publish by editing/designing and converting their work, and it's always something very involved.

It's not as simple as just clicking convert, either. Tables render differently in EPUB and Kindle, and look worse still if the conversion is done wrong, and TOC creation can be a veritable mess.

Mobi conversion tools may or may not fix issues that'd appear if the official Kindle KDP tool is used, meaning a converted Mobi may end up looking entirely different than the same file once uploaded to Amazon.

Graphs... I don't even want to think about dealing with those and Kindle. Congratulations on doing all of this yourself without screaming. My first couple of book conversions were learning experiences, and they were mostly plain text (chapter fiction books) without varying fonts and diagrams.

So, if I had to sum up what I've learned into one idea that'd make the entire process easier, it'd be this:

Write plain text or rich text. Don't format when writing. Do write as non-fancy as possible in your master text. Then when you have to add stuff to it later, you don't discover little surprises that throw everything off.

For example: Don't indent. Kindle auto-indents when converted if indentation hasn't been defined in the style. (The official Kindle conversion does this, at least currently; who knows if it'll change in the future. A Mobi generator generally doesn't.) So using TAB indents throughout a book instead of MS Word's indent feature will cause a massive headache just when you think you're done.

Since I work on Windows, the tools can be rather simple. Create a document in MS Word. Save as a web page (filtered) to remove some of the junk Word creates in the HTML. Edit the HTML to remove a couple of other quirks, set tags for chapters, remove Unicode characters that Kindle won't recognize (some are recognized, some aren't), save, convert, and check what you missed. The less special your document has to look, the easier it'll be.
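A minimal sketch of that HTML cleanup pass in Python, for anyone who wants to script it. The function names are made up for illustration, and it assumes the manual indents survive as literal tab characters right after an opening paragraph tag, which may not match what Word actually writes. It does not try to guess which characters Kindle rejects; it only lists the non-ASCII ones so each can be checked by hand.

```python
import re
from collections import Counter


def strip_tab_indents(html: str) -> str:
    """Drop tab characters used as manual indents right after an opening <p> tag.

    Assumption: the tabs are literal "\t" characters in the saved HTML.
    """
    return re.sub(r"(<p\b[^>]*>)\t+", r"\1", html)


def report_non_ascii(html: str) -> Counter:
    """Count every non-ASCII character so each one can be reviewed manually."""
    return Counter(ch for ch in html if ord(ch) > 127)


# Hypothetical sample input, not taken from a real Word export.
sample = (
    "<p class=MsoNormal>\tIt was a dark night\u2026</p>\n"
    "<p>\t\tShe said \u201chello\u201d.</p>"
)

cleaned = strip_tab_indents(sample)
suspects = report_non_ascii(cleaned)

for char, count in suspects.most_common():
    print(f"U+{ord(char):04X} {char!r} x{count}")
```

The report is deliberately passive: since some characters are recognized and some aren't, deciding what to replace is left to whoever checks the converted file.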

Again, congratulations, welcome to the world of self-publishing. It can give a nightmare of a headache, but once it's done, the joy of knowing you created something on your own is magical.


Thank you :) Yes! "magical" is a great way to describe the experience...

As you suggest, I tried to keep things simple with the text, but in the end since I use a lot of code samples and diagrams it was difficult to do that. And yes, rendering graphs was also a challenge. I didn't get into those details in the blog post, so let me know if anyone is interested in hearing more about that.


Great review. I was not aware of Bookshop, thanks for sharing your deep-dive into using it and other tools.

I began writing ebooks over the summer. To date, I have written four educational/reference titles around a theme -- "In 30 Minutes". The idea is to let newbies quickly understand a mildly complex topic. The most recent title is The $10 Small Business Website In 30 Minutes (1).

Not only do these books contain lots of screenshots and detailed TOCs, I also publish them in multiple formats -- .mobi, ePub, PDF, and PDF for paperback.

Unfortunately, I discovered that the most popular writing tools -- Microsoft Word, Google Docs, and Apple Pages -- are not up to the task of creating ebook files across all of the platforms. Even if they can export to a certain format, there are limitations that force additional production and conversion steps. I also encountered the problem of forked masters, when I had to use different tools for exporting/converting to special formats.

Currently, I am using Google Docs for composition and collaboration with my copy editor. I then copy and paste the text into Scrivener (2), which is the most powerful writing and publishing tool I have ever used. It exports to .mobi, ePub, PDF, and print book PDF. Like TFA, I use Kindle Previewer to test the .mobi files. The ePub files need some mild HTML cleanup, which I do in Sigil (3).

1: http://10dollarsmallbusinesswebsite.com/

2: http://www.literatureandlatte.com/scrivener.php

3: http://code.google.com/p/sigil/


Thanks for the links - I had heard of Scrivener but didn't try using it. The approach I took with Bookshop is more geared towards programmers and web developers who are comfortable with HTML/CSS coding.

I love your "in 30 minutes" idea! Easier to read, and I'm guessing easier to write for you. My eBook definitely will take more than 30 minutes to read :)


I'd be interested to hear what cleanup the ePub files need after the Scrivener export. Is it just tweaking styles, or does the ebook not render correctly on some devices? I'm also curious about how the Scrivener mobi export compares to kindlegen. Does it do KF8?

[EDIT] - From looking at the Scrivener site, they require kindlegen for mobi generation.


There are a few issues that I encountered when I tested Scrivener ePub output:

- Chapter suffixes not being properly appended

- Extra spaces appearing after headings/subheadings/sub-sub-headings

In addition, I like to look at the HTML and CSS associated with images to make sure they are not being reduced in size during the ePub compile process. This was an issue I encountered with Word and Pages. So far it hasn't cropped up in Scrivener, but I need to manually confirm this for peace of mind.
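That manual check can be partly scripted. Here is a rough Python sketch, assuming an EPUB is an ordinary zip archive (it is) and that you know the byte size of each original image. The function names and the demo archive are invented for illustration. A smaller byte count is only a hint that something was resized or recompressed, so anything flagged still needs a look by eye.

```python
import io
import zipfile
from pathlib import PurePosixPath

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def epub_image_sizes(epub):
    """Map each image inside an EPUB to its uncompressed size in bytes."""
    with zipfile.ZipFile(epub) as zf:
        return {
            info.filename: info.file_size
            for info in zf.infolist()
            if PurePosixPath(info.filename).suffix.lower() in IMAGE_EXTS
        }


def shrunken_images(epub, originals):
    """Return archive paths whose size is below the original byte count.

    `originals` maps a bare file name (e.g. "fig1.png") to its size on disk.
    """
    found = epub_image_sizes(epub)
    return sorted(
        name
        for name, size in found.items()
        if PurePosixPath(name).name in originals
        and size < originals[PurePosixPath(name).name]
    )


# Demo with a fake in-memory archive standing in for a compiled EPUB.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/images/fig1.png", b"x" * 10)
    zf.writestr("OEBPS/images/fig2.png", b"x" * 40)
    zf.writestr("OEBPS/ch1.xhtml", "<html/>")
buf.seek(0)

flagged = shrunken_images(buf, {"fig1.png": 25, "fig2.png": 40})
print(flagged)
```

With a real book you would pass the path to the compiled .epub instead of the in-memory buffer.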

Regarding Kindlegen: Scrivener incorporates Kindlegen into the compile process for .mobi files. It's nearly seamless, and generates good output when I test in Kindle Previewer and the Kindle app for iPad.

KF8: I wasn't aware of this issue until you pointed it out. It might explain a spike of returns I experienced this month. See http://www.literatureandlatte.com/forum/viewtopic.php?f=34&#... for more information.


Thanks for sharing this info.

As a Python developer and ebook writer I've gone a similar route. I'm currently writing in emacs using reStructuredText as my base format and have written code to generate [0] clean epubs which translate to mobi/kf8 pretty well. Part of this is a CSS file I'm working on [1] to make common formatting "just work" across the big ebook readers (old-kindles, newer kindles, ipads, nook and kobo).

I've spent a little bit of time (and money) getting very nice pdf generation working using sphinx and the memoir class for latex. It's not there yet as I've been focusing on the ebooks.

Yes, it feels like you need to be a programmer to create ebooks right now. You definitely need to feel comfortable editing HTML and CSS. I'm not completely happy with rst as the base format, but I don't think there is another lightweight markup language specifically targeted at authoring books. Even Sphinx, which is supposed to be for documentation, isn't really well suited to books. So there are little hacks here and there. Plus, as I usually write about Python-related material, rst lets me "test" my books using doctest, and I can even templatize some stuff (I'm doing that in a book I'm currently working on).
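To illustrate the doctest point, here is a small sketch using only the standard library. The chapter text and the `check_chapter` helper are invented for the example; it is not the code from the repositories linked in this comment. It pulls every `>>>` example out of a chunk of rst and runs it.

```python
import doctest

# Hypothetical chapter text; in practice this would be read from an .rst file.
CHAPTER = '''
Lists
=====

Appending grows a list in place:

>>> items = [1, 2]
>>> items.append(3)
>>> items
[1, 2, 3]
'''


def check_chapter(text, name="chapter"):
    """Run every >>> example found in the text; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


failed, attempted = check_chapter(CHAPTER)
print(f"{attempted} examples run, {failed} failed")
```

For files on disk, `doctest.testfile("chapter.rst", module_relative=False)` does much the same in one call.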

0 - https://github.com/mattharrison/rst2epub2

1 - https://github.com/mattharrison/epub-css-starter-kit


I've published some children's fiction this year, but I suspect I will publish programming works again at some point.

I'll give another plug for Scrivener[1], which is really a great tool. All of the editions that I have in the major online bookstores come straight out of Scrivener (though I obviously used an image editor for the covers).

I wanted to also mention how awesome Leanpub[2] is. Write your files in Markdown (and they support code snippets well... definitely a service that works well for software topics), save them in Dropbox. Press a couple of buttons in your browser and you've got PDF, mobi and epub. And, you can sell right away and keep 90% - 50 cents. They make it easy to publish early in the process and keep readers up to date as you complete the work.

One bonus that's not as obvious: Leanpub also makes distribution to a sample audience easy. You can generate coupon codes trivially.

I'm planning to go straight to Leanpub with my next technical work.

[1]: http://www.literatureandlatte.com/scrivener.php

[2]: http://leanpub.com/


The only complaint I've heard about Leanpub is that they don't give the author access to end user email addresses. I don't use email as a marketing tool much (almost never) but I would guess this could be a show-stopper for many eBook authors who are considering their solution.


(Leanpub cofounder here.)

We changed this very recently: readers can now choose a checkbox to share their email with the author, and this checkbox is on the purchase form as well as on the reader dashboard.

(And, being Canadian, I feel the need to apologize. So, we are sorry it took us so long to get this feature right!)


I'll be interested to hear how the Leanpub output compares to Scrivener. The Scrivener pdf looks decent. There are a few line spacing issues though on the example I saw on Amazon.



I suspect they are using pandoc at the backend to do something like pandoc -o lean.pdf lean.markdown?
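For what it's worth, a pandoc-based build along those lines could be scripted roughly like this in Python. This is purely a guess at the shape of such a pipeline, not anything known about Leanpub's backend; the helper names and file names are hypothetical. It relies on pandoc picking the output format from the file extension given to `-o`, and PDF output additionally needs a LaTeX engine installed.

```python
import shutil
import subprocess


def pandoc_command(source: str, output: str) -> list[str]:
    """Build the argument list; pandoc infers the format from the extension."""
    return ["pandoc", source, "-o", output]


def build(source: str, outputs: list[str]) -> None:
    """Run pandoc once per output file, if pandoc is installed."""
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH; skipping")
        return
    for output in outputs:
        subprocess.run(pandoc_command(source, output), check=True)


# Only builds the command here; call build(...) to actually run pandoc.
cmd = pandoc_command("lean.markdown", "lean.pdf")
print(" ".join(cmd))
```

Calling `build("lean.markdown", ["lean.pdf", "lean.epub"])` would then produce both files from the same source.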


I will second Leanpub. I'm going straight to them with anything. It's just fantastic.


Interesting. I agree that using something like Pages or Scrivener (I love Scrivener) can help you to focus on the writing and I wish I could have used them to write my book [1].

However, I'm curious to know how he handled code examples; did he just paste them into Pages? One advantage of using something like Sphinx is being able to include code examples from external files. This makes it easy to update and test the examples. Another killer feature is pyobject: it allows you to include only a Python function or class from a larger file. It'd be nice to have this feature for other languages, though.
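To show the idea behind pulling one object out of a larger file, here is a stand-in written with Python's `ast` module. It is not how Sphinx implements pyobject, just a sketch of the same effect; `extract_object` and the sample source are invented for the example, and it only looks at top-level definitions.

```python
import ast


def extract_object(source: str, name: str) -> str:
    """Return the source text of the top-level function or class called `name`."""
    tree = ast.parse(source)
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in tree.body:
        if isinstance(node, wanted) and node.name == name:
            return ast.get_source_segment(source, node)
    raise LookupError(f"no top-level object named {name!r}")


# Hypothetical examples file; normally this would be read from disk.
EXAMPLES = '''\
import math

def circle_area(r):
    return math.pi * r ** 2

def unrelated():
    pass
'''

snippet = extract_object(EXAMPLES, "circle_area")
print(snippet)
```

Because the examples file stays a normal Python module, it can be imported and tested on its own while the book only ever shows the extracted piece.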

I blogged about using Sphinx to write my book here: http://pedrokroger.net/2012/10/using-sphinx-to-write-books/

[1] http://musicforgeeksandnerds.com


FYI I used a Ruby library called Coderay to handle code highlighting. This wasn't integrated with Pages, but instead with the Bookshop Ruby gem after I moved the text into HTML files.

The code was inline in the HTML, but set apart using pre tags.


It would be interesting to compare this with another author's experience [1] using Sphinx. He also notes that EPUB and MOBI outputs are more challenging than PDF ones.

[1]: http://pedrokroger.net/2012/10/using-sphinx-to-write-books/


I'm the author of this article. The main problem I found with EPUB and MOBI is how they are implemented in each reader. For instance, Kindle 2 doesn't format tables correctly, while Kindle 3 does. iBooks has some undocumented bugs that make you want to cry. If your book is simple (just text) there's no problem; but if you have source code, tables, images, etc., you may run into problems.


Agree. As I was looking into various ebook production tools that are commandline based, Sphinx and pandoc struck me as two very good solutions.


Thanks for sharing. I recently launched into writing an ebook (educational, kids). Even though I use Word for tons of stuff I became concerned that it would insert all sorts of unwanted junk into the file. I looked around and decided to start the process on Sigil. My reasoning was that I could easily move the text into just about any other platform if I wanted to.

My first impulse was to simply write it all in a plain text editor and deal with formatting and producing all the various file types later on. One file per chapter, etc. However, with Sigil I can deal with images and TOC from the very start, which might be an advantage.

It's interesting to read about how others have approached this. It sounds like the toughest part of the job might very well be getting the various formats to look the way they should.


If you are doing ebook only (no pdf) and it has limited formatting it should be pretty simple to get it to work on most devices.

Though I've yet to actually publish physical books (only have proofs), I'm not willing to commit to using HTML as the base format. (Sounds like princexml might help though if you decide you want to go the physical route later).


It seems there's a huge opening in the market to make an end-to-end publishing solution. We're all taking this scrappy approach* to making our books work cross-device. It's something I'd gladly pay for.

*I wrote about my pipeline here: http://startupframework.tumblr.com/post/36675629669/format-e...


I just shot you an invite to the PenFM beta, because you hit it on the head. www.pen.fm is taking on that end-to-end publishing challenge, and then some. Having worked in epublishing for a few years now, I think I've finally come up with a way to automate the hell out of epublishing--and the results show up. At any point in writing in PenFM, you can click download and get your work formatted perfectly in epub, mobi, or pdf. Formatting improvements are coming rapidly, and mostly present already for mobi.

The biggest problem I've recognized with epublishing is that you have to get the formatting down perfectly, and usually that requires a lot of work by hand making sure your input HTML file is exactly as it should be. When you control how content is input to a platform, it's much easier to automate perfectly formatted rendering of that input HTML file, including TOC by inference.
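As an illustration of what "TOC by inference" can mean when the input HTML is well controlled, here is a minimal Python sketch that derives a table of contents from heading tags. It is not PenFM's implementation; the class and function names are made up, and it assumes chapters and sections are marked with h1 to h3.

```python
from html.parser import HTMLParser


class TocBuilder(HTMLParser):
    """Collect (level, title) pairs from h1-h3 headings in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None  # heading level currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.entries.append((self._level, "".join(self._text).strip()))
            self._level = None


def infer_toc(html: str):
    """Return the list of (level, title) entries found in the HTML."""
    builder = TocBuilder()
    builder.feed(html)
    builder.close()
    return builder.entries


# Hypothetical input document.
book = "<h1>Intro</h1><p>Hi</p><h2>Why <em>ebooks</em>?</h2><h1>Tools</h1>"
toc = infer_toc(book)
for level, title in toc:
    print("  " * (level - 1) + title)
```

This only works because the headings are consistent, which is the point about controlling the input: hand-formatted documents rarely mark chapters this cleanly.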


www.pen.fm yields an "Internal Server Error".


We do this at Leanpub (http://leanpub.com). We're free to use, and pay 90% - 50 cents royalties. Chances are you've seen a few of our books on HN :)


My service, http://bookspry.com, does something like that. It's still in semi-beta (which means I'm taking paying customers, but the "do-it-yourself" plan is still very much the "I'm-doing-it-for-you" plan, since you'd be shocked at the bizarre formatting options people put into their Word document that have to be cleaned).


http://pressbooks.com/ and you don't even have to pay for it (if you have less than 6 books)


This article is a great example of why LiberWriter has been doing fairly well. Most of us here understand it and think it's pretty cool. Imagine your mother or father, who has retired and decided to finally write that book they've always thought of, trying to parse all of that. Gems? XML? Say what?


I hadn't heard of Bookshop before, that's pretty neat.

<shameless plug>

I put together a project named Docverter that uses pandoc and calibre to do a bunch of this stuff. The conversions work pretty well, and it's all free.

http://www.docverter.com

</shameless plug>


Doesn't Pages export to epub?


Yes, and it might be good enough for short, simple documents.

For me using a product like PrinceXML gave me the power of HTML/CSS to control the eBook's appearance more precisely. Also, using a Ruby-based build process let me take advantage of source control, ERB/Ruby code and other things. In other words, it made the whole process easier to manage for producing a large document.


I'm interested in whether you looked into using a DocBook tool chain via publican or asciidoc.


The reason I asked was your comment about taking advantage of source control and the implied separation of content from formatting.

The only real advantage that I can see to going with publican[1] or asciidoc[2] is that they're free tools. The main disadvantage is that you'd have to define formatting via XSL.

[1] http://fedoraproject.org/wiki/DocBook

[2] http://www.methods.co.nz/asciidoc/


XSL! Oh no! I'll have to steer clear of that :)


No I didn't. Sorry I'm not familiar with it - can you tell us what the pros/cons of that approach might be vs. what I did?



