making a high quality djvu file/ebook from somewhat crappy pdf scans

i am assuming you have either a hard copy of an article/book/etc. or a pdf file of a scanned document that you wish to clean up and make into a high quality djvu file. also, i am a linux user; i have no idea (and couldn't care less) how to do this on windows/mac. here is how i do it.

tools

gscan2pdf (my fav scanning program presently)
scan tailor (brilliant piece of software)

steps

if you have a hard copy, i recommend using gscan2pdf (the name is deceiving; it also outputs djvu). it may, in fact, do almost everything you need to get your documents cleaned up. play with it; it is a powerful tool.

if you have a pdf file, the first thing you need to do is convert it to a multipage tiff file. the highest quality way to convert it that i have found is the following ghostscript command:

gs -SDEVICE=tiffg3 -r600x600 -sPAPERSIZE=a4 -sOutputFile="output.tif" -dNOPAUSE -dBATCH -- "input.pdf"

this seems to output a much higher quality tiff file than the easy-to-use "convert" command from imagemagick, which rasterizes pdf pages at only 72 dpi unless you pass it a -density option:

convert input.pdf output.tif
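
if you convert pdfs often, the ghostscript command above can be wrapped in a small shell function so that the output name and the resolution are not hard-coded. this is only a rough sketch: tiff_name and pdf_to_tiff are names i made up for illustration, and the ghostscript flags are exactly the ones from the command above.

```shell
# sketch only: tiff_name and pdf_to_tiff are made-up helper names,
# they are not part of ghostscript.

# derive "book.tif" from "book.pdf" (strips a trailing ".pdf" if present)
tiff_name() {
    printf '%s\n' "${1%.pdf}.tif"
}

# usage: pdf_to_tiff input.pdf [dpi]    (dpi defaults to 600)
pdf_to_tiff() {
    local in=$1
    local dpi=${2:-600}
    local out
    out=$(tiff_name "$in")
    # same flags as the command above; tiffg3 writes a black and white tiff
    gs -SDEVICE=tiffg3 -r"${dpi}x${dpi}" -sPAPERSIZE=a4 \
       -sOutputFile="$out" -dNOPAUSE -dBATCH -- "$in"
}
```

for example, "pdf_to_tiff input.pdf" should write input.tif at 600dpi, and "pdf_to_tiff input.pdf 300" would give a smaller file for a quick test run.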

now create a work folder, put the tiff file you created into it, then start scan tailor and open this work folder. scan tailor is an amazingly powerful application. it will split pages, straighten things out, center the content, de-speckle, and more. again, play around. by the time you are done, you will be able to output a very nice, clean djvu file.

recommendations for using scan tailor

even for books, i recommend looking at each page briefly to make sure that the content is properly selected (often it selects more than is needed on marked up pages).

after the content is selected, i center the content on each page, so the margins are the same.

i output in 600dpi.

next

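scan tailor creates an "out" folder in the work folder you created, with one tiff file per page. there are a number of ways you could convert these output files into a single djvu file. this is not my preferred way, but here is a command line way of doing it, run from inside the "out" folder. the loop encodes each page with cjb2 (which only handles black and white images), and djvm then bundles the per-page files into one document (here called book.djvu, written to the work folder):

for i in *.tif; do cjb2 "$i" "${i%tif}djvu"; echo "$i"; done
djvm -c ../book.djvu *.djvu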

instead, i prefer to re-open gscan2pdf and select all of the tiff files you just generated in the "out" folder. this way, i can scroll through the output pages and notice any corrections i need to make back in scan tailor before producing my final product.

when things are ready, select the djvu output option, and you should have a really nice djvu file.

post-processing

it is often nice to have both a djvu file and a pdf file. to make this conversion, in my experience the best method is the following command:

ddjvu -format=pdf input.djvu output.pdf
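
if you have several djvu files to convert, the same ddjvu command can be run in a loop. a minimal sketch, assuming the files end in ".djvu"; pdf_name and djvu_to_pdf are names i made up for illustration.

```shell
# sketch only: pdf_name and djvu_to_pdf are made-up helper names.

# derive "book.pdf" from "book.djvu"
pdf_name() {
    printf '%s\n' "${1%.djvu}.pdf"
}

# usage: djvu_to_pdf file1.djvu file2.djvu ...
djvu_to_pdf() {
    local f
    for f in "$@"; do
        # same ddjvu call as above, one file at a time; stop on the first failure
        ddjvu -format=pdf "$f" "$(pdf_name "$f")" || return 1
        echo "$f -> $(pdf_name "$f")"
    done
}
```

running "djvu_to_pdf *.djvu" in a folder should leave a pdf next to each djvu file.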

finally, when i print a document, i often want to save paper by creating a "2x1" document, where each sheet i print has two pages side-by-side in landscape orientation. unfortunately, i only know how to do this with pdf files. (p.s., i wish i knew how to output such a document directly into another djvu document. i do not. if anyone has any tips, please leave a comment!) you need "pdfjam" installed, and then run the following command:

pdfjam --nup 2x1 --landscape input.pdf

this will output a file of the same name with "-pdfjam" appended to it.
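
if you would rather pick the output name yourself, pdfjam also takes an --outfile option. a minimal sketch; twoup_name and make_2up are names i made up for illustration, and the "-2up" suffix is just my choice.

```shell
# sketch only: twoup_name and make_2up are made-up helper names.

# derive "article-2up.pdf" from "article.pdf"
twoup_name() {
    printf '%s\n' "${1%.pdf}-2up.pdf"
}

# usage: make_2up input.pdf
make_2up() {
    # same options as above, plus an explicit output file
    pdfjam --nup 2x1 --landscape --outfile "$(twoup_name "$1")" "$1"
}
```

for example, "make_2up input.pdf" should write input-2up.pdf next to the original.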

other resources

here are some very helpful resources i used to figure things out:

http://www.danielstender.com/granthinam/564/
http://askubuntu.com/questions/46233/converting-djvu-to-pdf
