making a high quality djvu file/ebook from somewhat crappy pdf scans
i am assuming that you have either a hard copy of an article/book/etc. or a pdf file of a scanned document that you wish to clean up and make into a high quality djvu file. also, i am a linux user; i have no idea (and couldn't care less) how to do it on windows/mac. here is how i do it.
tools
gscan2pdf (my fav scanning program presently)
scan tailor (brilliant piece of software)
steps
if you have a hard copy, i recommend using gscan2pdf (the name is deceiving; it also outputs djvu). it may, in fact, do almost everything you need to get your documents cleaned up. play with it; it is a powerful tool.
if you have a pdf file, the first thing you need to do is convert it to a multipage tiff file. the highest quality way i have found is the following ghostscript command:
gs -sDEVICE=tiffg3 -r600x600 -sPAPERSIZE=a4 -sOutputFile="output.tif" -dNOPAUSE -dBATCH -- "input.pdf"
this seems to output a much higher quality tiff file than the easy to use "convert" command:
convert input.pdf output.tif
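if you convert pdfs often, the ghostscript call above can be wrapped in a small shell function. this is only a sketch: the name pdf_to_tiff, the optional dpi argument, and the DRY_RUN switch are made up for illustration and are not part of ghostscript. the gs flags are the ones from the command above.

```shell
# pdf_to_tiff INPUT.pdf OUTPUT.tif [DPI]
# hypothetical helper around the ghostscript command from this post.
# DPI defaults to 600. set DRY_RUN=1 to print the command instead of running it.
pdf_to_tiff() {
    src=${1:-}
    dst=${2:-}
    dpi=${3:-600}
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: pdf_to_tiff INPUT.pdf OUTPUT.tif [DPI]" >&2
        return 2
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # show what would be run, without touching any files
        echo "gs -sDEVICE=tiffg3 -r${dpi}x${dpi} -sPAPERSIZE=a4 -sOutputFile=$dst -dNOPAUSE -dBATCH -- $src"
        return 0
    fi
    # tiffg3 writes black-and-white, fax-compressed tiff pages
    gs -sDEVICE=tiffg3 "-r${dpi}x${dpi}" -sPAPERSIZE=a4 \
       "-sOutputFile=$dst" -dNOPAUSE -dBATCH -- "$src"
}
```

for example, `DRY_RUN=1 pdf_to_tiff input.pdf output.tif` prints the command, and `pdf_to_tiff input.pdf output.tif` runs it at 600dpi.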
now create a work folder, put the tiff file you created into it, then start scan tailor and open that work folder. scan tailor is an amazingly powerful application. it will split pages, straighten things out, center the content, de-speckle, and more. again, play around. by the time you are done, you will have very nice, clean page images, ready to be turned into a djvu file.
recommendations for using scan tailor
even for books, i recommend looking at each page briefly to make sure that the content is properly selected (often it selects more than is needed on marked up pages).
after the content is selected, i center the content on each page, so the margins are the same.
i output in 600dpi.
next
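scan tailor creates an "out" folder in the work folder you created, with one tiff file per page. there are a number of ways you could convert these output files into a single djvu file. this is not my preferred way, but here is a command line way of starting on it. run inside the "out" folder, it converts each page to its own djvu file (the per-page files still need to be bundled into one document afterwards, e.g. with djvm from djvulibre):
for i in *.tif; do cjb2 "$i" "${i%tif}djvu"; echo "$i"; done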
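the command line route can be carried through to a single djvu file roughly as follows. this is a sketch under a few assumptions: cjb2 and djvm (both from djvulibre) are installed, the scan tailor output is black-and-white tiff (cjb2 only handles bitonal images), and the page files sort correctly by name. the function names page_name and bundle_pages are made up for illustration.

```shell
# page_name FILE.tif -> FILE.djvu (pure string helper)
page_name() {
    printf '%s\n' "${1%.tif}.djvu"
}

# bundle_pages DIR BOOK.djvu
# converts every DIR/*.tif with cjb2, then bundles the pages with djvm.
bundle_pages() {
    dir=$1
    book=$2
    set --                          # reuse "$@" as the list of page files
    for tif in "$dir"/*.tif; do
        [ -e "$tif" ] || continue   # the glob matched nothing
        page=$(page_name "$tif")
        cjb2 "$tif" "$page" || return 1
        set -- "$@" "$page"
    done
    if [ "$#" -eq 0 ]; then
        echo "no .tif files found in $dir" >&2
        return 1
    fi
    # djvm -c creates a multipage document from the per-page files, in order
    djvm -c "$book" "$@"
}
```

for example, `bundle_pages out book.djvu` run from the work folder would leave the per-page files in "out" and write book.djvu next to it.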
instead, i prefer to re-open gscan2pdf and select all of the tiff files you just generated in the "out" folder. this way, i can scroll through the output pages and notice any corrections i need to make back in scan tailor before producing my final product.
when things are ready, select the djvu output option, and you should have a really nice djvu file.
post-processing
it is often nice to have both a djvu file and a pdf file. to make this conversion, in my experience the best method is the following command:
ddjvu -format=pdf input.djvu output.pdf
finally, when i print a document, i often want to save paper by creating a "2x1" document, where each sheet i print has two pages side-by-side in landscape orientation. unfortunately, i only know how to do this with pdf files. (p.s., i wish i knew how to output such a document directly into another djvu document. i do not. if anyone has any tips, please leave a comment!) you need "pdfjam" installed, and then run the following command:
pdfjam --nup 2x1 --landscape input.pdf
this will output a file of the same name with "-pdfjam" appended to it (here, input-pdfjam.pdf).
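the two post-processing commands can also be chained, going from a djvu file straight to a 2x1 pdf. again only a sketch: the function names and the "-2up" suffix are made up for illustration, and it assumes ddjvu and pdfjam are installed. pdfjam's --outfile option is used to pick the output name instead of relying on the default.

```shell
# two_up_name INPUT.pdf -> INPUT-2up.pdf (pure string helper; the suffix is arbitrary)
two_up_name() {
    printf '%s\n' "${1%.pdf}-2up.pdf"
}

# djvu_to_two_up INPUT.djvu
# writes INPUT.pdf and INPUT-2up.pdf next to the djvu file.
djvu_to_two_up() {
    djvu=$1
    pdf="${djvu%.djvu}.pdf"
    # step 1: djvu -> pdf, same command as above
    ddjvu -format=pdf "$djvu" "$pdf" || return 1
    # step 2: two pages per landscape sheet, with an explicit output name
    pdfjam --nup 2x1 --landscape --outfile "$(two_up_name "$pdf")" "$pdf"
}
```

for example, `djvu_to_two_up book.djvu` would produce book.pdf and book-2up.pdf.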
other resources
here are some very helpful resources i used to figure things out:
http://www.danielstender.com/granthinam/564/
http://askubuntu.com/questions/46233/converting-djvu-to-pdf