Converting PDF files to black and white

I recently downloaded a book from archive.org, and it was off-color and not easily printable (they used to provide black-and-white PDF files for books, but apparently they stopped doing so). I needed to convert it to a monochrome file so I could print it. This led me down a number of dead ends, so I want to explain the best way I found to do this.

First, convert each page to a separate PNG file. Here is the command I used:

pdftoppm Hutchings.pdf file_prefix -png

Note that you can make the file_prefix anything you would like.
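
If the default rendering looks too coarse for clean thresholding, the resolution can be raised with pdftoppm's -r option, which sets the DPI (the default is 150). Here is a minimal sketch of that idea; the function name render_pages and the 300 DPI default are my own choices for illustration, not something the steps above require:

```shell
# Hypothetical helper: render every page of a PDF to PNG at a chosen DPI.
# Usage: render_pages input.pdf prefix [dpi]
render_pages() {
    pdf=$1
    prefix=$2
    dpi=${3:-300}   # assumed default for this sketch; pdftoppm itself uses 150
    pdftoppm -r "$dpi" -png "$pdf" "$prefix"
}
```

For example, render_pages Hutchings.pdf file_prefix 300 would write the pages as PNG files at 300 DPI. For longer books, pdftoppm typically pads the page numbers with zeros (file_prefix-001.png rather than file_prefix-1.png), so check the actual file names before running the later commands.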

Next, you will need to determine the "threshold", the brightness cutoff at which these pictures become black and white: pixels lighter than the threshold turn white, and everything darker turns black. Play around with it using this command:

convert file_prefix-1.png -threshold 35% testing-1.png
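
Rather than rerunning the command by hand for each guess, several thresholds can be tried in one go and the results compared side by side. This is only a sketch: the function name try_thresholds and the output naming scheme are hypothetical, and it assumes the same convert command used above is on your PATH (on ImageMagick 7 the equivalent command is magick).

```shell
# Hypothetical helper: write one test image per candidate threshold.
# Usage: try_thresholds page.png 25 35 45 55
try_thresholds() {
    src=$1
    shift
    for t in "$@"; do
        # The output name encodes the threshold, e.g. testing-35.png
        convert "$src" -threshold "${t}%" "testing-${t}.png"
    done
}
```

Running try_thresholds file_prefix-1.png 25 35 45 55 leaves four test images in the current directory, which makes it easy to pick the value where the text is solid but the page background has dropped out.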

Once you have determined the threshold, it is time to batch process all your PNG files (any files that need a different threshold can be adjusted individually using the convert command above). I use mogrify to do this, and I make sure to include a separate path for the new files so I do not overwrite the original PNG files. Substitute the threshold you settled on for the 50% shown here:

mogrify -path temp/ -format png -threshold 50% *.png
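
As far as I can tell, mogrify expects the directory given to -path to exist already, so it is worth creating it first. The sketch below bundles that with the batch step; the function name batch_threshold and its argument order are my own invention for illustration.

```shell
# Hypothetical helper: threshold all PNGs in the current directory,
# writing the results into a separate output directory.
# Usage: batch_threshold 35 temp
batch_threshold() {
    threshold=$1
    outdir=${2:-temp}
    mkdir -p "$outdir"   # make sure the -path target exists before mogrify runs
    mogrify -path "$outdir" -format png -threshold "${threshold}%" ./*.png
}
```

Because the output goes to a separate directory, the original PNG files stay untouched and the batch can be rerun with a different threshold if the first result is not good enough.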

Next, I use gscan2pdf to combine all the files into a single PDF again. Finally, I use ocrmypdf to OCR the file and prepare it for archiving:

ocrmypdf input.pdf output.pdf
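
gscan2pdf is a graphical tool, so if you would rather script the last two steps, one option is img2pdf. To be clear, that is a substitution on my part and not the tool used above. This sketch assumes img2pdf and ocrmypdf are both installed and that the thresholded pages have zero-padded numbers in their names, so that the shell lists them in page order; the function name and the intermediate file name combined.pdf are hypothetical.

```shell
# Hypothetical helper: combine thresholded PNGs into one PDF, then OCR it.
# Usage: combine_and_ocr temp output.pdf
combine_and_ocr() {
    pages_dir=$1
    out_pdf=$2
    combined=combined.pdf   # intermediate file; the name is an arbitrary choice
    # The glob expands in sorted order, which matches page order only
    # when the page numbers are zero-padded.
    img2pdf "$pages_dir"/*.png -o "$combined" &&
        ocrmypdf "$combined" "$out_pdf"
}
```

Calling combine_and_ocr temp output.pdf would then produce the searchable PDF in one step, and OCR only runs if the combine step succeeded.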
