Converting PDF files to black and white
I recently downloaded a book from archive.org, and it was off-color and not easily printable (they used to provide black-and-white PDF files for books, but apparently they stopped doing so). I needed to convert it to a monochrome file so I could print it. This led me down a number of dead ends, so I want to explain the best way I found to do it.
First, convert each page to a separate PNG file. Here is the command I used:
pdftoppm Hutchings.pdf file_prefix -png
Note that you can make the file_prefix anything you would like.
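Next, you will need to determine the "threshold": the brightness cutoff at which each pixel is turned either black or white. Pixels brighter than the threshold become white and darker ones become black, so a lower value gives a lighter page. Play around with it using this command: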
convert file_prefix-1.png -threshold 35% testing-1.png
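To compare several thresholds at once rather than rerunning the command by hand, a small loop can help. This is only a sketch: the function name try_thresholds and the output naming scheme are my own invention, and on ImageMagick 7 the convert command may need to be spelled magick instead.

```shell
# Hypothetical helper: write one thresholded copy of a page per candidate value.
# Usage: try_thresholds PAGE.png PERCENT [PERCENT...]
try_thresholds() {
    page=$1
    shift
    base=${page%.png}
    for t in "$@"; do
        # Same convert call as above, with only the threshold varied.
        convert "$page" -threshold "${t}%" "${base}-t${t}.png"
    done
}
```

For example, `try_thresholds file_prefix-1.png 25 35 45 55` would write file_prefix-1-t25.png through file_prefix-1-t55.png, which you can then open side by side. Remove these test files before the batch step so they are not processed along with the real pages.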
Once you have settled on a threshold, it is time to batch process all of your PNG files (any page that needs a different threshold can be adjusted individually with the convert command above). I use mogrify for this, and I include a separate output path for the new files so the original PNG files are not overwritten. The command here uses 50%; substitute whatever value worked for your book:
mogrify -path temp/ -format png -threshold 50% *.png
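As far as I can tell, mogrify expects the output directory to exist already, so it is worth creating it first. The wrapper below is a sketch under that assumption; the name batch_threshold and its arguments are hypothetical, not part of ImageMagick.

```shell
# Hypothetical helper: threshold every PNG in the current directory into OUTDIR.
# Usage: batch_threshold PERCENT [OUTDIR]   (OUTDIR defaults to temp)
batch_threshold() {
    threshold=$1
    outdir=${2:-temp}
    # Create the output directory up front; harmless if it already exists.
    mkdir -p "$outdir" || return 1
    # Same mogrify call as above, with the threshold and path as parameters.
    mogrify -path "$outdir" -format png -threshold "${threshold}%" ./*.png
}
```

Running `batch_threshold 35` from the directory holding the page images would leave the originals untouched and write the black-and-white copies to temp/.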
Next, I use gscan2pdf to combine all the files into a single PDF again. Finally, I run ocrmypdf on that combined PDF to add OCR text and prepare it for archiving:
ocrmypdf input.pdf output.pdf
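If you would rather stay on the command line for the combining step, something like the following might work. Note that this swaps gscan2pdf for img2pdf, which is not what I used above, and it assumes img2pdf is installed. The function name pages_to_searchable_pdf and the "-combined" file name are made up for illustration.

```shell
# Hypothetical helper: merge thresholded pages into one PDF, then OCR it.
# Assumes img2pdf and ocrmypdf are installed; img2pdf stands in for gscan2pdf.
# Usage: pages_to_searchable_pdf PAGEDIR OUTPUT.pdf
pages_to_searchable_pdf() {
    pagedir=$1
    output=$2
    combined="${output%.pdf}-combined.pdf"
    # The shell expands the glob in sorted order, which matches page order
    # when the page numbers are zero-padded to the same width.
    img2pdf "$pagedir"/*.png -o "$combined" || return 1
    # Same ocrmypdf call as above: input PDF first, output PDF second.
    ocrmypdf "$combined" "$output"
}
```

For example, `pages_to_searchable_pdf temp Hutchings-bw.pdf` would build Hutchings-bw-combined.pdf from the images in temp/ and then write the OCRed result to Hutchings-bw.pdf.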