Sunday, 20 September 2009

Resizing images in a pdf document

Today I'm going to resize an image within a pdf document from the command line (batch mode). A plethora of free tools is available, but I will stick with the following ones:
  • pdfimages
  • convert (from the venerable ImageMagick toolkit)
  • pdflatex
  • a pdf viewer of your choice
Because you can't really edit a pdf document (except with some expensive proprietary tools, e.g. Adobe Acrobat), we have to extract the desired data, process it and write it into a new pdf document.

So for image resizing the workflow will be:
Extract image -> Resize image -> Output to new pdf document

Sounds easy, doesn't it? But there is one caveat:
By resizing I mean changing the physical size of the image as it appears on the output device.

The output device (printer, screen) has a pixel density associated with it, called dots per inch or dpi (read the definition if you're not familiar with it).
If the dpi values of the display device and the printer don't match, images that have no fixed dpi associated with them will have different physical sizes on these devices. In short, understanding dpi is essential.

To avoid this problem we have to determine the dpi value of the image in the pdf document and, after resizing, output it with the same dpi value.
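The relation behind all this is simply physical size = pixels / dpi. A minimal sketch of that arithmetic, assuming awk is available; the function name physical_inches and the numbers 800, 160 and 96 are made up for illustration:

```shell
#!/bin/sh
# physical_inches PIXELS DPI
# Prints the physical length in inches of PIXELS pixels shown at DPI dots per inch.
physical_inches() {
    awk -v px="$1" -v dpi="$2" 'BEGIN { printf "%.2f\n", px / dpi }'
}

physical_inches 800 160   # hypothetical 800 px wide image at 160 dpi
physical_inches 800 96    # the same pixels at 96 dpi come out wider
```

The same pixel data is 5 inches wide at 160 dpi but more than 8 inches wide at 96 dpi, which is why the dpi value has to be carried over into the new document.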

That said, there are actually two ways to change the physical size of an image:
  1. as mentioned above: resize the image in pixels and keep the dpi (original dpi equals new dpi)
  2. keep the pixel size of the image and change the dpi (original dpi doesn't equal new dpi)
*By original/new dpi I mean the dpi of the image in the original/new pdf document


The latter is admittedly the worse approach, because every output device has an intrinsic dpi-value which yields the best results. With any other dpi-value the output device has to scale the data to its intrinsic dpi-value.
Of course you have no control over the algorithms applied in this process - in stark contrast to the first method.
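To make the two methods concrete, here is a small sketch of the numbers involved. It assumes awk is available; the function names and the example values (800 pixels, 160 dpi, 75%) are hypothetical:

```shell
#!/bin/sh
# Method 1: keep the dpi, scale the pixel count.
# scaled_pixels PIXELS PERCENT -> new pixel count, rounded to the nearest integer
scaled_pixels() {
    awk -v px="$1" -v pct="$2" 'BEGIN { printf "%d\n", px * pct / 100 + 0.5 }'
}

# Method 2: keep the pixel count, change the dpi.
# scaled_dpi DPI PERCENT -> dpi that shrinks the physical size to PERCENT
scaled_dpi() {
    awk -v dpi="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", dpi * 100 / pct }'
}

scaled_pixels 800 75   # method 1: new pixel width, dpi stays at 160
scaled_dpi 160 75      # method 2: new dpi, pixel width stays at 800
```

Both results describe an image 3.75 inches wide (600 / 160 for method 1, 800 / 213.33 for method 2), down from 5 inches.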

Unfortunately I haven't found an easy way to determine the dpi-value of an image within a pdf document.
In Acrobat Professional 6+ there is allegedly a tool called "Preflight", and there is a plug-in called "PitStop"; both can reportedly extract this information, and both are very expensive.

My approach is to guess the dpi-value (assuming equal horizontal and vertical dpi-values), generate the pdf document, and compare the image size in a pdf viewer with the original pdf document.
This works quite well for most pdf documents, but there are also documents where the horizontal dpi-value differs from the vertical one. In this case you first adjust the horizontal dpi-value until the widths of both images match, and then adjust the vertical dpi-value until the heights match (or the other way round).
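The guessing can be shortened with a bit of arithmetic: the pixel count is fixed, so the size of the test image is inversely proportional to the guessed dpi. A hedged sketch, assuming you can measure both images in the viewer at the same zoom level; refine_dpi and the numbers are invented for illustration:

```shell
#!/bin/sh
# refine_dpi GUESS TEST_SIZE ORIG_SIZE
# GUESS     : dpi-value used for the test document
# TEST_SIZE : measured width (or height) of the image in the test document
# ORIG_SIZE : measured width (or height) of the image in the original document
# Both sizes must use the same unit; the unit cancels out.
refine_dpi() {
    awk -v g="$1" -v t="$2" -v o="$3" 'BEGIN { printf "%.1f\n", g * t / o }'
}

# Guessed 160 dpi, test image is 10 cm wide, original is 8 cm wide:
refine_dpi 160 10 8
```

If the horizontal and vertical dpi-values differ, call it once with the two widths and once with the two heights.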

Task

Say we've got a pdf file called "a.pdf" with two images in it, and we want a new pdf containing the first image at 75% of its original size.

Steps

To extract the images we call pdfimages:

$ pdfimages a.pdf image
This will create two files: "image-000.ppm" and "image-001.ppm".
"image" is just the prefix for the filenames under which the extracted images are saved.

To determine the dpi-value of the image we have to create a pdf document with the image in it and with a guessed dpi-value and adjust it until the images have the same size in the pdf viewer (as explained in detail above).
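A reasonable first guess can be derived from the pixel dimensions stored in the extracted file. The sketch below assumes the file is a binary PPM whose header is the three text lines "P6", "WIDTH HEIGHT" and "255" without comment lines; ppm_size and estimate_dpi are hypothetical helper names:

```shell
#!/bin/sh
# ppm_size FILE
# Prints "WIDTH HEIGHT" in pixels, read from the second header line of a PPM file.
# Assumption: the header has no comment lines between "P6" and the dimensions.
ppm_size() {
    head -n 2 "$1" | awk 'NR == 2 { print $1, $2 }'
}

# estimate_dpi PIXELS INCHES
# First dpi guess from the pixel count and the size measured in the original pdf.
estimate_dpi() {
    awk -v px="$1" -v inch="$2" 'BEGIN { printf "%.1f\n", px / inch }'
}

# Usage (not run here, the file must exist):
#   ppm_size image-000.ppm
#   estimate_dpi 800 5
```

For example, an image that is 800 pixels wide and measures 5 inches in the original document would suggest a starting value of 160 dpi.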

To associate a dpi-value with the image we have to convert it into a jpeg and write the value into its header, since the \includegraphics command has no dpi argument and pdflatex does not support the "ppm" image type:

$ convert image-000.ppm -quality 100% -density 160x160 image1.jpg
Here we convert the image into a jpeg with a dpi-value (density) of 160 and nearly lossless compression (quality 100%).

The latex document used to create the pdf document is very simple. It contains just the bare scaffold needed to include an image:

\documentclass{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}

\begin{document}
\begin{figure}
\includegraphics{image1.jpg}
\end{figure}
\end{document}

Save this file as "b.tex". To create the pdf document "b.pdf" simply run:

$ pdflatex b.tex
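Since the guessing loop may need several rounds, it can be handy to generate the scaffold from the shell instead of editing it by hand. A minimal sketch; write_tex is a hypothetical helper that writes the same document as above for a given image filename:

```shell
#!/bin/sh
# write_tex IMAGE TEXFILE
# Writes the minimal latex scaffold shown above, including IMAGE, to TEXFILE.
# printf with '%s\n' prints every argument literally, so backslashes survive.
write_tex() {
    printf '%s\n' \
        '\documentclass{scrartcl}' \
        '\usepackage[utf8]{inputenc}' \
        '\usepackage{graphicx}' \
        '' \
        '\begin{document}' \
        '\begin{figure}' \
        "\\includegraphics{$1}" \
        '\end{figure}' \
        '\end{document}' > "$2"
}

# Usage (not run here):
#   write_tex image1.jpg b.tex && pdflatex b.tex
```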

After you have successfully determined the dpi-value(s) of the image you can go on to the final step of resizing it:

$ convert image-000.ppm -quality 100% -density 160x160 -scale 75% image1.jpg
It's the same command as above with one new argument, "-scale 75%", which rescales the image to 75% of its original (pixel-)size.
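For documents where the horizontal and vertical dpi-values differ, -density also accepts the form XxY. The following sketch only prints the command line so it can be checked before running it; convert_cmd and the values 160 and 150 are made up for illustration, and filenames are assumed to contain no spaces:

```shell
#!/bin/sh
# convert_cmd INPUT XDPI YDPI PERCENT OUTPUT
# Prints (does not run) the convert call for the given dpi-values and scale.
convert_cmd() {
    printf 'convert %s -quality 100%% -density %sx%s -scale %s%% %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

convert_cmd image-000.ppm 160 150 75 image1.jpg
```

Once the printed line looks right, it can be pasted into the shell or piped into sh.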

Again run

$ pdflatex b.tex
and you should now have a pdf document containing the image at 75% of its original size.