Monday, March 31, 2008

PDF imposition on OS X with Quartz and RubyCocoa

Since I banged my head against this for quite a bit before coming to a satisfactory solution, I figured I’d do a writeup.

Abstract:

Imposition is the process of arranging pages on a large sheet such that they appear in the proper order when cut and folded into a booklet. PDF, being a page-oriented format, lends itself particularly well to being shuffled. While a number of different pieces of software exist to do this (freeware, shareware, and as Adobe Acrobat plugins), none of them come with source code to be tinkered with. This article discusses the development of an imposition script using only the Mac OS X drawing API and the RubyCocoa bridge.

This has the advantage of a) being freely modifiable and b) working out of the box on Mac OS X 10.5. It should run just fine on 10.3 and later, provided that the RubyCocoa bridge is installed. The script provides a framework for re-ordering and arranging the pages of a PDF for booklet printing. An example subclass for printing DIN A6 (landscape) booklets is provided. More complicated imposition schemes and some sort of user interface are left as an exercise to the reader.

Problem:

I want to print DIN A6-sized booklets of 8 pages each. To do this, I need to rearrange the pages of my source PDF and print them 4-up such that they appear in the proper order when the page is cut in half and folded. This is known as imposition.

While there are a number of utilities that do this (CocoaBooklet and Cheap Impostor [or heck, even the pdfpages package for LaTeX] come to mind), I have waaaay too much time on my hands at the moment. Sometimes reinventing the wheel can provide a pleasant amount of mental exercise. So, I set out to script my way out of this.

Hypothesis:

Preview.app in 10.5 allows you to rearrange PDF pages via drag-and-drop. On 10.4 at least, every feature of Preview.app has a corresponding method in PDFKit. Therefore, it must be relatively easy to load up an instance of PDFDocument, shuffle the pages around, and spit it back out to a new file. Added bonus: with RubyCocoa now included in OS X by default, I can get up and running fairly quickly.

First Attempt:

It turns out that this is, in fact, relatively easy. The following code will spit out a PDF with pages in the proper order:

#!/usr/bin/env ruby

inpath = File.expand_path("~/Desktop/somefile.pdf")
outpath = File.expand_path("~/Desktop/somefile_imposed.pdf")

require 'osx/cocoa'
OSX.require_framework('Quartz')
include OSX

# create PDFDocument instance
pdf = PDFDocument.alloc.initWithURL(NSURL.fileURLWithPath(inpath))
# peel off the pages into our own array
pages = []
pdf.pageCount.times { pages << pdf.pageAtIndex(0); pdf.removePageAtIndex(0)}
# reinsert the pages in the desired order
[8,1,6,3,2,7,4,5].reverse.each {|old_page_no| pdf.insertPage_atIndex(pages[old_page_no-1],0)}
# spit out the rearranged PDF
pdf.writeToURL(NSURL.fileURLWithPath(outpath))
# open the rearranged PDF
system('open',outpath)
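
The reverse-and-insert-at-zero trick is easier to see without PDFKit in the way. Here is a minimal plain-Ruby sketch in which strings stand in for the PDFPage objects (the labels p1 to p8 are made up for illustration). It only demonstrates the reordering, not any PDF handling:

```ruby
# Stand-ins for the PDFPage objects of an 8-page document (hypothetical labels).
pages = %w[p1 p2 p3 p4 p5 p6 p7 p8]

# Desired order, given as 1-based page numbers of the source document.
order = [8, 1, 6, 3, 2, 7, 4, 5]

# Same idea as the script: walk the order backwards and always insert at the
# front, so the first entry of `order` ends up first in the document.
document = []
order.reverse.each { |old_page_no| document.insert(0, pages[old_page_no - 1]) }

# The equivalent direct lookup, for comparison.
direct = pages.values_at(*order.map { |n| n - 1 })

puts document.inspect # ["p8", "p1", "p6", "p3", "p2", "p7", "p4", "p5"]
puts document == direct # true
```

Inserting at the front in reverse order means the script never has to track how many pages are already in the document.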

Printed 4-up with short-edge binding, the output will look something like this:

So far, so good. What if we want more control? The above was printed 4-up in “normal” order (rows left to right) using the OS X print dialog. What if we want a different order for some reason? Naturally, the easiest way to do this is through the print dialog. If possible, though, I wanted to do this programmatically. That way, went my logic, I could run the whole shebang in one step. As it turns out, this is a rather thorny problem, at least the way I went about it.

Second Attempt:

It seemed simple enough: rearrange the PDF as in the previous step, then run it through the printing system to create an N-up PDF. However, there’s a bit more involved to be able to draw content for the printing system, at least in Cocoa:

  1. Create an NSWindow somewhere offscreen to support the drawing operation
  2. Create an NSView subclass to display the content and attach it to the window
  3. Create a print job for the view and set the proper options (4-up, order, etc.)
  4. Run the print job and enjoy your tasty imposed PDF

The first hurdle came when I tried to drop a simple PDFView into this scheme. PDFView has a somewhat quirky pagination scheme, such that I couldn’t get NSPrintOperation printOperationWithView:printInfo: to work correctly. So, I put on my wheel-reinvention hat and implemented my own view based on NSImageView and NSPDFImageRep, with a custom pagination scheme and all that jazz.

With that up and running, it was on to the print job. For the life of me, I couldn’t figure out how to set the print options to produce N-up pages. In fact, none of the options I set via NSPrintInfo seemed to stick. I began to despair.

In the course of writing my NSImageView subclass, I had set up a global variable ($debug=true) to switch my debugging statements on and off. In a remarkable stroke of serendipity, RubyCocoa turned this into an environment variable, and I started seeing debugging output from the cgpdftopdf CUPS filter, the component actually responsible for creating N-up PDF content.

Third Attempt:

We’ve already met the Ansatz for my third attempt. There is, unfortunately, zero documentation for the built-in CUPS filters, as they’re not intended to be used directly. Still, I set about trying to figure out how to call cgpdftopdf directly. The DEBUG environment variable helpfully spit out all the command-line arguments, and I faithfully copied them to my own script. No dice. I could get it to create new PDF files, but no amount of fiddling with the arguments got me closer to my goal. So, instead of blindly fiddling, I examined the cgpdftopdf binary in hopes of finding some hints as to which arguments it accepts. Nothing. Just a bunch of calls to functions beginning with CGPDFContext. Hmm. What if I really waste time and re-implement the portions of cgpdftopdf that create N-up content?

Fourth Attempt:

Quartz to the rescue! As it turns out, PDF drawing isn’t hard, it’s just hard in Cocoa. Luckily, RubyCocoa doesn’t just cover Cocoa/Objective-C, but also plain-jane C frameworks like Core Graphics (Quartz). I absolutely love using Ruby as a bridge language. Using C libraries without having to worry about typing or memory management or pointers is just happy.

Gushing aside, it turns out that a lot of the things I was trying to do towards the end of Attempt 2 are a lot easier in pure Quartz than wrapped in Cocoa. Sure, it lacks that object-oriented goodness, but layered drawing is inherently stateful and procedural anyhow. For example, instead of mucking about with offscreen windows and NSImageViews, I simply grab a CGContext for my drawing by calling context = CGPDFContextCreateWithURL(CFURLCreateWithFileSystemPath(nil,dest_path,KCFURLPOSIXPathStyle,0), page_rect, nil), where dest_path is some filesystem path and page_rect is a CGRect giving the page size. In place of a CGRect struct, I can pass an array of Numerics like [[0,0],[841.88,595.28]], which walks and quacks just like a CGRect (may duck typing be blessed). I then pass this context as the first parameter to all of my drawing functions, and Quartz builds my PDF for me. Most wonderful, however, is the utility function CGPDFPageGetDrawingTransform, which calculates the CGAffineTransform that puts a given page inside a given rectangle. If these sorts of goodies are exposed in the Cocoa drawing API, I couldn’t find them. It is of course entirely possible that they’re not, since Objective-C can mix in normal C with ease.
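
To get a feel for what such a "put this page inside that rectangle" transform has to work out, here is a simplified plain-Ruby sketch. It is not how Quartz implements CGPDFPageGetDrawingTransform: the name fit_transform is hypothetical, rotation is ignored, and it always scales to fit, which the real function may not do in every case. Rectangles use the same [[x,y],[width,height]] array form as above.

```ruby
# Returns [scale, tx, ty]: a uniform scale factor plus a translation that
# places a page of page_size = [width, height] centred inside
# rect = [[x, y], [width, height]] without distorting it.
# (Hypothetical helper for illustration only, not a Quartz call.)
def fit_transform(page_size, rect)
  (x, y), (w, h) = rect
  pw, ph = page_size
  # The tighter of the two directions decides the scale.
  scale = [w / pw.to_f, h / ph.to_f].min
  # Centre whatever space is left over in the other direction.
  tx = x + (w - pw * scale) / 2.0
  ty = y + (h - ph * scale) / 2.0
  [scale, tx, ty]
end

# A portrait page dropped into a square cell: height limits the scale,
# and the page is centred horizontally.
puts fit_transform([100, 200], [[10, 20], [100, 100]]).inspect # [0.5, 35.0, 20.0]
```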

So, all that’s left to do to N-up-ify my PDF is to calculate bounding rectangles for each sub-page, pass them to CGPDFPageGetDrawingTransform, tack the result on to the current transformation matrix, and draw the page. The final product:

#!/usr/bin/env ruby
#
#  impositor.rb
#
#  Created by Jakob van Santen on 2008-03-29.
#  Copyright (c) 2008 __MyCompanyName__. Some rights reserved.
#  This code is distributed under the terms of the 
#  Creative Commons Non-Commercial Share-Alike Attribution license.

require 'osx/cocoa'
OSX.require_framework('Quartz')
include OSX

class PDFImposition
  # ways of arranging the subpages on the page
  class NUPMode
    RowsLeftToRight = 0
    ColumnsLeftToRight = 1
    RowsRightToLeft = 2
    ColumnsRightToLeft = 3
    Normal = RowsLeftToRight
  end
  attr_accessor :nup_rows, :nup_columns, :nup_mode
  def initialize(source_path,dest_path)
    @source_pdf = CGPDFDocumentCreateWithURL(CFURLCreateWithFileSystemPath(nil,source_path,KCFURLPOSIXPathStyle,0))
    @page_rect = CGPDFPageGetBoxRect(CGPDFDocumentGetPage(@source_pdf,1),KCGPDFMediaBox)
    @context = CGPDFContextCreateWithURL(CFURLCreateWithFileSystemPath(nil,dest_path,KCFURLPOSIXPathStyle,0), @page_rect, nil)
    @nup_rows = 1
    @nup_columns = 1
    @nup_mode = NUPMode::Normal
    @rotation = 0
    @imposition_map = (1..CGPDFDocumentGetNumberOfPages(@source_pdf)).to_a 
  end
  # calculate a bounding rect for page n
  def rect_for_page(n)
    row,col = position_for_page(n)
    size = scaled_page_size
    [[col*size.width,row*size.height],[size.width,size.height]]
  end
  # should the pages be rotated?
  def rotate?
    page_aspect = (@page_rect.size.width.to_f/@page_rect.size.height)
    cell_aspect = page_aspect*(@nup_rows.to_f/@nup_columns)
    (page_aspect-1)/(cell_aspect-1) < 1 # only if the aspect ratio flips
  end
  # size of each subpage
  def scaled_page_size
    full_size = [@page_rect.size.width.to_f,@page_rect.size.height.to_f]
    full_size.reverse! if rotate?
    CGSize.new(full_size[0]/@nup_columns,full_size[1]/@nup_rows)
  end
  # position of each subpage in the grid (row index, column index as measured from the origin)
  # multiplying this by the page size yields the bounding rect for the page
  def position_for_page(n)
    index = n-1
    position = case @nup_mode
      when NUPMode::RowsLeftToRight
        [@nup_rows-((index/@nup_columns) % @nup_rows) - 1,index % @nup_columns]
      when NUPMode::ColumnsLeftToRight
        [@nup_rows - (index % @nup_rows) - 1,(index/@nup_rows) % @nup_columns]
      when NUPMode::RowsRightToLeft
        [@nup_rows-((index/@nup_columns) % @nup_rows) - 1,@nup_columns - (index % @nup_columns) - 1]
      when NUPMode::ColumnsRightToLeft
        [@nup_rows - (index % @nup_rows) - 1,@nup_columns - ((index/@nup_rows) % @nup_columns) - 1]
    end
    position
  end
  # override this method to provide an imposition scheme
  def imposition_map
    (1..CGPDFDocumentGetNumberOfPages(@source_pdf)).to_a.collect {|p| [p,0]}
  end
  def run
    per_page = @nup_rows*@nup_columns
    page_counter = 0

    imposition_map.each_with_index do |map_entry,index|

      page_no,angle = *map_entry

      if page_counter == 0 # start of page
        CGContextBeginPage(@context, @page_rect)
      end

      unless page_no.nil? # page_no = nil results in a blank page
        CGContextSaveGState(@context)
        page = CGPDFDocumentGetPage(@source_pdf,page_no)
        CGContextConcatCTM(@context,CGPDFPageGetDrawingTransform(page,KCGPDFMediaBox,rect_for_page(index+1),(rotate? ? -90 : 0)+angle,true))
        CGContextDrawPDFPage(@context, page)
        CGContextRestoreGState(@context)
      end
      # uncomment to draw a border
      # CGContextStrokeRectWithWidth(@context,rect_for_page(index+1),2.0)
      page_counter += 1
      if page_counter == per_page # end of a page
        CGContextEndPage(@context)
        page_counter = 0
      end
    end

    if page_counter != 0 # the last sheet was only partially filled, so it is still open
      CGContextEndPage(@context)
    end
  end
end

# an example subclass for creating A6 (landscape) booklets
class Invite < PDFImposition
  def initialize(*args)
    super
    @nup_rows = 2
    @nup_columns = 2
    @nup_mode = NUPMode::RowsLeftToRight
  end
  def rect_for_page(n)
    rect = super
    # pad each subpage by 12 points
    # this could be modified to account for "creep" in thick signatures
    [rect[0].collect {|p| p + 12},rect[1].collect {|p| p - 24}]
  end
  def imposition_map
    per_page = @nup_rows*@nup_columns

    pages = (1..CGPDFDocumentGetNumberOfPages(@source_pdf)).to_a
    # if the page count is not a multiple of per_page, pad it out with nils
    pages << nil until pages.size % per_page == 0

    imap = []
    until pages.empty?
    # recto
     imap += [pages.delete_at(-1),pages.delete_at(0),pages.delete_at(-2),pages.delete_at(1)].collect {|p| [p,0]}
     break if pages.empty?
     # verso (upside-down in long-edge duplex printing)
     imap += [pages.delete_at(-2),pages.delete_at(1),pages.delete_at(-1),pages.delete_at(0)].collect {|p| [p,180]}
    end
    imap
  end
end

imp = Invite.new("/Users/superjakob/Desktop/einladung.pdf","/Users/superjakob/Desktop/ruby_cfout.pdf")
imp.run
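
The grid arithmetic in position_for_page and rect_for_page can be tried out without Quartz. The following is a minimal sketch of the RowsLeftToRight case only, assuming an A4 landscape sheet of 841.88 x 595.28 points and no rotation; grid_position and cell_rect are hypothetical stand-alone names, not part of the script above.

```ruby
# Sheet size in points (A4 landscape, assumed for this example).
PAGE_WIDTH  = 841.88
PAGE_HEIGHT = 595.28

# Row and column of the n-th subpage (1-based), filling rows left to right.
# Rows are counted from the bottom because the PDF origin is at the
# bottom-left corner, so the first subpage lands in the top row.
def grid_position(n, rows, columns)
  index = n - 1
  [rows - ((index / columns) % rows) - 1, index % columns]
end

# Bounding rect of the n-th subpage as [[x, y], [width, height]].
def cell_rect(n, rows, columns)
  row, col = grid_position(n, rows, columns)
  width  = PAGE_WIDTH / columns
  height = PAGE_HEIGHT / rows
  [[col * width, row * height], [width, height]]
end

(1..4).each do |n|
  puts "subpage #{n}: grid #{grid_position(n, 2, 2).inspect}, rect #{cell_rect(n, 2, 2).inspect}"
end
```

For a 2x2 layout, subpages 1 and 2 come out in the upper row and subpages 3 and 4 in the lower row, each a quarter of the sheet.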

In the end, I spent way too much time implementing something that could have been done by hand. Still, I now have a useable framework for implementing arbitrary imposition schemes. One could write a script that uses a subclass of PDFImposition (along with some extra housekeeping like deleting the spool file) and install it in the PDF menu of the print dialog. Come to think of it, that’s kind of what CocoaBooklet does. But does it come with source code?
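
When writing a new imposition scheme, it helps to look at the map on its own before any drawing happens. This plain-Ruby sketch mirrors the delete_at sequence of Invite#imposition_map, taking a page count instead of a PDF document; booklet_map is a hypothetical name, and the example is only checked for the 8-page case described in the problem statement.

```ruby
# Builds [page_number, rotation_in_degrees] pairs for a 4-up booklet,
# following the same steps as Invite#imposition_map above.
# (Hypothetical stand-alone helper; nil marks a blank subpage.)
def booklet_map(page_count)
  per_page = 4
  pages = (1..page_count).to_a
  # pad with blanks up to a multiple of per_page, as the script does
  pages << nil until pages.size % per_page == 0

  imap = []
  until pages.empty?
    # front of the sheet: outermost pages, upright
    imap += [pages.delete_at(-1), pages.delete_at(0),
             pages.delete_at(-2), pages.delete_at(1)].map { |p| [p, 0] }
    break if pages.empty?
    # back of the sheet: rotated by 180 degrees
    imap += [pages.delete_at(-2), pages.delete_at(1),
             pages.delete_at(-1), pages.delete_at(0)].map { |p| [p, 180] }
  end
  imap
end

booklet_map(8).each_slice(4).with_index do |side, i|
  puts "side #{i + 1}: #{side.inspect}"
end
# side 1: [[8, 0], [1, 0], [6, 0], [3, 0]]
# side 2: [[5, 180], [4, 180], [7, 180], [2, 180]]
```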

The take-away

  • Your mother was right. You spend 90% of your time on the last 10% of functionality.
  • The OS X drawing API is pretty neat, once you drop down to an appropriate level. Never use NSViews when you have no intention of drawing to the screen.
  • This is almost taken for granted these days, but PDF support in OS X? bella!
  • Apple done right with the BridgeSupport project. Making the Application Kit available to unskilled monkeys like me is a Good Thing. I think.

4 comments:

Zeba said...

Hi,

nice article, but as a complete newbie I have a question. I'm trying to print out a PDF in such a way that on one sheet of paper I have 8 pages in the following order (1,3,5,7,4,2,8,6): the first 4 on one side and the rest on the other. I've tried to change your first solution, but somehow it doesn't seem to work. Here is what I have done:

#!/usr/bin/env ruby

inpath = File.expand_path("~/Desktop/somefile.pdf")
outpath = File.expand_path("~/Desktop/somefile_imposed.pdf")

require 'osx/cocoa'
OSX.require_framework('Quartz')
include OSX

# create PDFDocument instance
pdf = PDFDocument.alloc.initWithURL(NSURL.fileURLWithPath(inpath))
# peel off the pages into our own array
pages = []
pdf.pageCount.times { pages << pdf.pageAtIndex(0); pdf.removePageAtIndex(0)}
# reinsert the pages in the desired order
maxpages = pages.size
orderedpages = (1..maxpages).to_a
repeataddon = (maxpages / 8.0).ceil
repeatmod = maxpages % 8
addon = ([0,1,2,3,-1,-4,1,-2] * repeataddon).slice(0...-repeatmod)
finalarray = Array.new(maxpages)
orderedpages.each_index {|k| finalarray[k] = (orderedpages[k].to_f + addon[k].to_f).to_i}
finalarray.reverse.each {|old_page_no| pdf.insertPage_atIndex(pages[old_page_no-1],0)}
# spit out the rearranged PDF
pdf.writeToURL(NSURL.fileURLWithPath(outpath))
# open the rearranged PDF
system('open',outpath)


What am I doing wrong?
Thank you,

Zeljko

Jakob said...

I suspect that you were trying this with a PDF whose page count was a multiple of 8. In that case, repeatmod == 0 and Array.slice(0...0) returns an empty array. In Ruby, nonexistent elements of an array are nil, so you're adding nil (or after type-casting, 0) to each element of orderedpages and ending up with a finalarray identical to orderedpages. In the end, the script then does nothing. Try slicing on (0...maxpages) instead, since you know the number of pages ahead of time, anyhow.
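
The pitfall can be reproduced in a few lines of plain Ruby. This is a minimal sketch assuming an 8-page document; the variable names sliced and fixed are made up for the illustration.

```ruby
pattern   = [0, 1, 2, 3, -1, -4, 1, -2]
maxpages  = 8
repeatmod = maxpages % 8 # 0 whenever the page count is a multiple of 8

# 0...-repeatmod becomes 0...0 here, i.e. an empty range
sliced = pattern.slice(0...-repeatmod)
puts sliced.inspect      # []
puts sliced[3].inspect   # nil (reading past the end of an array)
puts sliced[3].to_f      # 0.0, so every page number stays unchanged

# Slicing on the known page count keeps the offsets intact.
fixed = (pattern * (maxpages / 8.0).ceil).slice(0...maxpages)
puts fixed.inspect       # [0, 1, 2, 3, -1, -4, 1, -2]
```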

Alternatively, you can do this a bit more cleanly as follows:

addon = [0,1,2,3,-1,-4,1,-2]
new_pageorder = (1..pages.size).collect {|pageno| pageno + addon[(pageno-1) % 8]}
new_pageorder.reverse.each {|old_page_no| pdf.insertPage_atIndex(pages[old_page_no-1],0)}

Since your addon is a repeating pattern, you can just use % to access the proper element, rather than going to the trouble of expanding the array.
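
As a quick check of the modulo approach without a PDF at hand, here is a plain-Ruby sketch for a hypothetical 16-page document (page_count is an assumed value standing in for pages.size):

```ruby
addon = [0, 1, 2, 3, -1, -4, 1, -2]
page_count = 16 # assumed; stands in for pages.size

# The offset pattern repeats every addon.size pages, so index it modulo its length.
new_pageorder = (1..page_count).map { |pageno| pageno + addon[(pageno - 1) % addon.size] }

puts new_pageorder.inspect
# [1, 3, 5, 7, 4, 2, 8, 6, 9, 11, 13, 15, 12, 10, 16, 14]
```

Each block of 8 pages is shuffled the same way, and every source page appears exactly once as long as the page count is a multiple of 8.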

Zeba said...

Great! Thank you once more.

Zeljko

Anonymous said...

It's been a while since you wrote this, but many thanks. However, I can't get it to work as a PDF Service on my Mac.

There are a couple of useful Python scripts inside the Resources folder of the bundles for the Automator actions Extract PDF Pages and Combine PDF Pages, in /System/Library/Automator: these are the code that actually extracts and combines the pages!

I'm trying to produce a similar script to impose, based on these Apple Python scripts.