Portable Document Format: Cache OR Carry

PDF. Portable Document Format. It’s great for delivering print-friendly documents readable across platforms. A PDF looks the same on a Mac as it does on a PC as it does on a Linux system. Want to send somebody a contract or an invoice with less chance that they’ll apply some sneaky edits before printing and signing it, as they can readily do with a Word document? Use a PDF. Sure, it’s not foolproof; as with most any document format, you can buy an editor and change a PDF, but fewer people have a PDF editor than have a Word-compatible word processor.

PDF is especially good for delivering invoices and reports over the Web. While HTML can be tricky to print and to display consistently across platforms, a PDF’s layout is fixed, and all it requires on the end-user’s part is a simple, free plugin download. My company makes extensive use of PDFs for just these purposes, pulling data out of our database and generating PDFs on the fly with up-to-the-minute information.

We have not done so without having to get past some roadblocks, however. More specifically, we have had problems with PDFs in Microsoft Internet Explorer. The most common problem has to do with reloading a PDF that has already been downloaded, most often by clicking a link in the PDF to sort a column. Such a link reloads the script that generates the PDF, sending updated sorting choices to the script so that the data can be reformatted and the PDF rebuilt. In some cases, when a user tries to sort a report in this manner, he gets a standard MSIE dialogue prompting him either to save or to open the file, as if MSIE doesn’t know what to do with the file by default. This error is particularly strange given that MSIE, in order to load the PDF initially, must have known what application to use to open the file. But it gets stranger. Once you opt either to save or to open the file, another error pops up (a dialogue box with a red X icon rather than a DNS or 404 error) indicating that the site requested couldn’t be found. That is a bald-faced lie as, again, in order to try to sort the columns, the user had to have loaded the PDF from the site in question. This could turn into quite the existential quagmire for users of a philosophical turn of mind.

There are conflicting reports from Microsoft about why this behavior occurs. One report states that it’s a feature rather than a bug. In that case it may well be, as the report concerns caching of information sent over SSL, and it may not be a good idea to cache such information (credit card or personal data display pages, for example). But other reports list the same issue, this time not limited to SSL connections, as a bug in the software.

Here’s essentially what happens behind the scenes. Many servers by default send “no cache” information to the browser when they send certain types of content. When MSIE gets this information, it still stores the document in the cache, but it deletes it immediately. It deletes it so fast, in fact, that it’s gone by the time the Adobe Acrobat plugin (in my case with the PDFs) tries to load the cached document. So the error noting that the file’s not available is displayed and both user and frazzled Web developer are one step closer to going postal.
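To make the “no cache” information a little more concrete, here is a minimal sketch of a check for the header directives that usually carry it. The function name forbidsCaching() is made up for illustration, and the sample header values in the usage note are only typical examples, not a capture from any particular server.

```php
<?php
// Hypothetical helper (not part of PHP): returns true if any header line
// carries a directive telling the browser not to keep a copy of the response.
function forbidsCaching(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        // Split "Name: value" into its two parts; skip malformed lines.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $name  = strtolower(trim($parts[0]));
        $value = strtolower($parts[1]);

        if ($name === 'pragma' && strpos($value, 'no-cache') !== false) {
            return true;
        }
        if ($name === 'cache-control'
            && (strpos($value, 'no-cache') !== false
                || strpos($value, 'no-store') !== false)) {
            return true;
        }
    }
    return false;
}
```

Given the lines “Cache-Control: no-store, no-cache, must-revalidate” and “Pragma: no-cache”, the check returns true; given only “Cache-Control: private”, it returns false. Under a Web server, passing it the result of PHP’s headers_list() just before output begins would show whether the response about to go out is of the kind that trips up MSIE.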

I develop in PHP, and the issue is particularly heinous because I use sessions to determine whether or not people are logged in, and PHP sends “no cache” headers by default whenever a session is started. There is a fix, though. To try to keep the “no cache” information from taking effect, you can issue caching headers of your own (such as “Cache-Control: private” in place of “Cache-Control: no-cache”), though these may not necessarily override the server’s default settings. You could also modify your Web server’s configuration to change the caching information it sends. But there’s what would appear to be an even simpler fix: Execute the command (in PHP; presumably there are similar commands in other Web development languages) “session_cache_limiter()”, sending one of several possible values such as “nocache”, “private”, “public”, or “private_no_expire”. This overrides the caching headers that would otherwise be sent along with the session. It must be called before you start your session using the “session_start()” command.
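The ordering described above can be sketched as follows. session_cache_limiter(), session_start(), and session_status() are real PHP functions; the wrapper startSessionWithLimiter() and the CACHE_LIMITERS constant are hypothetical names introduced here only for illustration.

```php
<?php
// The cache limiter names listed in the text above.
const CACHE_LIMITERS = ['nocache', 'private', 'public', 'private_no_expire'];

// Hypothetical helper: choose the cache limiter, then start the session.
// Returns false without touching anything if the name is unknown or if a
// session is already active (in which case it is too late to change it).
function startSessionWithLimiter(string $limiter): bool
{
    if (!in_array($limiter, CACHE_LIMITERS, true)) {
        return false;
    }
    if (session_status() !== PHP_SESSION_NONE) {
        return false;
    }
    // Must come first: session_start() sends the cache headers.
    session_cache_limiter($limiter);
    return session_start();
}
```

A script that builds a PDF would call startSessionWithLimiter('private') as its first statement, before any output is sent, and only then generate the document.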

That ordering can be tricky. If you require a login to view the page and somebody isn’t already logged in (because they hit the link directly rather than logging in and navigating to the report, for example), you’ve told their browser that it’s OK to cache pages before they’ve logged in. So they try to log in, but the browser keeps showing its cached copy of the login page rather than showing that they’ve logged in and letting them proceed. You have to send the “private” or “private_no_expire” information in order to dodge the weird MSIE cache bug, but you can’t send it unless the user’s already logged in (or they won’t know they’ve logged in because they keep getting the cached login page). Yet once they’ve logged in, the session has started and it’s too late to send the caching information. It’s the chicken and egg problem, another conundrum for those of a philosophical bent.

The kluge of a solution I finally arrived at in my own code was to have my session/authentication code check for a special variable before it starts a session. I send this variable only if I’m generating a PDF on the fly. If the variable exists, the authentication code knows to send the caching information before starting the session, and the PDF can be generated without the aforementioned problems. I set the variable only when generating certain problematic PDFs (the other issue with this bug is that it’s not consistent!), so other pages, such as the login page, aren’t adversely affected by the caching information I’m sending. Incidentally, the login page might not be a problem at all if I redirected to a separate login page when someone wasn’t logged in, but as it is, I keep the same URL and simply print out the login form if a session doesn’t exist. Immediate problem solved and existential woes eliminated.
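For the curious, the workaround might look something like the sketch below. It is not my production code: the names limiterFor() and beginAuthenticatedSession() are invented for this example, the “special variable” is modeled as a plain boolean argument, and “private” is assumed as the limiter value (“private_no_expire” would be the other candidate).

```php
<?php
// Hypothetical: decide which cache limiter to use for this request.
// Returns null when PHP's default caching behavior should be left alone.
function limiterFor(bool $generatingPdf): ?string
{
    // Assumption: "private" is the value used for on-the-fly PDFs.
    return $generatingPdf ? 'private' : null;
}

// Hypothetical authentication entry point: pick the limiter first,
// then start the session, because the order cannot be reversed.
function beginAuthenticatedSession(bool $generatingPdf): void
{
    $limiter = limiterFor($generatingPdf);
    if ($limiter !== null) {
        session_cache_limiter($limiter);
    }
    session_start();
    // ... the usual login check would follow here ...
}
```

A PDF-generating script would call beginAuthenticatedSession(true); every other page, including the one that prints the login form, would call beginAuthenticatedSession(false) and keep the default behavior.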
