Preventing Public Search Engines from Spidering PDF Files

(Sidebar to Jakob Nielsen's column "Gateway Pages Prevent PDF Shock")

There are four mechanisms you can use to keep your PDF files out of search engines:

None of these solutions is ideal. It would be much better if you could tell search engines the file types that you want them to index.

Even if you use the "nofollow" convention for PDF file links, there is still a risk that other websites will cluelessly link directly to your PDF files, and thus expose the URLs to spiders. (See sidebar for advice on how to link to PDF documents on other websites.)
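As a minimal sketch of the "nofollow" convention, the Python helper below marks links to PDF files with rel="nofollow" when a page is generated. The function name pdf_link and the example paths are hypothetical and not part of any particular publishing system. The attribute is only a request that spiders may or may not honor, and it does nothing about links from other websites.

```python
from html import escape


def pdf_link(href: str, text: str) -> str:
    """Build an HTML anchor, adding rel="nofollow" when the target is a PDF.

    Hypothetical helper for illustration only.
    """
    # Ignore any query string or fragment when checking the file extension.
    path = href.split("?", 1)[0].split("#", 1)[0]
    rel = ' rel="nofollow"' if path.lower().endswith(".pdf") else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'


if __name__ == "__main__":
    print(pdf_link("/reports/annual.pdf", "Annual report (PDF)"))
    print(pdf_link("/reports/annual.html", "Annual report summary"))
```

Links to ordinary pages, such as a gateway page that summarizes the PDF, are left untouched so that spiders can still follow them.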

As a final option, you can password-protect all PDF files. Because search engines won't know the password, they won't be able to index the files. This approach is good for extranets and for documents that you're selling, because users will accept the need for authentication. For standard Web browsing, however, passwords are a bad idea because they're an additional barrier between users and the information they seek.