There are four mechanisms you can use to keep your PDF files out of search engines:
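Put all PDF files in a dedicated directory and block that directory in robots.txt. If the directory is /pdf, for example, add the following two lines to your robots.txt file: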
User-agent: *
Disallow: /pdf/
The robots.txt file should be at your website's root level (e.g., www.useit.com/robots.txt).
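To check that the two lines behave as intended, you can run them through a robots.txt parser before publishing them. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed, the crawler name ExampleBot, and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the text above.
RULES = [
    "User-agent: *",
    "Disallow: /pdf/",
]


def is_allowed(rules, url, agent="ExampleBot"):
    """Return True if a crawler named `agent` may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse the rule lines directly, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules.
    for url in (
        "http://www.example.com/pdf/report.pdf",
        "http://www.example.com/index.html",
        "http://www.example.com/pdf-guide.html",
    ):
        print(url, is_allowed(RULES, url))
```

Note that the rule matches by path prefix, so the trailing slash matters: /pdf/report.pdf is blocked, while a page such as /pdf-guide.html stays crawlable.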
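If your PDF files are scattered across several directories, crawlers that support wildcard patterns, such as Googlebot, can be told to skip every URL that ends in .pdf:
User-agent: Googlebot
Disallow: /*.pdf$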
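The pattern above reads as "any path that ends in .pdf": the asterisk stands for any run of characters and the dollar sign anchors the match to the end of the URL. The sketch below illustrates that reading by translating such a pattern into a regular expression. It is a simplified model written for this example, not Googlebot's actual matching code, and the function names and sample paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildcard robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path. Without '$', the pattern is a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body), anchored


def is_blocked(pattern, path):
    """Return True if `path` is matched by the Disallow `pattern`."""
    regex, anchored = pattern_to_regex(pattern)
    if anchored:
        return regex.fullmatch(path) is not None
    return regex.match(path) is not None


if __name__ == "__main__":
    for path in ("/docs/report.pdf", "/docs/report.html", "/docs/report.pdf?page=2"):
        print(path, is_blocked("/*.pdf$", path))
```

Under this model, a URL with a query string after .pdf is not matched, because the dollar sign requires .pdf to be the very end of the path.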
Add a robots meta tag with the "nofollow" value to the page that links to the PDF file:
<meta name="robots" content="nofollow">
If you've followed the recommendation to use a gateway page for each PDF file, and you ensure that the gateway page contains the only link to the PDF, then preventing search engines from following the link will do the trick.
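Because this approach depends on every gateway page actually carrying the tag, it can be worth checking pages automatically. Here is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaFinder, the function has_nofollow, and the sample page markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # The content value is a comma-separated list, e.g. "noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_nofollow(html):
    """Return True if the page tells robots not to follow its links."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "nofollow" in finder.directives


if __name__ == "__main__":
    # Hypothetical gateway page, used only to exercise the check.
    page = (
        '<html><head><meta name="robots" content="nofollow"></head>'
        '<body><a href="/pdf/report.pdf">Download the report (PDF)</a></body></html>'
    )
    print(has_nofollow(page))
```

A check like this only confirms that the tag is present; it cannot tell you whether some other page also links to the same PDF file.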
Even if you use the "nofollow" convention for PDF file links, there is still a risk that other websites will cluelessly link directly to your PDF files, and thus expose the URLs to spiders. (See sidebar for advice on how to link to PDF documents on other websites.)
As a final option, you can password-protect all PDF files. Because search engines won't know the password, they won't be able to index the PDF files. This approach is good for extranets and for documents that you're selling, because users will accept the need for authentication. For standard Web browsing, however, passwords are a bad idea because they're an additional barrier between users and the information they seek.