A robots.txt file tells search engine crawlers which pages on your site to crawl and which to avoid. It's placed at the root of your website (e.g., https://example.com/robots.txt).
Key Directives:
- User-agent: Specifies which robots the rules apply to
- Disallow: Tells robots not to access specific pages or directories
- Allow: Explicitly permits access to a page or directory
- Sitemap: Points to the location of your sitemap file
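Putting these directives together, a minimal robots.txt might look like the sketch below. The paths, bot name, and sitemap URL are placeholders, not values from any real site:

```
# Apply these rules to all crawlers
User-agent: *
# Keep crawlers out of these directories (hypothetical paths)
Disallow: /admin/
Disallow: /tmp/
# Explicitly permit one page inside an otherwise blocked directory
Allow: /admin/public-report.html

# Stricter rules for one specific (hypothetical) crawler
User-agent: ExampleBot
Disallow: /

# Point crawlers at the sitemap (must be an absolute URL)
Sitemap: https://example.com/sitemap.xml
```

Rules are grouped: each User-agent line starts a group, and the Disallow/Allow lines beneath it apply only to that group.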
Note: robots.txt is a voluntary convention, not a security measure. Malicious bots can simply ignore it, and disallowed pages remain publicly accessible to anyone who has the URL.
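Compliance is entirely on the crawler's side: a well-behaved crawler fetches robots.txt and checks each URL against it before requesting the page. As an illustration, Python's standard-library urllib.robotparser performs exactly this check (the domain and user-agent string below are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Load and parse the site's robots.txt (example.com is a placeholder)
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# A polite crawler asks before fetching; a malicious one skips this step entirely
print(parser.can_fetch("MyCrawler", "https://example.com/private/page.html"))  # False if /private/ is disallowed
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))         # True if not disallowed
```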
Common Usage Examples:
| Rule | Effect |
| --- | --- |
| `Disallow: /` | Block the entire website |
| `Disallow: /private/` | Block a specific folder |
| `Disallow: /*.pdf$` | Block all PDF files |
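Two caveats apply to these examples: Disallow lines only take effect inside a User-agent group, and the `*` and `$` wildcards are extensions honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots exclusion standard. A minimal sketch placing the table's rules in a valid group:

```
User-agent: *
Disallow: /private/
Disallow: /*.pdf$
```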