robots.txt Generator

Part of Network & Web Tools

Create robots.txt files to control search engine crawlers. Block pages, set crawl delays, specify sitemaps, and optimize your SEO.

How to Use the robots.txt Generator

  1. Set default policy: Choose whether to allow or disallow all robots by default. Most sites choose "Allow all" and then block specific paths.
  2. Add blocked paths: Enter directories or pages you don't want search engines to crawl, like /admin/ or /private/. One path per line.
  3. Specify sitemap: Add your sitemap URL to help search engines discover all your pages more efficiently.
  4. Set crawl delay: Optionally add a delay between requests to reduce server load. Most sites don't need this.
  5. Generate and upload: Click Generate, then download the robots.txt file and upload it to your website's root directory (domain.com/robots.txt).
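Following the steps above, the generated file looks along these lines (the paths and sitemap URL here are placeholders to replace with your own):

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
```

Upload it to your site's root so it is served at domain.com/robots.txt.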

What is robots.txt?

The robots.txt file is a text file placed in your website's root directory that tells search engine crawlers which pages or sections of your site they should and shouldn't access. It's part of the Robots Exclusion Protocol (REP), a standard used by websites to communicate with web crawlers and bots. When a search engine bot visits your site, it first checks for robots.txt at domain.com/robots.txt to see if there are any crawling restrictions. Use our .htaccess Generator for additional server-level access control.

While robots.txt is primarily used for SEO purposes to control how search engines index your site, it's important to understand that it's not a security measure. Well-behaved bots respect robots.txt, but malicious bots ignore it. Never use robots.txt to hide confidential pages; use proper authentication instead. The file is publicly accessible, so anyone can view which areas you've blocked.

robots.txt Syntax

User-agent: Specifies which crawler the rules apply to. Use * for all crawlers or specific names like Googlebot or Bingbot.

Disallow: Specifies paths that crawlers should not access. Disallow: /admin/ blocks the admin directory and all subdirectories.

Allow: Overrides Disallow for specific paths. Useful when you want to block a directory but allow a subdirectory within it.

Crawl-delay: Sets the number of seconds between successive requests. Not supported by all crawlers (Google ignores it).

Sitemap: Points to your XML sitemap location. Multiple sitemap declarations are allowed. Check your sitemap and domain information with our DNS Lookup tool.
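The directives above combine into a single file, with rules grouped under each User-agent line. A sketch using placeholder paths and URLs:

```
# Rules for all crawlers
User-agent: *
Disallow: /private/
# Allow overrides the broader Disallow for one subdirectory
Allow: /private/docs/
Crawl-delay: 10

# Rules for one specific crawler
User-agent: Googlebot
Disallow: /no-google/

# Sitemap declarations apply to the whole file
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml
```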

Common Use Cases

Block admin areas: Prevent search engines from indexing login pages, admin panels, and backend systems that shouldn't appear in search results.

Prevent duplicate content: Block search result pages, print versions, or filtered views that create duplicate content issues.

Save crawl budget: Block low-value pages like infinite calendars, session IDs, or search results to ensure crawlers focus on important content.

Staging sites: Block all crawlers from development or staging versions of your site to prevent them from appearing in search results.

Manage server load: Set crawl delays for aggressive bots that might overwhelm your server with requests.
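The staging-site case above needs only two lines; all compliant crawlers will skip the entire site (remember to remove or relax this before launch):

```
User-agent: *
Disallow: /
```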

Best Practices

Keep it simple: Most sites only need a basic robots.txt with a few disallow rules and a sitemap reference. Overly complex files can cause confusion.

Test your file: Use the robots.txt report in Google Search Console (which replaced the standalone robots.txt Tester) to verify your file works correctly and doesn't accidentally block important pages.

Use absolute paths: Always start paths with a forward slash. Note that matching is by prefix: /admin blocks anything starting with /admin, including /admin.html and /administrator/.

Case sensitivity: Paths are case-sensitive. /Admin and /admin are treated as different paths.

Wildcards: Use * to match any sequence of characters and $ to match the end of a URL.
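Wildcard patterns look like the following (they are extensions supported by major crawlers such as Googlebot and Bingbot, not part of the original standard; the paths are illustrative):

```
User-agent: *
# Block any URL containing a session ID query parameter
Disallow: /*?sessionid=
# Block URLs whose path ends in .pdf anywhere on the site
Disallow: /*.pdf$
```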

Common Mistakes to Avoid

Blocking CSS/JS: Never block CSS and JavaScript files. Google needs these to render pages properly for mobile-first indexing. Blocking them hurts SEO.

Using for security: Don't rely on robots.txt to protect sensitive information. Malicious bots ignore it, and the file itself reveals what you're trying to hide.

Blocking entire site: Under User-agent: *, Disallow: / blocks your entire website from all compliant crawlers. Only use this intentionally, such as for staging sites.

Forgetting sitemap: Always include your sitemap URL. It helps search engines discover and index your content more efficiently.

Testing and Validation

Google Search Console: Use the robots.txt report to verify your file syntax and check whether specific URLs are blocked or allowed.

Bing Webmaster Tools: Bing also provides a robots.txt tester to check how Bingbot will interpret your file.

Live testing: After uploading, visit domain.com/robots.txt to confirm it's accessible and displays correctly.

Monitor logs: Check server logs to see if crawlers are respecting your robots.txt rules and adjust as needed. Use our HTTP Header Checker to verify your server responses.
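Beyond online testers, you can sanity-check rules locally. This sketch uses Python's standard urllib.robotparser module (the rules and URLs are illustrative; note that this parser follows the original standard and does not understand wildcard extensions):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; in practice, load the lines of your real robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Check whether a crawler matching "*" may fetch specific URLs.
print(rp.can_fetch("*", "https://example.com/index.html"))   # allowed
print(rp.can_fetch("*", "https://example.com/admin/login"))  # blocked
```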