🤖

Robots.txt Guide

Control which parts of your site search engines can crawl

What is robots.txt?

Robots.txt is a plain-text file in the root of your website that tells search engine crawlers which parts of the site they may and may not crawl.

Location: https://example.com/robots.txt
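Because the file always lives at the root of the host, its address can be derived from any page URL. A minimal sketch, assuming a hypothetical helper named robots_url and made-up page URLs:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), swap in the fixed
    # /robots.txt path, and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=7"))
# https://example.com/robots.txt
print(robots_url("https://shop.example.com:8443/cart"))
# https://shop.example.com:8443/robots.txt
```

Each host and port combination has its own robots.txt, which is why the subdomain in the second call resolves to a different file.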

Important note: Robots.txt only blocks crawling, not indexing. To prevent indexing, use a meta robots tag or the X-Robots-Tag HTTP header.
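One way to send the X-Robots-Tag header is to add it in the application layer. The sketch below assumes a Python WSGI application; the names noindex_middleware and report_app are hypothetical, and the app is called directly with a stand-in start_response instead of running a server.

```python
def noindex_middleware(app):
    """Wrap a WSGI app so every response carries X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Replace any existing X-Robots-Tag so the header is sent once.
            headers = [h for h in headers if h[0].lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def report_app(environ, start_response):
    """Hypothetical app serving a page that should stay out of the index."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"internal report"]


# Call the wrapped app directly and capture what it would send.
captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(noindex_middleware(report_app)({}, fake_start_response))
print(captured["headers"]["X-Robots-Tag"])  # noindex
```

The same header can usually be set in the web server configuration instead; the middleware form is shown only because it is easy to run and inspect.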

Basic Syntax

Robots.txt Structure

# A comment line starts with #
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml

User-agent: Specifies which crawler the rules that follow apply to

Disallow: Blocks crawling of a path

Allow: Permits crawling of a path (overrides a broader Disallow)

Sitemap: Links to the XML sitemap
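To see how these four directives are interpreted, the structure above can be fed to Python's standard-library parser. This is a sketch only: urllib.robotparser applies rules in file order and matches plain path prefixes, so its answers can differ from a search engine's for overlapping or wildcard rules. The test paths are made up.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# A comment line starts with #
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a generic crawler ("*") may fetch each path.
for path in ("/admin/users", "/private/notes.html", "/public/index.html", "/blog/"):
    allowed = parser.can_fetch("*", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")

# Sitemap lines are collected separately from the rule groups.
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

Paths that match no rule, such as /blog/ here, are allowed by default.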

Common Examples

1. Allow everything (default)

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

2. Block specific directories

User-agent: *
Disallow: /admin/
Disallow: /temp/
Disallow: /private/
Disallow: /cgi-bin/

Sitemap: https://example.com/sitemap.xml

3. Block specific file types

User-agent: *
Disallow: /*.pdf$
Disallow: /*.xlsx$
Disallow: /*.docx$

Sitemap: https://example.com/sitemap.xml

4. Block URL parameters

User-agent: *
Disallow: /*?*sort=
Disallow: /*?*filter=
Disallow: /*?*sessionid=

Sitemap: https://example.com/sitemap.xml

5. Different rules for different bots

# Google (Googlebot ignores Crawl-delay, so none is set)
User-agent: Googlebot
Disallow: /private/

# Bing
User-agent: Bingbot
Disallow: /private/
Crawl-delay: 1

# All others
User-agent: *
Disallow: /

Sitemap: https://example.com/sitemap.xml
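Per-bot groups can be checked the same way a crawler would read them: each bot uses the group that names it and falls back to the * group otherwise. A sketch using Python's urllib.robotparser, which matches agent names case-insensitively by substring; SomeOtherBot is a made-up agent name and the Googlebot group is written without a Crawl-delay line.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
Crawl-delay: 1

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    for path in ("/", "/private/report.html"):
        allowed = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:13} {path:22} {'allowed' if allowed else 'blocked'}")

# Crawl-delay is reported per group; None means the group sets no delay.
print(parser.crawl_delay("Bingbot"))    # 1
print(parser.crawl_delay("Googlebot"))  # None
```

Googlebot and Bingbot may fetch the home page but not /private/, while any bot without its own group is blocked from the entire site.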

Wildcards and Patterns

* (asterisk) - Matches any sequence of characters

Disallow: /admin/* - Blocks everything under /admin/ (equivalent to Disallow: /admin/, because rules match by prefix)

$ (dollar) - Anchors the pattern to the end of the URL

Disallow: /*.pdf$ - Blocks only URLs that end with .pdf

Note: Google and Bing support wildcards, but not all crawlers do.
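The two wildcard characters can be modelled by translating a rule path into a regular expression. This is a minimal sketch of the matching idea, not any search engine's actual implementation; rule_matches is a hypothetical helper and the URLs are made up.

```python
import re


def rule_matches(rule_path, url_path):
    """Return True if a robots.txt path pattern matches url_path."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal chunks and join them with ".*" where "*" stood.
    regex = re.compile(".*".join(re.escape(chunk) for chunk in rule_path.split("*")))
    if anchored:
        # "$" means the pattern must consume the whole URL.
        return regex.fullmatch(url_path) is not None
    # Without "$" a rule is a prefix match from the start of the URL.
    return regex.match(url_path) is not None


print(rule_matches("/admin/*", "/admin/users"))                    # True
print(rule_matches("/*.pdf$", "/docs/report.pdf"))                 # True
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))      # False
print(rule_matches("/*?*sort=", "/products?page=2&sort=price"))    # True
print(rule_matches("/*?sessionid=", "/cart?page=2&sessionid=ab"))  # False
```

The last call shows why a parameter rule needs a * after the question mark: without it the rule only matches when the parameter comes first in the query string.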

Common User-Agents

Googlebot - Google search engine
Googlebot-Image - Google image crawler
Googlebot-News - Google News crawler
Bingbot - Bing search engine
Slurp - Yahoo search engine
DuckDuckBot - DuckDuckGo
Baiduspider - Baidu (China)
YandexBot - Yandex (Russia)

Testing Robots.txt

Google Search Console Robots.txt Tester

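Go to Google Search Console
Select your property
Go to Legacy tools → robots.txt Tester (Google has since retired this tool; the robots.txt report under Settings is its replacement)
Test specific URLs against your robots.txt (the URL Inspection tool reports whether a URL is blocked by robots.txt)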
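A candidate file can also be checked locally before it is deployed by comparing it against a table of URLs that must stay crawlable or blocked. The sketch below assumes Python's urllib.robotparser, which handles plain path prefixes only, so wildcard rules need a different checker. The helper check_expectations, the candidate file, and the expectation table are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def check_expectations(robots_txt, expectations, agent="*"):
    """Return (path, expected, actual) for every path that behaves unexpectedly."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for path, expected in expectations.items():
        actual = parser.can_fetch(agent, "https://example.com" + path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


CANDIDATE = """\
User-agent: *
Disallow: /admin/
Disallow: /temp/
"""

EXPECTATIONS = {
    "/": True,                 # the home page must stay crawlable
    "/assets/site.css": True,  # CSS and JS must stay crawlable
    "/admin/login": False,     # the admin area must be blocked
}

failures = check_expectations(CANDIDATE, EXPECTATIONS)
print(failures)  # []

# A file that blocks the whole site by accident fails the same table.
print(check_expectations("User-agent: *\nDisallow: /\n", EXPECTATIONS))
```

Running a table like this in a deployment pipeline catches an accidental Disallow: / before crawlers see it.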


Common Mistakes

❌ Blocking CSS/JS

DO NOT block CSS and JavaScript files. Google needs them to render pages correctly.

Disallow: /assets/ # WRONG
Allow: /assets/ # CORRECT

❌ Case sensitivity

Paths in robots.txt are case-sensitive: /Admin/ is not the same as /admin/. Directive names such as Disallow are not case-sensitive.

❌ Assuming it blocks indexing

Robots.txt only blocks crawling. Pages can still be indexed via external links. To prevent indexing, use meta robots noindex, and leave the page crawlable so the crawler can see the directive.
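To confirm that a page actually carries a noindex directive, its HTML can be scanned for the meta robots tag. A minimal sketch using Python's standard-library HTML parser; RobotsMetaParser and has_noindex are hypothetical names, and the sample markup is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated directive list.
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))                                           # True
print(has_noindex('<meta name="robots" content="index, follow">'))  # False
```

This only inspects the HTML; a noindex sent through the X-Robots-Tag response header would need a separate check of the headers.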

Best Practices

✓ Do this

• Place the file in the root directory
• Include a sitemap reference
• Test before deployment
• Allow CSS and JavaScript
• Use comments for documentation
• Monitor it in Search Console

✗ Avoid this

• Blocking CSS/JS files
• Letting robots.txt return a 404
• Relying on it to protect sensitive data
• Blocking the whole site by accident
• Forgetting to test changes
• Ignoring Search Console warnings