
Exp-5

Output after print(test):
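The original script is not shown in this capture, so the following is a minimal sketch of code that could produce the output below, assuming test holds the fetched robots.txt text (the variable name test and the header formatting are taken from the output; the fetch method is an assumption):

```python
import urllib.request

# Fetch Wikipedia's robots.txt over HTTPS (one possible way to
# produce the output shown below; the original code is not given).
url = "https://en.wikipedia.org/robots.txt"
with urllib.request.urlopen(url) as response:
    test = response.read().decode("utf-8")

# Print a header line and a separator, then the raw file contents.
print("robots.txt for", url)
print("=" * 50)
print(test)
```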
robots.txt for https://en.wikipedia.org/robots.txt

==================================================

# robots.txt for http://www.wikipedia.org/ and friends
#
# Please note: There are a lot of pages on this site, and there are
# some misbehaved spiders out there that go _way_ too fast. If you're
# irresponsible, your access to the site may be blocked.

# Observed spamming large amounts of https://en.wikipedia.org/?curid=NNNNNN
# and ignoring 429 ratelimit responses, claims to respect robots:
# http://mj12bot.com/
User-agent: MJ12bot
Disallow: /

# advertising-related bots:
User-agent: Mediapartners-Google*
Disallow: /

# Wikipedia work bots:
User-agent: IsraBot
Disallow:

User-agent: Orthogaffe
Disallow:

# Crawlers that are kind enough to obey, but which we'd rather not have
# unless they're feeding search engines.
User-agent: UbiCrawler
Disallow: /

User-agent: DOC
Disallow: /

User-agent: Zao
Disallow: /

# Some bots are known to be trouble, particularly those designed to copy
# entire sites. Please obey robots.txt.
User-agent: sitecheck.internetseer.com
Disallow: /

User-agent: Zealbot
Disallow: /

User-agent: MSIECrawler
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: Fetch
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: linko
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: Microsoft.URL.Control
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: larbin
Disallow: /

User-agent: libwww
Disallow: /

User-agent: ZyBORG
Disallow: /

User-agent: Download Ninja
Disallow: /

# Misbehaving: requests much too fast:
User-agent: fast
Disallow: /

# Sorry, wget in its recursive mode is a frequent problem.
# Please read the man page and use it properly; there is a
# --wait option you can use to set the delay between hits,
# for instance.
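The rules above can be checked programmatically with Python's standard-library urllib.robotparser. This sketch feeds a few of the entries from the output directly to the parser (no network needed); the sample URL is only an illustration:

```python
from urllib.robotparser import RobotFileParser

# A small excerpt of the rules shown above, supplied as a string.
# An empty "Disallow:" means the named agent is allowed everywhere.
rules = """\
User-agent: MJ12bot
Disallow: /

User-agent: IsraBot
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# MJ12bot is disallowed from the whole site; IsraBot is not restricted.
print(rp.can_fetch("MJ12bot", "https://en.wikipedia.org/wiki/Python"))
print(rp.can_fetch("IsraBot", "https://en.wikipedia.org/wiki/Python"))
```

can_fetch returns False for MJ12bot (blocked by "Disallow: /") and True for IsraBot (the empty Disallow line permits everything).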
