Factbites
 Where results make sense
About us   |   Why use us?   |   Reviews   |   PR   |   Contact us  

Topic: Robots Exclusion Standard


In the News (Thu 23 May 13)

  
  Robots Exclusion Standard - Wikipedia, the free encyclopedia
The robots exclusion standard or robots.txt protocol is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website.
There is no official standards body or RFC for the protocol.
is not appropriate as this is not a stable standard extension.
en.wikipedia.org /wiki/Robots_exclusion_standard   (590 words)

  
 TV is King Robots Exclusion Protocol   (Site not responding. Last check: 2007-09-16)
Robots txt - Robots Exclusion Protocol - REP File Robots Text File (robots.txt) It is always good practice to create a robots.txt file and place it in your root directory.
The Robots Exclusion Protocol The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot...
The Robots Exclusion Protocol is not infallible, and...
www.tvisking.com /tv/Robots%20Exclusion%20Protocol   (1960 words)

  
 Robot Exclusion Standard Revisited   (Site not responding. Last check: 2007-09-16)
standard for robot exclusion, as well as to propose some suggestions for future expansion of the standard.
Nevertheless, the current standard is in use by (on the order of) 5% of the servers on the web, and as such deserves consideration.
The second record is weakly specific to all robots and strongly specific to a robot that would have the string "Rex" (without regard to case) in the User-Agent field of the Hypertext Transfer Protocol Request that it would make of a server.
www.kollar.com /robots.html   (3774 words)
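
The record selection described in this snippet (one record for all robots, one for any robot whose User-Agent contains "Rex", ignoring case) can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not the method of the page quoted above: the robots.txt content, the paths, and the agent names are hypothetical, and the helper allowed() is introduced here for convenience.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for every robot, one for "Rex".
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Rex
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# The "Rex" record is matched as a case-insensitive substring of the agent name.
print(allowed("T-Rex/1.0", "/index.html"))
# Robots that match no named record fall back to the "*" record.
print(allowed("OtherBot", "/index.html"))
print(allowed("OtherBot", "/private/x"))
```

In this module a record naming a specific robot takes precedence over the "*" record, which is one plausible reading of "strongly specific" versus "weakly specific".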

  
 Making a robots.txt file - Search Engine Robots
For this reason, most robots (spiders/crawlers) abide by the 'Robots Exclusion Standard', a set of rules that constrains their behaviour.
When search engine robots come to your site they look for a special file in the root of your server called robots.txt (this is a plain text file).
There is another way to tell robots not to index a web page or follow links on it, which may be more helpful in some cases, as it can be used more conveniently on a page-by-page basis.
www.neurocyber.co.uk /articles/robots_txt.htm   (710 words)

  
 Search Indexing Robots and Robots.txt - SearchTools.com
This is all documented in the Standard for Robot Exclusion, and all robots should recognize and honor the rules in the robots.txt file.
Robots read from top to bottom and stop when they reach something that applies to them.
RoboGen visual editor for Robots Exclusion files, allowing users to choose folders and files interactively, manage multiple domains and recognize large numbers of user agents (robot self-identifiers).
www.searchtools.com /robots/robots-txt.html   (782 words)

  
 Monitor - The Web Robots FAQ page
Robots are operated by humans, who make mistakes in configuration, or simply don't consider the implications of their actions.
If you think you have discovered a new robot (i.e. one that is not listed on the list of active robots) and it does more than sporadic visits, drop me a line so I can make a note of it for future reference.
You can read the whole standard specification but the basic concept is simple: by writing a structured text file you can indicate to robots that certain parts of your server are off-limits to some or all robots.
members.lycos.co.uk /woorm/monitor4information/robotstxt-faq.php   (2563 words)

  
 GNU Wget Manual - Appendices
The description of the norobots standard was written, and is maintained, by Martijn Koster (m.koster@webcrawler.com).
These incidents indicated the need for established mechanisms for WWW servers to indicate to robots which parts of their server should not be accessed.
The value of this field is the name of the robot the record is describing access policy for.
www.editcorp.com /Personal/Lars_Appel/wget/wget_9.html   (1018 words)

  
 A Standard for Robot Exclusion
The method used to exclude robots from a server is to create a file on the server which specifies an access policy for robots.
The robot should be liberal in interpreting this field.
Instead I recommend using the robots exclusion code in the Perl libwww-perl5 library, available from CPAN in the LWP directory.
www.robotstxt.org /wc/norobots.html   (851 words)

  
 WDVL: Spiders and Robots Exclusion
Web Robots are programs that automatically traverse the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.
The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot.
The Robots META tag allows HTML authors to indicate to visiting robots if a document may be indexed, or used to harvest more links.
www.wdvl.com /Location/Search/Robots.html   (864 words)
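
As a rough illustration of the Robots META tag mentioned in this snippet, the following sketch extracts the directives from a page using Python's standard html.parser module. The class name RobotsMetaParser, the helper robots_meta(), and the sample HTML are hypothetical; directive values such as NOINDEX and NOFOLLOW are the commonly used ones and are not taken from the snippet itself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta(html: str) -> set[str]:
    """Return the lowercased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
print(robots_meta(page))
```

A robot honouring the tag would skip indexing when "noindex" is present and would not harvest links when "nofollow" is present.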

  
 FreeFind.com
If the FreeFind spider is crawling parts of your website that you do not wish to be indexed, simply use the robots exclusion standard to control indexing.
If you are not familiar with the robots exclusion standard you can find it at robotstxt.org.
If the FreeFind spider is causing server problems which cannot be corrected using the robots exclusion standard (see examples in our FAQ), please contact us at: SpiderProblems (one word) at (that's the @ sign) FreeFind.com with a description of the server problem you are seeing.
www.freefind.com /spider.html   (221 words)

  
 Robotstxt.Net - Information about Robots.txt, the Web Robots Exclusion Standard, and writing well-behaved Web robots   (Site not responding. Last check: 2007-09-16)
The Robot will simply look for a "/robots.txt" on your site, where a site is defined as a HTTP server running on a particular host and port number.
The standard dictates that /bob would disallow /bob.html and /bob/index.html (both the file bob and files in the bob directory will not be indexed).
Although the standard is not case sensitive, directory and file names are case sensitive.
www.robotstxt.net   (1765 words)
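
The points in this snippet (one /robots.txt per host and port, prefix matching of /bob, and case-sensitive paths) can be sketched as follows, again with Python's standard library. The function name robots_url(), the example.com URL, and the agent name AnyBot are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host and port that serve `page_url`."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://example.com:8080/a/b.html?x=1"))

# A single record that disallows everything whose path starts with /bob.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /bob"])

for path in ("/bob.html", "/bob/index.html", "/Bob.html", "/alice.html"):
    print(path, parser.can_fetch("AnyBot", path))
```

Because the match is a plain prefix comparison, /bob.html and /bob/index.html are both excluded, while /Bob.html is not, since path matching is case sensitive.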

  
 ACT and the Robots Exclusion Standard (Application Center Test 1.0)
The standard is a method for developers of automated user-agents ("robots") and Web site administrators to determine what areas of a Web server, if any, are accessible to particular user-agents.
The Robots Exclusion Standard specifies that a file, named robots.txt, should be placed at the root content directory for the Web server.
By disabling the robots exclusion support, the user agrees to accept all resulting responsibilities and liabilities.
msdn.microsoft.com /library/en-us/act/htm/actml_ref_bots.asp?frame=true   (411 words)

  
 Robots.txt - Meta
The Robots Exclusion Standard allows advising web robots by means of the file {{SERVER}}/robots.txt, e.g. for this project http://meta.wikimedia.org/robots.txt.
In your robots.txt file, you would be wise to deny access to the script directory, hence diffs, old revisions, contribs lists, etc etc, which could severely raise the load on the server.
The idea is that you deny a yourdomain.tld/trap/ directory to robots in robots.txt, then write a small script that logs any IP that tries to access the /trap/ directory and adds that IP to the robots.txt in the previous folder.
meta.wikimedia.org /wiki/Robots_exclusion_file   (562 words)

  
 General Tips: Definitions of Robots.txt on the Web
A text file present in the root directory of a site which is used to control which pages are indexed by a robot.
Only spiders that adhere to the Robots Exclusion Standard will obey a robots.txt command file. There are several specific fields in a robots.txt file: "User-agent" specifies which User Agents are allowed to access the site, and "Allow/Disallow" specifies which directories a spider may access.
Web robots download this file from the server’s document root and parse it for instructions on what to index and not to index.
www.schogini.com /articles/Definitions-of-Robots.txt-on-the-Web.html   (290 words)

  
 Pay per click search engine advertising and sponsored search results   (Site not responding. Last check: 2007-09-16)
The method used to exclude robots from a server is to create a file on the...
We were just waiting for robots to start reproducing before we gave over the...
Industrial Robots can be manufactured in a wide range of sizes and so can...
www.webadvertiser.co.uk /search.php?str=robots   (175 words)

  
 How to Set Up a robots.txt to Control Search Engine Spiders (thesitewizard.com)
For those new to the robots.txt file, it is merely a text file implementing what is known as the Standard for Robot Exclusion.
Note that the robots.txt file is a robots exclusion file (with emphasis on the "exclusion") - there is no way to tell spiders to include any file or directory.
As mentioned earlier, although the robots.txt format is listed in a document called "A Standard for Robots Exclusion", not all spiders and robots actually bother to heed it.
www.thesitewizard.com /archive/robotstxt.shtml   (1854 words)

  
 Robots Exclusion Standard - KBápps.com   (Site not responding. Last check: 2007-09-16)
Robots Exclusion Standard to disallow web pages and folders from being indexed by spiders.
If you don't want your entire site to be indexed, we strongly advise that you take advantage of the Robots Exclusion Standard by setting up a /robots.txt file.
Any URL matching one of these patterns will be ignored by robots visiting your site.
www.kbapps.com /webdesign/robots.html   (97 words)

  
 Personal Robots   (Site not responding. Last check: 2007-09-16)
Handmade toy robots in tribute of the tin toy robots of the space age.
What robots are and what a robots.txt file does.
The Web Robots Pages Robots Exclusion Sometimes people find they have been indexed by an indexing robot, or that a resource discovery robot has visited part...
www.healthcybernetics.com /personalrobots.html   (200 words)

  
 Steeler Crawler Information
If our crawling puts you to trouble, please indicate the fact according to the Robots Exclusion Standard or contact us as described below.
Robots Exclusion Standard has been there for years to allow webmasters or authors to prevent their material from being crawled.
For more details on directives, please refer to the revised specification of the Robots Exclusion Protocol (1996), which our crawler obeys (the original specification established in 1994 is available here).
www.tkl.iis.u-tokyo.ac.jp /~crawler/crawler.html.en   (438 words)

  
 robots exclusion standard software
Robot-exclusion manager tool to inform robots and other to not index parts of your web site.
It provides FTP access to your website so you can easily select the files and directories which shouldn't be indexed by search engines.
It has a BASIC Editor in which the user can write macros making use of specific functions to get information about and sensor data and to set speed and driving data for them as well as making use of all the power and ease of BASIC language to develop simulations.
www.cutedownloads.com /two/robots-exclusion-standard.htm   (226 words)

  
 UltraSeek Server at the University of Toronto
If this is not possible, we may be able to instruct the spider to ignore certain groups of pages: contact us.
are defined by the Robots Exclusion Standard (RES).
The value of this field is the name of the robot for which the record is describing an access policy.
www.utoronto.ca /ic/ultraseek/webserv.html   (425 words)

  
 Robots - 20th Century Fox:..   (Site not responding. Last check: 2007-09-16)
A news and discussion site for those interested in robots and robotics.
Home of the Robot Competition FAQ and a variety of resource pages.
The sexed robots are autonomous wheeled platforms fitted with nylon genital organs, respectively male and female.
www.giftcat.com /m/hobby-educational/robots   (101 words)

  
 Robots.txt help
The file tells the robot (spider) which files it may spider (download).
The standard dictates that /bob would disallow /bob.html and /bob/index.html (both the file bob and files in the bob directory will not be indexed).
Although there have been proposed extensions to the standard, such as an Allow line or robot version control, there has been no formal endorsement by the Robots exclusion standard working group.
www.sanmiguelnow.com /robots.htm   (515 words)

  
 FDSE :: Help :: How to prevent your pages from being indexed   (Site not responding. Last check: 2007-09-16)
This standard is used to prevent sites, individual pages, or folders from being indexed by any standards-compliant indexing process.
The format of the file is one or more "User-Agent" headers followed by paths which are forbidden to that agent.
Support for the Robots Exclusion Standard may be disabled by going to "Admin Page" => "General Settings" and setting "Crawler: Rogue" to 1 (checked).
www.xav.com /scripts/search/help/1049.html   (332 words)
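
The file format described in this snippet ("User-Agent" headers followed by forbidden paths) can be generated with a few lines of Python. This is a minimal sketch under stated assumptions: the function name build_robots_txt(), the agent name BadBot, and the example paths are hypothetical and not taken from the FDSE documentation.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render one record per agent: a User-agent line, then its Disallow lines."""
    records = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow value means nothing is forbidden for this agent.
        lines += [f"Disallow: {path}".rstrip() for path in (paths or [""])]
        records.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(records) + "\n"


print(build_robots_txt({"*": ["/cgi-bin/", "/tmp/"], "BadBot": ["/"]}))
```

The generated text would be saved as robots.txt in the root directory of the site.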

  
 Robots Exclusion   (Site not responding. Last check: 2007-09-16)
A Web site administrator can indicate which parts of the site should not be visited by a robot, by providing a specially formatted file on their site, in http://.../robots.txt.
Note that these methods rely on cooperation from the Robot, and are by no means guaranteed to work for every Robot.
If you need stronger protection from robots and other agents, you should use alternative methods such as password protection.
www.robotstxt.org /wc/exclusion.html   (365 words)

  
 robots.txt - KBCafe Web search
The Web Robots Pages Web Robots are programs that traverse the Web automatically.
Brett Tabke experiments with writing a weblog in a text file usually read only by robots.
Information about robots.txt, the Web Robots Exclusion Standard, and writing well-behaved Web robots.
www.kbcafe.com /search.aspx/robots.txt   (410 words)

  
 New Robots.txt Syntax Checker: a validator for robots.txt files   (Site not responding. Last check: 2007-09-16)
Robots.txt files (often erroneously called robot.txt, in singular) are created by webmasters to mark (disallow) files and directories of a web site that search engine spiders (and other types of robots) should not access.
This robots.txt checker is a "validator" that analyzes the syntax of a robots.txt file to see if its format is valid as established by Robot Exclusion Standard (please read the documentation and the tutorial to learn the basics) or if it contains errors.
This robots.txt analyzer is provided by Motoricerca, a non-profit Italian guide to web site optimization and search engine positioning.
tool.motoricerca.info /robots-checker.phtml   (190 words)

  
 RobotPack - robots.txt,robotpack,ORDY,Open Robots Directory,webcrawler,robots exclusion standard,bots,spider,, Free ...   (Site not responding. Last check: 2007-09-16)
RobotPack is a robot-exclusion manager tool to inform robots and other automated search engine tools to not index parts of your web site.
When done, it will create the robots.txt file for you and upload it to your server.
RobotPack also comes with the Open Robots Directory (ORDY), which allows you to update the Robots database and share it freely with everyone.
www.downloadseeker.com /1687.html   (154 words)

  
 Monitor - The Web Robots page
This is the main source for information on the robots.txt Robots Exclusion Standard and other articles about writing well-behaved Web robots.
These pages have further information about these Web Robots.
- A database of currently known robots, with descriptions and contact details.
members.lycos.co.uk /woorm/monitor4information/robotstxt.php   (127 words)

  
 The Web Robots FAQ
This book is now out of print, but is freely available through the O'Reilly Open Books Project.
There is a Web robots home page on: http://www.robotstxt.org/wc/robots.html
You guessed it, it depends on the service :-) Many services have a link to a URL submission form on their search page, or have more information in their help pages.
www.robotstxt.org /wc/faq.html#away   (2557 words)


Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.