Search
engines such as Excite and AltaVista use web spiders, also known
as robots, to create the indexes for their search databases.
These robots traverse HTML trees by loading pages and following
hyperlinks, and they report the text and/or meta-tag information
to create search indexes. ROBOTS.TXT is a file that spiders
consult for information on how the site is to be cataloged.
It is an ASCII text file that sits in the document root of the
server. It defines which documents and/or directories
conforming spiders are forbidden to index.
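As a rough illustration of how a conforming spider might consult such a file, the sketch below parses a small exclusion file with Python's standard urllib.robotparser module. The file contents, the robot names (AnyBot, ExampleBot), the www.example.com URLs, and the helper function allowed are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and robot name are assumptions.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: ExampleBot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = SAMPLE) -> bool:
    """Return True if a conforming robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Any robot may load ordinary pages but not the listed directories.
print(allowed("AnyBot", "http://www.example.com/index.html"))
print(allowed("AnyBot", "http://www.example.com/cgi-bin/search"))
# ExampleBot is excluded from the whole site.
print(allowed("ExampleBot", "http://www.example.com/index.html"))
```

Each record names a user-agent and lists the path prefixes that robot should not load. The record for * applies to any robot that has no record of its own.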
The
robot exclusion protocol was introduced by Martijn Koster in
1994 to deal with problems that had been arising due to the
increasing popularity of the Internet and the toll web spiders
were taking on system resources. Some of the problems
were caused by robots rapid-firing requests, that is, loading
pages in rapid succession. Other problems included robots
indexing information deep in directory trees, indexing temporary
information, and even accessing CGI scripts. The robot exclusion protocol
was quickly adopted by webmasters and web robot makers as a
way to organize and control the indexing process.
Since
then, the size of the Internet has increased dramatically and
millions of people are using it. The number of web robots
crawling the web is greater than before, and it is more important
than ever for all web sites to have a properly created and maintained
ROBOTS.TXT file.
With
RoboGen you create robot exclusion files by selecting All Robots
or a specific user-agent and adding documents and/or directories
by entering the path names manually or by selecting them using
FTP. Once all the restrictions and directives are set
you can save the robots.txt file to your hard drive or upload
it directly to your server.
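The kind of file this process produces can be sketched in a few lines of Python. This is not RoboGen's own code, only an assumed illustration: the function build_robots_txt, the robot name ExampleBot, and the paths are hypothetical.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    blocks = []
    for user_agent, paths in rules.items():
        lines = [f"User-agent: {user_agent}"]
        # An empty Disallow value means nothing is forbidden for this robot.
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    # Records are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# "*" stands for All Robots; ExampleBot is a made-up specific user-agent.
text = build_robots_txt({
    "*": ["/cgi-bin/", "/tmp/"],
    "ExampleBot": ["/"],
})
print(text)
```

The resulting text could then be saved locally as robots.txt or uploaded to the document root of the server.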
It
is important to remember that robot exclusion files are not
a security measure. Some robots will simply ignore the
file, and others may purposely load the documents that the file
marks as disallowed. This means that robot exclusion files
are really only useful for controlling what appears in search
engines.