Steeler crawler information

$Date: 2006/09/27 07:13:16 $
(A Japanese version of this page is also available.)


Steeler: What is it? What is it doing?

Steeler is a web crawler (also known as a robot), software that surfs the web automatically. It is developed and operated at the Kitsuregawa Laboratory, The University of Tokyo. We are working to analyze and understand the nature of cyberspace based on the documents the crawler collects.

While we intend to gather as many published documents as possible, we are very concerned about inconveniencing webmasters. If access by Steeler causes you trouble, please indicate this using the Robots Exclusion Standard or contact us as described below. Thank you for your cooperation.

How to keep crawlers out of your site

The Robots Exclusion Standard has existed for years to let webmasters and authors prevent their material from being crawled. It consists of the following two methods.

  1. The robots.txt file
    If you are a webmaster with appropriate permission, you can give directives to crawlers in the /robots.txt file at the top level of your site (i.e., http://www.your-site.com/robots.txt). For example, the following directive forbids Steeler from retrieving any content from your site.

            User-agent: Steeler
            Disallow: /
    
    Note that /robots.txt itself may be fetched multiple times, because it is re-fetched each time its cached copy expires. The Expires: HTTP header field can be used to specify the expiration date of /robots.txt. If the field is missing, /robots.txt expires after 1 day.

    For more details on directives, please refer to the revised specification of the Robots Exclusion Protocol (1996), which Steeler obeys. The original specification, established in 1994, is also available.

  2. Robots meta tags
    You can also protect your content on a file-by-file basis with robots meta tags. In a nutshell, if you put

            <META NAME="robots" CONTENT="noindex,nofollow">
    in the head of an HTML document, Steeler will not follow the links within that document.
    Please consult the standard's guide to robots meta tags for other directives.
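To check how a robots.txt like the example above is interpreted, the following sketch feeds it to Python's standard `urllib.robotparser` module. It is only an illustration with a general-purpose parser, not Steeler's own implementation; the host `www.your-site.com` and the agent name `SomeOtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The example /robots.txt from above: it excludes only the "Steeler" user agent.
ROBOTS_TXT = """\
User-agent: Steeler
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.your-site.com/index.html"
    # Steeler matches the User-agent record, and "Disallow: /" covers every path.
    print(rp.can_fetch("Steeler", url))
    # A robot with no matching record (and no "*" record) is allowed by default.
    print(rp.can_fetch("SomeOtherBot", url))
```

`urllib.robotparser` compares the User-agent line case-insensitively against the name part of the agent string (the text before any `/`), so `Steeler/1.0` is excluded as well; other crawlers may match names differently.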
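The expiry rule for /robots.txt stated above (use the Expires: header if present, otherwise one day after retrieval) can be expressed as a small function. This is a hypothetical sketch of that rule, not Steeler's code: the name `robots_txt_expiry` is made up, and treating a malformed Expires: date as if the header were missing is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Lifetime stated on this page for a /robots.txt served without Expires:.
DEFAULT_LIFETIME = timedelta(days=1)


def robots_txt_expiry(expires_header: Optional[str], fetched_at: datetime) -> datetime:
    """Return the time after which a cached /robots.txt should be re-fetched.

    Uses the HTTP Expires: header if it is present and parseable; otherwise
    falls back to one day after the fetch time. `fetched_at` should be
    timezone-aware (UTC) so both branches return comparable values.
    """
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            # Assumption: a malformed date is treated as if the header were missing.
            expires = None
        if expires is not None:
            if expires.tzinfo is None:
                # HTTP dates are in GMT; attach UTC if the parser returned a naive value.
                expires = expires.replace(tzinfo=timezone.utc)
            return expires
    return fetched_at + DEFAULT_LIFETIME


if __name__ == "__main__":
    fetched = datetime(2006, 9, 27, 7, 13, 16, tzinfo=timezone.utc)
    print(robots_txt_expiry("Thu, 28 Sep 2006 00:00:00 GMT", fetched))
    print(robots_txt_expiry(None, fetched))
```

The page mentions only the Expires: field, so the sketch deliberately ignores other HTTP caching headers.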
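A page's robots meta directives can be read with Python's standard `html.parser` module. The sketch below collects the comma-separated tokens from `<meta name="robots">` tags and reports whether links may be followed. It is an illustration only: `RobotsMetaParser`, `robots_directives`, and `may_follow_links` are hypothetical names, and this page does not describe Steeler's actual HTML handling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directive tokens from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the lower-cased robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_follow_links(html: str) -> bool:
    """True unless the document carries a 'nofollow' robots directive."""
    return "nofollow" not in robots_directives(html)


if __name__ == "__main__":
    page = ('<html><head><META NAME="robots" CONTENT="noindex,nofollow"></head>'
            '<body><a href="/next.html">next</a></body></html>')
    print(sorted(robots_directives(page)))
    print(may_follow_links(page))
```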

How Steeler behaves

How to contact us

If you have further questions or requests, feel free to e-mail us at crawler@tkl.iis.u-tokyo.ac.jp. Please include your site's host name(s) in your message.