Steeler Information

$Date:: 2011-05-02 #$

Steeler: What is it? What is it doing?

Steeler is a Web crawler (also known as a robot): software that browses the Web autonomously. It is operated by Kitsuregawa Laboratory, The University of Tokyo. If you see a User-Agent string like

        Mozilla/5.0 (compatible; Steeler/3.5; http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)

or source IP addresses within

        157.82.156.129 - 157.82.156.254

in your Web server's access log, those are footprints of Steeler.
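
To check log entries against the two footprints above, something like the following minimal sketch may help. The function name is_steeler is hypothetical, and the sketch assumes the log has already been parsed into a source IP and a User-Agent string; it is not part of Steeler itself.

```python
import ipaddress
import re

# Address range quoted above (inclusive on both ends).
STEELER_FIRST = ipaddress.ip_address("157.82.156.129")
STEELER_LAST = ipaddress.ip_address("157.82.156.254")

# Matches the "Steeler/<version>" token inside a User-Agent string.
STEELER_UA = re.compile(r"\bSteeler/\d")


def is_steeler(ip: str, user_agent: str) -> bool:
    """Return True if the source IP or the User-Agent points to Steeler."""
    try:
        in_range = STEELER_FIRST <= ipaddress.ip_address(ip) <= STEELER_LAST
    except (ValueError, TypeError):
        # Unparseable address, or an IPv6 address that cannot be compared.
        in_range = False
    return in_range or bool(STEELER_UA.search(user_agent))
```

Matching on either signal is a deliberate choice here: a User-Agent string can be forged by third parties, so the IP range is the more reliable of the two.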

We aim to gather as many published documents as possible in order to study various social phenomena, but we do not want to disturb webmasters. If Steeler's accesses are a nuisance to you, please tell the crawler so by means of the Robots Exclusion Standard, or contact us as described below. Thank you for your cooperation.

How to keep crawlers out of your site

The Robots Exclusion Standard has long allowed webmasters and authors to prevent their material from being crawled. It consists of the following two methods.

  1. The robots.txt file

    If you are a webmaster with appropriate permission, you can give directives to crawlers in the /robots.txt file at the top level of your site (e.g., http://www.your-site.com/robots.txt). For example, the following directive forbids Steeler from retrieving any content from your site.

            User-agent: Steeler
            Disallow: /

    In addition to path prefixes, Disallow may contain the wildcard character "*" and the end-of-path designator "$". For example, the following forbids access to everything under the /images/ directory as well as to files with the .gif suffix.

            User-agent: Steeler
            Disallow: /images/
            Disallow: *.gif$

    If the frequency of access matters, specify the Crawl-delay parameter. For example, the following directs Steeler to access the site at most once every 30 seconds.

            User-agent: Steeler
            Crawl-delay: 30.0

  2. Robots meta tags

    If you can edit HTML sources (or templates), you can also protect content on a file-by-file basis with robots meta tags. In a nutshell, if you put

            <META NAME="robots" CONTENT="noindex,nofollow">

    in the head of an HTML document, Steeler will not follow the links found in that document.
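
To confirm that a page really carries the robots meta tag described above, a webmaster could run a check along these lines. This is a minimal sketch using Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical, and the code does not claim to reflect how Steeler itself parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

For the tag shown above, the returned set contains "noindex" and "nofollow"; a page without the tag yields an empty set.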

Note that Steeler obeys the revised Robots Exclusion Protocol that major search engines adopted around 2008. It is a revision of the original protocol proposed in the 1990s.
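
The "*" and "$" forms of Disallow shown in the robots.txt examples can be understood through the following minimal sketch, which translates each pattern into a regular expression. The function names pattern_to_regex and is_disallowed are hypothetical, and the sketch makes simplifying assumptions: it handles Disallow values only and ignores any precedence rules between directives. It is an illustration, not Steeler's actual matcher.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a Disallow value with '*' and a trailing '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: list) -> bool:
    """Return True if the URL path matches any non-empty Disallow pattern."""
    # re.match anchors at the start, so plain values act as path prefixes.
    return any(
        pattern_to_regex(p).match(path) for p in disallow_patterns if p
    )
```

With the patterns "/images/" and "*.gif$", the path /images/logo.png is disallowed by the prefix rule and /pics/anim.gif by the suffix rule, while /pics/anim.gif?size=2 is not matched because "$" requires the path to end in .gif.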

How Steeler behaves

How to contact us

If you have further questions or requests, feel free to email us at crawler (at) tkl.iis.u-tokyo.ac.jp (replace "(at)" with @). Please include the host name(s) and IP address(es) of your site in the message.