How does the Spider start its journey across the web? Good question. Generally, a Spider's starting points are heavily used servers that host a large number of websites. The Spider begins with a popular site, indexing every word on its pages and following every link it finds.
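In programming terms, the crawl described above is a breadth-first walk over links. Here is a minimal Python sketch under a few assumptions. `fetch` is a hypothetical caller-supplied function that returns a page's HTML or None; in this example it is a dictionary lookup over a made-up toy web, so no network access is needed. "Indexing" is reduced to recording each page's words.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndWordParser(HTMLParser):
    """Collects the href of every <a> tag and every word of visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`; returns {url: [words]} per visited page.

    `fetch(url)` is a hypothetical function returning HTML, or None if the
    page cannot be retrieved.
    """
    seen = {seed}
    queue = deque([seed])
    index = {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndWordParser()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        index[url] = parser.words
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Toy "web" standing in for real servers (made-up .example URLs).
TOY_WEB = {
    "http://popular.example/": '<a href="http://a.example/">Site A</a> <a href="/about">About</a>',
    "http://popular.example/about": "<p>Hosting many sites</p>",
    "http://a.example/": '<p>Spiders follow links</p><a href="http://popular.example/">Home</a>',
}

pages = crawl("http://popular.example/", TOY_WEB.get)
print(list(pages))
```

The `seen` set keeps the Spider from revisiting a page it has already queued. That matters here because `a.example` links back to the starting site.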

Most Popular Search Engines

Google, one of the oldest and most popular search engines around, began as an academic search engine. When the Google Spider (bot) looked at an HTML page, it took note of two things:
(1) The words within the main content.
(2) Where the words were found (their area of importance).
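Recording both the word and where it was found can be sketched as follows. Each word is tagged with the most important HTML tag enclosing it, and each occurrence adds that tag's weight to the page's score for the word. The weights in `TAG_WEIGHTS` are illustrative assumptions, not values any search engine has published. `PositionalParser` and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Illustrative weights (assumptions): words in these tags get "special
# consideration"; words anywhere else count DEFAULT_WEIGHT.
TAG_WEIGHTS = {"title": 5, "h1": 4, "h2": 3, "meta": 2}
DEFAULT_WEIGHT = 1
VOID_TAGS = {"meta", "br", "img", "hr", "input", "link"}  # tags with no closing tag


class PositionalParser(HTMLParser):
    """Records (word, area) pairs; area is the most important enclosing tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.hits = []

    def _record(self, text, area):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.hits.append((word, area))

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # meta description/keywords text lives in the content attribute
            self._record(dict(attrs).get("content") or "", "meta")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # pop back to the matching start tag, tolerating unclosed inner tags
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        ranked = [t for t in self.open_tags if t in TAG_WEIGHTS]
        area = max(ranked, key=TAG_WEIGHTS.get) if ranked else "body"
        self._record(data, area)


def build_index(pages):
    """Maps word -> {url: score}, summing the area weight of each occurrence."""
    index = defaultdict(lambda: defaultdict(int))
    for url, html in pages.items():
        parser = PositionalParser()
        parser.feed(html)
        parser.close()
        for word, area in parser.hits:
            index[word][url] += TAG_WEIGHTS.get(area, DEFAULT_WEIGHT)
    return index


pages = {
    "page1": '<head><title>Spiders</title>'
             '<meta name="description" content="web spiders"></head>'
             '<body><p>Spiders crawl</p></body>',
    "page2": "<h1>Crawl</h1><p>spiders too</p>",
}
index = build_index(pages)
# rank pages for the query "crawl" by score, highest first
results = sorted(index["crawl"].items(), key=lambda kv: kv[1], reverse=True)
print(results)
```

In this toy data, "page2" ranks first for "crawl" because the word sits in its H1. On "page1" the same word appears only in body text.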
Words appearing in important HTML tags, such as the title, meta tags, H1, H2 and other positions of relative importance, are flagged for special consideration when a user later searches on Google's interface. When indexing a web page, Google's algorithm indexes every word on the page except the articles “a”, “an” and “the”.

Other Spiders take different approaches. Each Spider is designed to run fast, to let users search more efficiently, or both. Some Spiders keep track of the words in the title, the subheadings and the links, along with the 100 most frequently used words on the page (a rough measure of keyword density). Lycos used this approach when indexing the web. Other systems, such as AltaVista, go in the opposite direction and index every word on the page, including the articles.

In the next article, a continuation of this one, I will discuss some very important aspects of a web page that popular search engines like Google, Yahoo, MSN and AltaVista take very seriously: Meta tags, Building the Index, Building a Search, Future Search, etc.
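The three indexing strategies described above can be contrasted in a short sketch. It assumes a hypothetical, already-parsed `Page` with separate title, subheading, link-text and body fields. The three functions only illustrate which terms each approach would keep. They do not claim to reproduce how Google, Lycos or AltaVista actually worked internally.

```python
import re
from collections import Counter
from dataclasses import dataclass

ARTICLES = {"a", "an", "the"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Page:
    """Hypothetical pre-parsed view of a web page."""
    title: str
    subheadings: list[str]
    link_texts: list[str]
    body: str

    def all_words(self):
        return tokenize(" ".join([self.title, *self.subheadings, *self.link_texts, self.body]))


def index_google_style(page):
    """Every word on the page except the articles."""
    return {w for w in page.all_words() if w not in ARTICLES}


def index_lycos_style(page, top_n=100):
    """Title, subheading and link words, plus the top_n most frequent words."""
    structural = tokenize(" ".join([page.title, *page.subheadings, *page.link_texts]))
    frequent = [w for w, _ in Counter(page.all_words()).most_common(top_n)]
    return set(structural) | set(frequent)


def index_altavista_style(page):
    """Every word on the page, articles included."""
    return set(page.all_words())


page = Page(
    title="The Spider",
    subheadings=["How crawling works"],
    link_texts=["an index"],
    body="A spider reads a page and follows the links on the page",
)
for strategy in (index_google_style, index_lycos_style, index_altavista_style):
    print(strategy.__name__, sorted(strategy(page)))
```

With a small `top_n`, the Lycos-style set drops body-only words such as "page" unless they are frequent enough. The AltaVista-style set keeps the articles that the Google-style set leaves out.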

