Thursday, July 31, 2008

How does Google work?

Type a few letters and voila! The web page you want is right there. But how exactly does Google work? How does it find the web page you are visiting now? Thousands of new websites are created every day, so how does Google put them into its database?


Firstly, it has the evil Googlebot, Google’s web crawler!
















Googlebot is, well, a bot that searches web pages. It downloads them and sends them to Google's huge index database. Googlebot consists of many computers requesting and fetching pages much more quickly than you can with your web browser. In fact, Googlebot can request thousands of different pages simultaneously. To avoid overwhelming web servers, or crowding out requests from human users, Googlebot deliberately makes requests to each individual web server more slowly than it is capable of doing.
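To picture the "many pages at once, but gently per server" idea, here is a minimal sketch in Python. It is not Googlebot's real code; the function name `schedule_fetches`, the 2-second delay and the example URLs are all my own assumptions. It only works out when each page could be requested, without touching the network.

```python
from collections import defaultdict
from urllib.parse import urlparse


def schedule_fetches(urls, per_host_delay=2.0):
    """Give each URL an earliest fetch time, in seconds from the start.

    Different hosts can be contacted at the same moment, but two requests
    to the same host are kept at least per_host_delay seconds apart.
    The delay value is an illustrative assumption, not Google's setting.
    """
    next_free = defaultdict(float)  # host -> earliest time it may be hit again
    schedule = []
    for url in urls:
        host = urlparse(url).netloc
        start = next_free[host]
        schedule.append((start, url))
        next_free[host] = start + per_host_delay
    return sorted(schedule)


# Hypothetical URLs, used only for the demonstration.
urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/x",
    "http://example.com/c",
]

for start, url in schedule_fetches(urls):
    print(f"t={start:4.1f}s  {url}")
```

The two different hosts are both contacted at time zero, while the three pages on example.com are spread out two seconds apart, so no single server gets flooded.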


Google also learns of new web pages through its add-page site, www.google.com/addurl.html. By typing your website's address in there, you notify Google of your new web page, so that Googlebot can visit it and it can show up on Google.


Also, the Google indexer doesn’t index common words such as the, is, on, or, of, how, why, as well as certain single digits and single letters. These words are so common that they do little to narrow a search, and therefore they can safely be discarded. The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase, to improve Google’s performance.
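The clean-up described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the stop-word list is just the examples named in this post, the function name `index_terms` is made up, and for simplicity it drops every single-character token rather than only "certain" ones.

```python
import re

# Only the example words from this post; a real stop-word list would be longer.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}


def index_terms(text):
    """Return the words of text that would be kept for indexing."""
    lowered = text.lower()                     # convert all letters to lowercase
    words = re.findall(r"[a-z0-9]+", lowered)  # skips punctuation and extra spaces
    # Drop stop words and single letters or digits (a simplification).
    return [w for w in words if w not in STOP_WORDS and len(w) > 1]


print(index_terms("How   is the Web, indexed?"))  # ['web', 'indexed']
```

Out of five words in the example sentence, only two are worth storing, which is the whole point of throwing away the common ones.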

Google also claims on its website that it has two types of technology to make search results more relevant, PageRank and Hypertext-Matching Analysis.

From Google's website:

PageRank Technology: PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results.

PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page's importance.
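To see how "votes from important pages count for more" can work, here is a small Python sketch of the simplified PageRank idea as it is usually described publicly. It is not Google's production system; the damping factor of 0.85, the number of iterations, the function name `pagerank` and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance by repeatedly passing rank along links.

    links maps each page to the list of pages it links to ("votes" for).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with every page equal
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical mini web: A links to B and C, B to C, C to A, D to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this tiny web, C comes out on top because three pages link to it, and A comes second even though only one page links to it, because that one vote comes from the highly ranked C. D, which nobody links to, ends up last.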



Hypertext-Matching Analysis: Our search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), our technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. We also analyze the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.

Credits: www.google.com


This picture shows exactly what happens during a search:





Thanks for your kind attention.
