Algorithm-Based Ranking Systems
Crawling, Indexing, and Ranking
Search engines have several major goals and functions, which include:
• Crawling and indexing the billions of documents (pages and files) accessible on the Web.
• Responding to user queries by providing lists of relevant pages.
Crawling and Indexing
• Imagine the World Wide Web as a network of stops in a big city subway system.
• Each stop is its own unique document (usually a web page, but sometimes a PDF, JPEG, or other file).
• The search engines need a way to “crawl” the entire city and find all the stops along the way, so they use the best path available: the links between web pages.
• For example, the link structure of the Web binds together all of the pages that were made public as a result of someone linking to them. Through links, search engines’ automated robots, called crawlers or spiders (as displayed in the figure above), can reach the many billions of interconnected documents.
• Once the engines find these pages, their next job is to parse the code and store selected pieces of each page in massive arrays of hard drives, to be recalled when needed for a query. To accomplish the monumental task of holding billions of pages that can be accessed in a fraction of a second, the search engines have constructed massive data centres to deal with all this data.
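The two steps above, following links to discover pages and then storing selected pieces for fast recall, can be sketched in miniature. This is a simplified illustration, not how any real search engine is implemented: the toy `WEB` dictionary stands in for the live Web, the crawl is a plain breadth-first traversal of the link graph, and the "index" is a basic inverted index mapping each word to the pages that contain it.

```python
from collections import deque

# A toy "web" (hypothetical data): each URL maps to its page text
# and the outgoing links found on that page.
WEB = {
    "/home": ("welcome to the subway of pages", ["/a", "/b"]),
    "/a": ("crawlers follow links between pages", ["/b", "/c"]),
    "/b": ("indexes store selected pieces of pages", ["/home"]),
    "/c": ("queries recall pages in a fraction of a second", []),
}

def crawl(start):
    """Breadth-first traversal of the link graph, like a spider."""
    seen, queue, pages = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue                 # skip already-visited or dead links
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text            # "store selected pieces" of the page
        queue.extend(links)          # discover new stops via its links
    return pages

def build_index(pages):
    """Inverted index: word -> set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

pages = crawl("/home")
index = build_index(pages)
print(sorted(index["pages"]))  # every page mentioning the word "pages"
```

Answering a query then reduces to a fast dictionary lookup in the index, rather than re-reading every page, which is the same reason real engines precompute their indexes in data centres.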