Wikipedia Web Crawler | Information Retrieval Project

Project Overview

This system crawls Wikipedia pages starting from a seed URL, builds an inverted index of terms, and implements ranked retrieval using cosine similarity.
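
The indexing and ranking steps could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the function names (`tokenize`, `build_index`, `rank`), the regex tokenizer, and the log-scaled TF-IDF weighting are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def _weight(tf, df, n_docs):
    """Log-scaled TF-IDF weight (an assumed weighting scheme)."""
    return (1 + math.log(tf)) * math.log(1 + n_docs / df)


def rank(query, index, n_docs):
    """Return (doc_id, cosine similarity) pairs, best match first."""
    # Squared length of every document vector, accumulated term by term.
    doc_norms = defaultdict(float)
    for postings in index.values():
        for doc_id, tf in postings.items():
            doc_norms[doc_id] += _weight(tf, len(postings), n_docs) ** 2

    # Query vector, restricted to terms that occur in the index.
    q_weights = {
        term: _weight(tf, len(index[term]), n_docs)
        for term, tf in Counter(tokenize(query)).items()
        if term in index
    }
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))

    # Dot product of the query with each document that shares a term.
    scores = defaultdict(float)
    for term, q_w in q_weights.items():
        for doc_id, tf in index[term].items():
            scores[doc_id] += q_w * _weight(tf, len(index[term]), n_docs)

    ranked = [
        (doc_id, dot / (math.sqrt(doc_norms[doc_id]) * q_norm))
        for doc_id, dot in scores.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Given `docs` as a dictionary of page text keyed by URL, `rank("egypt nile", build_index(docs), len(docs))` returns the pages sharing at least one query term, ordered by similarity. Only postings for the query terms are scanned at scoring time, which is the main reason to keep an inverted index rather than comparing the query against every page.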

Crawler Features

Starts from a seed URL: the Wikipedia page "List of Pharaohs"
Multithreaded crawling
Depth-limited crawling
Visited URL tracking
HTML content extraction
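
The features above can be combined as in the following sketch, which uses only the Python standard library. It is one possible arrangement under stated assumptions, not the project's implementation: the names `crawl`, `fetch_html`, and `LinkAndTextParser`, the level-by-level traversal, the User-Agent string, and the default depth and worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkAndTextParser(HTMLParser):
    """Collect href values and visible text, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Download one page; the User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "ir-project-crawler-sketch"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def _safe_fetch(fetch, url):
    """Return the page HTML, or None if the fetch fails for any reason."""
    try:
        return fetch(url)
    except Exception:
        return None


def crawl(seed_url, max_depth=2, max_workers=8, fetch=fetch_html,
          allow=lambda url: True):
    """Crawl level by level and return {url: extracted text}."""
    visited = {seed_url}   # visited URL tracking
    frontier = [seed_url]  # URLs to fetch at the current depth
    pages = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for depth in range(max_depth + 1):  # depth-limited crawling
            if not frontier:
                break
            # Pages at the same depth are fetched in parallel threads.
            bodies = list(pool.map(lambda u: _safe_fetch(fetch, u), frontier))
            next_frontier = []
            for url, html in zip(frontier, bodies):
                if html is None:
                    continue
                parser = LinkAndTextParser()
                parser.feed(html)
                pages[url] = " ".join(parser.text_parts)
                if depth == max_depth:
                    continue  # do not expand links past the depth limit
                for href in parser.links:
                    link = urldefrag(urljoin(url, href)).url
                    if link not in visited and allow(link):
                        visited.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return pages
```

Only the worker threads perform network requests; the visited set is updated in the main thread between levels, so the sketch needs no lock. The `allow` callback is a hypothetical hook for restricting the crawl, for example to URLs on the same Wikipedia host as the seed page.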
