Wikipedia Web Crawler with Inverted Index

Information retrieval system with cosine similarity ranking

Tech Stack: Java
Categories: Web Crawling, Information Retrieval, Search Algorithms

Project Overview

This system crawls Wikipedia pages starting from a seed URL, builds an inverted index of terms, and implements ranked retrieval using cosine similarity.
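As a rough illustration of the indexing and ranking steps, the sketch below builds an in-memory inverted index and scores documents against a query by cosine similarity. It is not the project's actual code: the class name CosineIndexSketch, the regex tokenizer, and the tf * ln(1 + N/df) weighting are all assumptions chosen for brevity, and the crawling step is left out (documents are passed in as plain strings).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an inverted index with cosine-similarity ranking.
class CosineIndexSketch {
    // term -> (docId -> term frequency in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Assumed tokenizer: lowercase, split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Adds a document to the index and returns its id (0, 1, 2, ...).
    int addDocument(String text) {
        int docId = docCount++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Assumed smoothed idf, ln(1 + N/df); it stays positive even for terms in every document.
    private double idf(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return p == null ? 0.0 : Math.log(1.0 + (double) docCount / p.size());
    }

    // Cosine similarity between the query and every document, indexed by docId.
    double[] scores(String query) {
        // Squared length of each document's tf-idf vector.
        double[] docNormSq = new double[docCount];
        for (Map.Entry<String, Map<Integer, Integer>> e : postings.entrySet()) {
            double idf = idf(e.getKey());
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                double w = p.getValue() * idf;
                docNormSq[p.getKey()] += w * w;
            }
        }
        Map<String, Integer> queryTf = new HashMap<>();
        for (String t : tokenize(query)) queryTf.merge(t, 1, Integer::sum);

        // Accumulate dot products by walking only the postings of the query terms.
        double[] result = new double[docCount];
        double queryNormSq = 0.0;
        for (Map.Entry<String, Integer> q : queryTf.entrySet()) {
            Map<Integer, Integer> p = postings.get(q.getKey());
            if (p == null) continue; // term never indexed
            double idf = idf(q.getKey());
            double qw = q.getValue() * idf;
            queryNormSq += qw * qw;
            for (Map.Entry<Integer, Integer> d : p.entrySet()) {
                result[d.getKey()] += qw * d.getValue() * idf;
            }
        }
        if (queryNormSq == 0.0) return result; // no known query terms: all zeros
        for (int i = 0; i < docCount; i++) {
            if (docNormSq[i] > 0.0) result[i] /= Math.sqrt(docNormSq[i]) * Math.sqrt(queryNormSq);
        }
        return result;
    }

    // Ids of documents with a non-zero score, best match first.
    List<Integer> rank(String query) {
        double[] s = scores(query);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < s.length; i++) {
            if (s[i] > 0.0) ids.add(i);
        }
        ids.sort((a, b) -> Double.compare(s[b], s[a]));
        return ids;
    }

    public static void main(String[] args) {
        CosineIndexSketch index = new CosineIndexSketch();
        index.addDocument("Java is a programming language");
        index.addDocument("Coffee from Java island");
        index.addDocument("Python is a programming language");
        System.out.println(index.rank("java programming"));
    }
}
```

Walking only the postings lists of the query terms means documents that share no term with the query are never touched during scoring, which is the main reason to keep an inverted index rather than scanning every page. In a real crawler the document norms would likely be precomputed at index time rather than on every query, as done here for simplicity.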

Crawler Features

View on GitHub