Intro

BM25 (Best Matching 25) is an information retrieval ranking algorithm used by search engines to score how relevant a document is to a query. It is an enhancement of TF-IDF that improves ranking quality by incorporating term saturation and document length normalization.

BM25 ranks documents based on:

  • Term frequency (TF): how often query terms appear in the document, with diminishing returns
  • Inverse document frequency (IDF): how rare those terms are across the corpus
  • Document length normalization: penalizes overly long documents so they don’t rank higher just due to size
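
The three factors above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name bm25_scores is hypothetical, the defaults k1=1.5 and b=0.75 are commonly used values assumed here rather than taken from the text, and the IDF formula is one smoothed variant among several in use. Documents and queries are assumed to be pre-tokenized lists of strings, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the `query` tokens.

    k1 controls term-frequency saturation and b controls length
    normalization; both defaults are assumed, commonly used values.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: documents longer than average get a
        # larger denominator, which lowers their score.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rarer terms weigh more (smoothed so it stays positive).
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # Saturating TF: grows with the count but stays below k1 + 1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["fast", "search", "engine"],
        ["search", "search", "search", "ranking"],
        ["cooking", "recipes"],
    ]
    print(bm25_scores(["search", "ranking"], corpus))
```

In this sketch a document that contains none of the query terms scores 0, and repeating a term raises the score by less each time, which is the saturation effect described in the list.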