Given an n-word document a = {w1, w2, ..., wn} and a set of n recognized words, one can represent q and a each as a vector of word frequencies, $\mathbf{q}$ and $\mathbf{a}$. A common measure of similarity between two word frequency vectors $\mathbf{q}$ and $\mathbf{a}$, weighted by inverse document frequency (idf), is the cosine distance between them:
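$$\cos(\mathbf{q}, \mathbf{a}) = \frac{\sum_{w} \mathrm{idf}_w^{2}\, q_w\, a_w}{\sqrt{\sum_{w} (\mathrm{idf}_w\, q_w)^{2}}\;\sqrt{\sum_{w} (\mathrm{idf}_w\, a_w)^{2}}}$$

where $q_w$ and $a_w$ are the frequencies of word $w$ in q and a, $\mathrm{idf}_w$ is the inverse document frequency of $w$, and the sums run over the recognized words.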
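As a rough illustration of the idf-weighted cosine measure described above, the following Python sketch computes it for two tokenized texts. It rests on several assumptions not stated in the source: idf is taken as log(N / df(w)), which is only one common variant; words absent from the idf table are treated as unrecognized and given weight zero; and the helper names `idf_table` and `weighted_cosine` are hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Build an idf weight for every word seen in `corpus` (a list of token lists).

    Assumption: idf(w) = log(N / df(w)), where N is the number of documents
    and df(w) is the number of documents containing w. Other variants exist.
    """
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def weighted_cosine(q_tokens, a_tokens, idf):
    """Cosine between the idf-weighted word frequency vectors of two texts."""
    q_freq, a_freq = Counter(q_tokens), Counter(a_tokens)
    # Scale each raw frequency by its idf weight; words missing from the
    # table are treated as unrecognized and get weight 0 (an assumption).
    q_vec = {w: c * idf.get(w, 0.0) for w, c in q_freq.items()}
    a_vec = {w: c * idf.get(w, 0.0) for w, c in a_freq.items()}

    dot = sum(v * a_vec.get(w, 0.0) for w, v in q_vec.items())
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    a_norm = math.sqrt(sum(v * v for v in a_vec.values()))
    if q_norm == 0.0 or a_norm == 0.0:
        return 0.0  # no weighted overlap is possible with an all-zero vector
    return dot / (q_norm * a_norm)


# Toy corpus, invented purely for illustration.
corpus = [["cat", "sat"], ["dog", "sat"], ["cat", "ran"]]
idf = idf_table(corpus)
similarity = weighted_cosine(["cat", "sat"], ["cat", "ran"], idf)
```

Storing the vectors as sparse dictionaries avoids allocating one slot per recognized word, which matters when the vocabulary is much larger than either text.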