In A Corpus Of N Documents, One Document Is Randomly Picked. The Document Contains A Total Of T Terms And The Term “data” Appears K Times. What Is The Correct Value

In a corpus of N documents, one document is randomly picked. The document contains a total of T terms and the term “data” appears K times.

What is the correct value for the product of TF (term frequency) and IDF (inverse-document-frequency), if the term “data” appears in approximately one-third of the total documents?

A) KT * Log(3)
B) K * Log(3) / T
C) T * Log(3) / K
D) Log(3) / KT

Natural Language Processing Engineer Interview Question

Answer:

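B) K * Log(3) / T

The formula for TF is K / T: the number of times “data” appears in the document divided by the total number of terms in that document.

The formula for IDF is log(total docs / number of docs containing “data”). Since “data” appears in about one-third of the N documents, IDF = log(N / (N / 3)) = log(3).

The product is therefore TF * IDF = (K / T) * log(3) = K * Log(3) / T.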
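
A quick numeric check of the answer can be sketched in Python. The function name `tf_idf` and the sample values for N, T and K are made up for illustration and are not part of the question. The natural log is assumed here; the question does not specify a base, and a different base only rescales the result.

```python
import math

def tf_idf(k, t, n_docs, n_docs_with_term):
    """Return TF * IDF using the plain definitions from the answer above."""
    tf = k / t                                   # term count / total terms in the document
    idf = math.log(n_docs / n_docs_with_term)    # log(total docs / docs containing the term)
    return tf * idf

# Hypothetical values: 300 documents, a 100-term document, "data" appears 5 times.
N, T, K = 300, 100, 5

# "data" appears in one-third of the documents, so N / (N / 3) = 3.
score = tf_idf(K, T, N, N // 3)

# Option B from the question: K * Log(3) / T.
expected = K * math.log(3) / T

print(math.isclose(score, expected))
```

With these values the computed score matches option B, and the corpus size N cancels out of the IDF term, which is why N does not appear in any of the answer choices.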