Natural Language Processing Engineer Question:
In a corpus of N documents, one document is randomly picked. The document contains a total of T terms and the term “data” appears K times.
What is the correct value of the product of TF (term frequency) and IDF (inverse document frequency) if the term “data” appears in approximately one-third of the total documents?
A) KT * Log(3)
B) K * Log(3) / T
C) T * Log(3) / K
D) Log(3) / KT
Answer:
B) K * Log(3) / T
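The formula for TF is K / T (occurrences of “data” divided by the total number of terms in the document).
The formula for IDF is log(total docs / number of docs containing “data”)
= log(N / (N/3)) = log(1 / (1/3))
= log(3)
Hence the correct choice is TF * IDF = K * Log(3) / T.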
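The calculation above can be checked numerically with a minimal sketch. The function name `tf_idf` and the sample values (K = 6, T = 120, N = 300) are hypothetical, chosen only for illustration, and the natural logarithm is assumed because the question does not state a log base.

```python
import math


def tf_idf(k, t, n_docs, n_docs_with_term):
    """TF-IDF for one term in one document, using the formulas from the answer.

    k: occurrences of the term in the document
    t: total number of terms in the document
    n_docs: total number of documents in the corpus
    n_docs_with_term: number of documents containing the term
    """
    tf = k / t                                  # term frequency: K / T
    idf = math.log(n_docs / n_docs_with_term)   # natural log is an assumption
    return tf * idf


# Hypothetical values: the term appears in one-third of 300 documents.
K, T, N = 6, 120, 300
score = tf_idf(K, T, N, N // 3)

# Closed form from option B: K * log(3) / T
expected = K * math.log(3) / T
print(math.isclose(score, expected))
```

A different log base only rescales the result by a constant factor, so option B has the same form either way.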