Natural Language Processing Engineer Question:

In a corpus of N documents, one document is randomly picked. The document contains a total of T terms and the term “data” appears K times.

What is the correct value for the product of TF (term frequency) and IDF (inverse-document-frequency), if the term “data” appears in approximately one-third of the total documents?

A) KT * Log(3)
B) K * Log(3) / T
C) T * Log(3) / K
D) Log(3) / KT

Answer:

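B) K * Log(3) / T

TF is the number of times the term appears in the document divided by the total number of terms in that document:

TF = K / T

IDF is the log of the total number of documents divided by the number of documents containing “data”. Since “data” appears in about one-third of the N documents:

IDF = log(N / (N / 3))

= log(1 / (⅓))

= log(3)

Hence TF * IDF = (K / T) * log(3), so the correct choice is B) K * Log(3) / T.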
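
The derivation above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `tf_idf` and the sample values (K = 5, T = 100, N = 300, with “data” in 100 documents) are hypothetical, and the natural log is assumed because the question does not state a base.

```python
import math


def tf_idf(k, t, n_docs, n_docs_with_term):
    """Return TF * IDF for one term in one document.

    k: occurrences of the term in the document (K)
    t: total number of terms in the document (T)
    n_docs: total number of documents in the corpus (N)
    n_docs_with_term: number of documents containing the term
    """
    tf = k / t                                  # TF = K / T
    idf = math.log(n_docs / n_docs_with_term)   # IDF = log(N / df), natural log assumed
    return tf * idf


# Hypothetical values: the term appears in exactly one-third of the corpus.
K, T, N = 5, 100, 300
score = tf_idf(K, T, N, N // 3)

# Closed form from option B: K * log(3) / T
closed_form = K * math.log(3) / T
print(math.isclose(score, closed_form))
```

Using a different log base only rescales the IDF by a constant factor, so the form K * Log(3) / T still holds.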

Previous Question:

How many trigram phrases can be generated from the following sentence after performing the following text cleaning steps:

Stopword removal
Replacing punctuation with a single space
“#Analytics-vidhya is a great source to learn @data_science.”

A) 3
B) 4
C) 5
D) 6
E) 7
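
The two cleaning steps and the trigram count can be sketched as follows. This is an illustration under stated assumptions: the stopword set `STOPWORDS` is a hypothetical minimal list covering only the stopwords in this sentence (a real stopword list would be larger), underscores are treated as punctuation, and the helper name `trigrams` is made up for this example.

```python
import re

# Assumed minimal stopword list; only the stopwords present in the sentence.
STOPWORDS = {"is", "a", "to"}


def trigrams(sentence):
    """Clean the sentence and return its word trigrams as tuples."""
    # Step 1: stopword removal on whitespace-separated tokens.
    kept = [w for w in sentence.split() if w.lower() not in STOPWORDS]
    # Step 2: replace each punctuation character (including "_") with a space.
    cleaned = re.sub(r"[^\w\s]|_", " ", " ".join(kept))
    tokens = cleaned.split()
    # A sequence of n tokens yields n - 2 trigrams (none if n < 3).
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


sentence = "#Analytics-vidhya is a great source to learn @data_science."
result = trigrams(sentence)
for gram in result:
    print(" ".join(gram))
print(len(result))
```

Under these assumptions the cleaned sentence has seven tokens (Analytics, vidhya, great, source, learn, data, science), which gives 7 - 2 = 5 trigrams. A different stopword list or punctuation rule could change the count.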
Next Question:

What is the major difference between CRF (Conditional Random Field) and HMM (Hidden Markov Model)?

A) CRF is a generative model whereas HMM is a discriminative model
B) CRF is a discriminative model whereas HMM is a generative model
C) Both CRF and HMM are generative models
D) Both CRF and HMM are discriminative models