Data Scientist Interview Questions & Answers
Frequently asked questions from Data Scientist job interviews. These questions are collected here to help you give strong answers to the questions posed to you, so use them to prepare for your job search.
55 Data Scientist Questions and Answers:
1 :: What is survivorship bias?
It is the logical error of focusing on the people or things that survived some process while overlooking those that did not, typically because of their lack of visibility. This can lead to wrong conclusions in numerous ways.
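A small simulation can make the effect concrete. The sketch below assumes 1,000 hypothetical funds whose yearly returns are pure noise with a true mean of 0%, plus an assumed survival rule under which a fund is shut down if its five-year total return is negative. Averaging only the survivors makes the group look profitable even though no fund has any real edge.

```python
import random

rng = random.Random(1)

# Hypothetical data: 1,000 funds x 5 years of returns drawn from N(0, 0.10).
# The true mean return of every fund is 0%.
funds = [[rng.gauss(0.0, 0.10) for _ in range(5)] for _ in range(1000)]

def mean(xs):
    return sum(xs) / len(xs)

# Assumed survival rule: funds with a negative 5-year total return are closed
# and disappear from the data set we later look at.
survivors = [f for f in funds if sum(f) >= 0]

all_avg = mean([mean(f) for f in funds])
survivor_avg = mean([mean(f) for f in survivors])
print(f"all funds: {all_avg:.3f}  survivors only: {survivor_avg:.3f}")
```

The survivors' average yearly return comes out clearly positive, while the average over all funds stays near zero.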
2 :: Tell us what is Collaborative Filtering?
A filtering process used by most recommender systems to find patterns or information through collaboration among multiple viewpoints, data sources, and agents. In practice, it predicts a user's preferences from the preferences of other users with similar tastes.
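A minimal user-based sketch, assuming a tiny hypothetical ratings dictionary: similarity between users is measured with cosine similarity over co-rated items, and a missing rating is predicted as the similarity-weighted average of the other users' ratings. Production systems typically add mean-centering, neighborhood limits, or matrix factorization; this only shows the core idea.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data (1-5 stars), for illustration only.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "titanic": 2, "avatar": 4},
    "carol": {"matrix": 1, "inception": 2, "titanic": 5, "avatar": 1},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

# Alice's tastes resemble Bob's more than Carol's, so the prediction
# lands closer to Bob's rating of 4 than to Carol's rating of 1.
print(round(predict("alice", "avatar"), 2))
```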
3 :: Explain to me what Interpolation and Extrapolation are?
Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value outside the range of the known values by extending a known set of values or facts.
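Using a straight line through two known points, the same formula gives an interpolated value when the query lies between the points and an extrapolated value when it lies outside them. The points below are hypothetical, and assuming linear behavior becomes riskier the farther you extrapolate.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Hypothetical known points: (2, 10) and (4, 20)
print(linear_estimate(2, 10, 4, 20, 3))  # interpolation, x=3 is inside [2, 4]  -> 15.0
print(linear_estimate(2, 10, 4, 20, 6))  # extrapolation, x=6 is outside [2, 4] -> 30.0
```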
4 :: Tell me what are Recommender Systems?
A subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used in movies, news, research articles, products, social tags, music, etc.
5 :: Do you know what are confounding variables?
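These are extraneous variables in a statistical model that correlate (directly or inversely) with both the dependent and the independent variable. If the estimate fails to account for the confounding factor, the apparent relationship between the independent and dependent variables can be biased or even entirely spurious.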
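A short simulation illustrates the problem. It assumes hypothetical data in which a confounder Z drives both X and Y while X has no effect on Y at all. The naive correlation between X and Y is strong; after adjusting for Z (here by correlating the residuals of X and Y once Z's linear effect is removed, i.e. a partial correlation), it drops to roughly zero.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def residualize(vs, zs):
    """Remove the part of vs that is linearly explained by zs (simple OLS on zs)."""
    n = len(vs)
    mv, mz = sum(vs) / n, sum(zs) / n
    slope = (sum((v - mv) * (z - mz) for v, z in zip(vs, zs))
             / sum((z - mz) ** 2 for z in zs))
    return [v - mv - slope * (z - mz) for v, z in zip(vs, zs)]

rng = random.Random(0)
# Hypothetical data: Z (e.g. temperature) drives both X and Y; X has no effect on Y.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

naive = pearson(x, y)                                      # roughly 0.8, driven entirely by Z
adjusted = pearson(residualize(x, z), residualize(y, z))   # near 0 once Z is accounted for
print(round(naive, 2), round(adjusted, 2))
```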
6 :: Please explain what are Recommender Systems?
Recommender systems are a subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product.
7 :: Tell me what are Eigenvalue and Eigenvector?
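Eigenvectors and eigenvalues are used to understand linear transformations. An eigenvector v of a matrix A is a nonzero vector that the transformation only scales: Av = λv, where the scalar λ is the corresponding eigenvalue. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing, or stretching, and the eigenvalues are the factors by which it does so. In data analysis, we usually calculate the eigenvectors of a correlation or covariance matrix.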
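To check the definition numerically, the sketch below uses power iteration (a standard method, chosen here only because it needs no libraries) to find the dominant eigenvector of a hypothetical 2×2 covariance matrix. For [[2, 1], [1, 2]] the eigenvalues are 3 and 1, and the eigenvector for 3 points along (1, 1)/√2. Power iteration assumes a single largest-magnitude eigenvalue and a starting vector with a nonzero component along its eigenvector.

```python
import math

def matvec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def power_iteration(a, iters=200):
    """Dominant eigenvalue and unit eigenvector of a square matrix."""
    v = [1.0] + [0.0] * (len(a) - 1)
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v . Av (v has unit length) estimates the eigenvalue.
    lam = sum(x * y for x, y in zip(v, matvec(a, v)))
    return lam, v

cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix
lam, v = power_iteration(cov)
print(round(lam, 6), [round(x, 4) for x in v])  # 3.0 [0.7071, 0.7071]
```

Multiplying cov by v returns 3·v, confirming Av = λv for this direction.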
8 :: Tell me what is Collaborative filtering?
The filtering process used by most recommender systems to find patterns or information through collaboration among viewpoints, various data sources, and multiple agents.
9 :: Tell me Python or R – Which one would you prefer for text analytics?
The best possible answer for this would be Python, because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.
10 :: Tell me what is the Law of Large Numbers?
It is a theorem that describes the result of performing the same experiment a large number of times. This theorem forms the basis of frequency-style thinking. It says that, as the number of trials grows, the sample mean, the sample variance, and the sample standard deviation converge to the population quantities they are trying to estimate.
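A quick simulation with a fair six-sided die, whose expected value is (1 + 2 + ... + 6) / 6 = 3.5, shows the sample mean settling toward 3.5 as the number of rolls grows. The seed and sample sizes are arbitrary choices for the illustration.

```python
import random

rng = random.Random(42)
means = {}
for n in (10, 1_000, 100_000):
    rolls = [rng.randint(1, 6) for _ in range(n)]  # n rolls of a fair die
    means[n] = sum(rolls) / n                      # sample mean after n rolls
    print(n, round(means[n], 3))
```

Small samples can land well away from 3.5; the 100,000-roll mean is typically within about 0.01 of it.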
11 :: Do you know what are feature vectors?
A feature vector is an n-dimensional vector of numerical features that represents some object. In machine learning, feature vectors are used to represent numeric or symbolic characteristics, called features, of an object in a mathematical, easily analyzable way.
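As an illustration, the sketch below turns a hypothetical object with two numeric attributes and one symbolic attribute (color) into a single numeric feature vector, encoding the symbolic feature with one-hot encoding. The attribute names and the color vocabulary are assumptions made for the example.

```python
# Hypothetical vocabulary for the symbolic (categorical) "color" feature.
COLORS = ["red", "green", "blue"]

def to_feature_vector(obj):
    """Map an object with numeric and symbolic attributes to a list of floats."""
    numeric = [float(obj["weight_kg"]), float(obj["length_cm"])]
    one_hot = [1.0 if obj["color"] == c else 0.0 for c in COLORS]
    return numeric + one_hot

print(to_feature_vector({"weight_kg": 1.2, "length_cm": 30, "color": "green"}))
# [1.2, 30.0, 0.0, 1.0, 0.0]
```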
12 :: Tell me what are the types of biases that can occur during sampling?
☛ Selection bias
☛ Under coverage bias
☛ Survivorship bias
13 :: Tell me what is Linear Regression?
Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
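For one predictor, the least squares line has a closed form: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄. The sketch below fits it in plain Python to a small hypothetical data set (X = hours studied as the predictor, Y = exam score as the criterion).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data: predictor X and criterion Y
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]
b0, b1 = fit_line(xs, ys)
print(round(b0, 2), round(b1, 2))  # 47.7 4.1
print(round(b0 + b1 * 6, 2))       # predicted Y for X = 6: 72.3
```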
14 :: Tell me, do gradient descent methods always converge to the same point?
No, they do not. On a non-convex objective, gradient descent can get stuck at a local minimum (a local optimum) instead of reaching the global optimum. Where it ends up depends on the data and the starting conditions.
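A one-dimensional sketch, assuming the hypothetical non-convex function f(x) = x⁴ - 3x² + x, which has a global minimum near x ≈ -1.30 and a separate local minimum near x ≈ 1.13. Running plain gradient descent with the same learning rate from two different starting points lands in different minima.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of f
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = gradient_descent(-2.0)   # ends near x ≈ -1.30, the global minimum
right = gradient_descent(2.0)   # ends near x ≈ 1.13, only a local minimum
print(round(left, 3), round(f(left), 3))
print(round(right, 3), round(f(right), 3))
```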
15 :: What is selection bias?
Selection bias, in general, is a problematic situation in which error is introduced due to a non-random population sample.
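16 :: Explain to me what makes CNNs translation invariant?
Each convolution kernel acts as its own filter/feature detector. So let’s say you’re doing object detection: it doesn’t matter where in the image the object is, since we’re going to apply the convolution in a sliding-window fashion across the entire image anyway. Strictly speaking, this makes the convolutional feature maps translation equivariant (shifting the input shifts the feature map by the same amount); pooling layers then make the network's output approximately translation invariant.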
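The sketch below illustrates why the sliding window helps, using a 1-D signal, plain-Python cross-correlation (what deep-learning "convolution" layers actually compute), and a hypothetical two-tap edge-detecting kernel. Shifting the input shifts the feature map by the same amount (translation equivariance), and a global max pool on top returns the same response regardless of the shift.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: slide the kernel across every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = [1, -1]               # hypothetical edge detector
a = [0, 0, 5, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 5, 0, 0, 0]   # same pattern, shifted right by 2

fa, fb = conv1d(a, kernel), conv1d(b, kernel)
print(fa)                  # [0, -5, 5, 0, 0, 0, 0]
print(fb)                  # [0, 0, 0, -5, 5, 0, 0]
print(max(fa) == max(fb))  # True: global max pooling ignores where the pattern was
```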
17 :: Please explain how you overcome challenges to your findings?
The reason for asking this question is to discover how well the candidate approaches solving conflicts in a team environment. Their answer shows the candidate's problem-solving and interpersonal skills in stressful situations. Understanding these skills is significant because group dynamics and business conditions change. Consider answers that:
☛ Encourage discussion
☛ Demonstrate leadership
☛ Acknowledge and respect different opinions
18 :: Tell me which technique is used to predict categorical responses?
Classification techniques are used to predict categorical responses; they are widely used in data mining for classifying data sets.
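As one concrete example of a classification technique, here is a k-nearest-neighbors sketch: it predicts the category held by the majority of the k closest training points. The training data and labels are hypothetical, and k-NN is just one choice among many classifiers (logistic regression, decision trees, and so on). `math.dist` requires Python 3.8+.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a categorical label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature vector, category)
train = [
    ((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"), ((0.9, 1.1), "spam"),
    ((5.0, 5.0), "ham"), ((5.2, 4.9), "ham"), ((4.8, 5.1), "ham"),
]
print(knn_predict(train, (1.1, 0.9)))  # spam
print(knn_predict(train, (4.9, 5.0)))  # ham
```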
19 :: Tell me how True Positive Rate and Recall are related?
True Positive Rate = Recall. They are the same quantity, given by the formula TP / (TP + FN).
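Computing both names from the same confusion-matrix counts shows they are one quantity. The labels below are hypothetical, with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives and false negatives for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn

y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

tp, fn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)
true_positive_rate = tp / (tp + fn)  # same formula, different name
print(tp, fn, recall)                # 3 2 0.6
```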
20 :: Tell me how do you know which Machine Learning model you should use?
While one should always keep the “no free lunch theorem” in mind (no single model works best for every problem), there are some general guidelines.
21 :: Tell us what methods do you use to identify outliers within a data set?
Data scientists must be able to go beyond classroom theoretical applications to real-world applications. Your candidate's answer to this question will show how they allocate their time to finding the best way to detect outliers. This information is important to know because it demonstrates the candidate's analytical skills. Look for answers that include:
☛ Raw data analysis
☛ Models
☛ Approaches
22 :: Tell us, are expected value and mean value different?
They are not different, but the terms are used in different contexts. Mean is generally used when talking about a probability distribution or a sample population, whereas expected value is generally used in the context of a random variable.
23 :: Tell me what is power analysis?
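An experimental design technique for determining the effect of a given sample size, for example, the sample size needed to detect an effect of a given size at a chosen significance level and statistical power.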
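As a concrete instance, the sketch below computes the sample size per group for a two-sided, two-sample comparison of means with the standard normal-approximation formula n = 2((z_a + z_b) / d)², where z_a is the critical value for significance level α, z_b is the standard normal quantile for the desired power, and d is the standardized effect size (Cohen's d). It uses Python's `statistics.NormalDist` (Python 3.8+). A t-test-based calculation gives a slightly larger answer (64 instead of 63 for d = 0.5).

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample z-test on means.

    effect_size is Cohen's d: (difference in means) / (common standard deviation).
    """
    std_normal = NormalDist()
    z_a = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = std_normal.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_a + z_b) / effect_size) ** 2)

print(sample_size_per_group(0.5))  # 63
print(sample_size_per_group(0.2))  # smaller effects need much larger samples
```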
24 :: Explain to me when Ridge regression is favorable over Lasso regression?
You can quote ISLR’s authors Hastie and Tibshirani, who asserted that in the presence of a few variables with medium/large effects, you should use lasso regression, and in the presence of many variables with small/medium effects, you should use ridge regression.
Conceptually, lasso regression (L1) does both variable selection and parameter shrinkage, whereas ridge regression (L2) only does parameter shrinkage and ends up including all the coefficients in the model. In the presence of correlated variables, ridge regression might be the preferred choice. Also, ridge regression works best in situations where the least squares estimates have high variance. Therefore, the choice depends on the model objective.
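The variable-selection difference is easiest to see in the special case where the predictor matrix X has orthonormal columns, because both estimators then have closed forms in terms of the least squares coefficients: ridge divides every coefficient by (1 + λ), while lasso soft-thresholds them and sets small ones exactly to zero. The objectives stated in the docstrings and the coefficient values are assumptions chosen for the illustration; with general (correlated) predictors, neither estimator has this simple form.

```python
import math

def ridge_orthonormal(beta_ols, lam):
    """Ridge with orthonormal X: argmin 0.5*||y - Xb||^2 + 0.5*lam*||b||^2
    equals beta_ols / (1 + lam): every coefficient shrinks, none becomes zero."""
    return [b / (1 + lam) for b in beta_ols]

def lasso_orthonormal(beta_ols, lam):
    """Lasso with orthonormal X: argmin 0.5*||y - Xb||^2 + lam*||b||_1
    soft-thresholds each coefficient at lam: small ones become exactly zero."""
    return [math.copysign(abs(b) - lam, b) if abs(b) > lam else 0.0 for b in beta_ols]

beta_ols = [3.0, 1.0, 0.4, -0.2]  # hypothetical least squares coefficients
lam = 0.5

print([round(b, 3) for b in ridge_orthonormal(beta_ols, lam)])  # [2.0, 0.667, 0.267, -0.133]
print(lasso_orthonormal(beta_ols, lam))                         # [2.5, 0.5, 0.0, 0.0]
```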
25 :: Tell me how do you work towards a random forest?
The underlying principle of this technique is that several weak learners combined provide a strong learner. The steps involved are:
☛ Build several decision trees on bootstrapped training samples of data
☛ On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates out of all p predictors
☛ Rule of thumb: at each split, m ≈ √p
☛ Predictions: by majority vote across the trees
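A compact sketch of the steps above, under simplifying assumptions: each "tree" is a decision stump (a single split), so sampling m candidate predictors per tree is the same as sampling them at the one split that tree makes; real random forests grow deeper trees and resample at every split. Splits are scored by misclassification count rather than Gini impurity or entropy, and the data set (two informative features plus two pure-noise features) is hypothetical.

```python
import math
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: the threshold split with the fewest misclassifications
    among the candidate features in feature_ids."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # all candidate features constant: always predict the majority
        label = majority(labels)
        return (feature_ids[0], math.inf, label, label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    p = len(rows[0])
    m = max(1, round(math.sqrt(p)))  # rule of thumb: m ≈ sqrt(p)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feats = rng.sample(range(p), m)                  # m random candidate predictors
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    # Majority vote across all trees.
    return majority([l if row[f] <= t else r for f, t, l, r in forest])

# Hypothetical data: label y in {0, 1}; features 0 and 1 are y plus noise,
# features 2 and 3 are pure noise.
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(100):
    y = data_rng.randint(0, 1)
    rows.append([y + data_rng.gauss(0, 0.3), y + data_rng.gauss(0, 0.3),
                 data_rng.random(), data_rng.random()])
    labels.append(y)

forest = fit_forest(rows, labels)
print(predict(forest, [1.0, 1.0, 0.5, 0.5]), predict(forest, [0.0, 0.0, 0.5, 0.5]))
```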