[Blog] Interview Journey: Collection of Questions
1. Introduction
I started looking for jobs as a data scientist in January. This is a collection of the questions I was asked during my interview processes.
Let’s start!
2. Interview Questions
The questions are separated into Machine Learning, Coding, and Other Questions.
2.1 Machine Learning Questions
Explain ROC-AUC Curve
Position: Financial Data Scientist in Deal Advisory.
Explain Perplexity, BLEU
Position: Data Scientist Artificial Intelligence.
If you use a pre-trained CNN and want to fine-tune it on your downstream task, would you choose your LR low or high?
Position: Data Scientist Artificial Intelligence.
Which models would you choose if you have a small amount of data?
Position: Data Scientist Artificial Intelligence.
Which models would you choose if you have Time Series data?
Position: Data Scientist Artificial Intelligence.
How can you detect duplicates even when some features are not the same e.g. due to errors in merging two databases?
Position: Consultant AI & Data Science.
How can you use Machine Learning in Process Intelligence?
Position: Data Scientist in Process Intelligence.
Why are we not minimizing MAE in a regression task?
Position: Data Scientist for Automated Driving.
Explain Precision, Recall and F1.
Position: Data Scientist for Automated Driving, Junior NLP Data Scientist.
Explain Boosting and Bagging.
Position: Data Scientist for Automated Driving, Junior NLP Data Scientist.
Explain Random Forest.
Position: Data Scientist, Junior NLP Data Scientist.
What happens to the model if you have correlated features?
Position: Consultant AI & Data Science.
What can you do if you have missing values?
Position: Consultant AI & Data Science, Data Scientist for Automated Driving.
How can you transform categorical/ordinal features?
Position: Data Scientist.
How can you do multi-class classification with logistic regression?
Position: Data Scientist.
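Two common answers are one-vs-rest (train one binary logistic regression per class) and multinomial logistic regression, which replaces the sigmoid with a softmax over one score per class. Below is a minimal sketch of the softmax step only, assuming the per-class linear scores have already been computed; `softmax` and `predict_class` are illustrative names, not taken from any interview.

```python
import math


def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for three classes
print(softmax([1.0, 2.0, 3.0]))
print(predict_class([0.1, 2.0, -1.0]))
```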
A customer wants a model with a high true positive rate and wants to disregard any false negatives. What is your opinion on the proposed model?
Position: Data Scientist.
How can you detect Outliers and how do you deal with them?
Position: Data Scientist.
How can you robustly normalize features with outliers?
Position: Data Scientist.
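One possible answer is to scale with the median and the interquartile range (IQR) instead of the mean and the standard deviation, since both are much less sensitive to extreme values. The sketch below uses only the standard library; `robust_scale` is an illustrative name, and the choice of the inclusive quantile method is an assumption.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, cannot scale")
    return [(v - median) / iqr for v in values]


# Hypothetical feature with one outlier (100)
print(robust_scale([1, 2, 3, 4, 100]))
```

The outlier is still far from the rest after scaling, but it no longer distorts the scale of the remaining values.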
What happens if you train a linear regression model with outliers in a feature?
Position: Data Scientist.
Why do you need to normalize for e.g. a logistic regression?
Position: Data Scientist.
Explain Random Forest. Which pre-processing steps can you then skip?
Position: Data Scientist.
What are disadvantages of transformers?
Position: Data Scientist.
Explain Attention in simple terms.
Position: Data Scientist.
Why did transformers take over NLP?
Position: Data Scientist.
Which model compression techniques do you know, besides Knowledge Distillation?
Position: Data Scientist.
Given an (imbalanced) multiclass classification task, argue for your chosen metric.
Position: Data Scientist.
Draw a confusion matrix and calculate Precision, Recall and F1.
Position: Data Scientist for Automated Driving.
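For a binary confusion matrix, the three metrics follow directly from the counts of true positives (TP), false positives (FP) and false negatives (FN). A small sketch with made-up counts; the function name and the numbers are only for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion matrix: TP=40, FP=10, FN=20
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(precision, recall, f1)
```

With these counts, precision is 40/50, recall is 40/60, and F1 is their harmonic mean, 80/110.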
Explain Convolutional Operation.
Position: Data Scientist.
Explain Backpropagation.
Position: NLP Junior Data Scientist, Data Scientist.
If you have geospatial data, how can you join two datasets based on the geospatial data (longitude, latitude)?
Position: Data Scientist.
Why is Deep Learning thriving now? What are the factors?
Position: Financial Data Scientist.
A chatbot should classify a text as either “cancel contract” or “not cancel contract”. If the result is “cancel contract”, the customer is given the possibility to cancel. Which metric do you use and why?
Position: NLP Junior Data Scientist.
Why is accuracy not a good metric for a covid test?
Position: Data Scientist for Automated Driving.
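The usual argument is class imbalance: if only few people are infected, a test that always says "negative" still gets a high accuracy. A tiny sketch with made-up numbers (10 infected out of 1000 is an assumption for illustration, not real data):

```python
# Hypothetical population: 10 infected (1) and 990 healthy (0) people
y_true = [1] * 10 + [0] * 990
# A useless "test" that always predicts healthy
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
false_negatives = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = true_positives / (true_positives + false_negatives)

print(accuracy)  # 0.99
print(recall)    # 0.0
```

The accuracy looks excellent, yet the test misses every infected person, which is why recall (or a similar metric) is the more honest choice here.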
Why is an ensemble of linear models not good?
Position: Data Scientist.
Explain Dropout and its use during inference.
Position: Data Scientist, Data Scientist AI.
Explain Variance and Bias Tradeoff when using Boosting and Bagging.
Position: Data Scientist.
2.2 Coding Questions
Given a piece of code, what is its complexity?
Position: Data Scientist.
What is the complexity of the most efficient sorting algorithm?
Position: Data Scientist.
Code a function that gets rid of duplicates in a list.
Position: Data Scientist.
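One way to answer this, assuming the items are hashable and the original order should be kept (the question as asked did not specify either):

```python
def remove_duplicates(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

This runs in O(n) on average. If order does not matter, `list(set(items))` is a shorter alternative.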
Code a function that calculates the nth row of Pascal's Triangle.
Position: Data Scientist.
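A possible solution, assuming rows are counted from 0 (so row 0 is `[1]`). It builds the row directly from the relation C(n, k+1) = C(n, k) * (n - k) / (k + 1) instead of computing all previous rows.

```python
def pascal_row(n):
    """Return the nth row (0-indexed) of Pascal's Triangle."""
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]
    for k in range(n):
        # Next binomial coefficient from the previous one; the division is exact
        row.append(row[k] * (n - k) // (k + 1))
    return row


print(pascal_row(4))  # [1, 4, 6, 4, 1]
```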
Code review questions.
Position: Junior NLP Data Scientist, Data Scientist.
Given Keras Code, explain what each part is doing, what model is being built and how many parameters.
Position: Junior NLP Data Scientist.
Code a function that gives the nth Fibonacci number.
Position: Junior NLP Data Scientist.
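A simple iterative sketch, assuming the convention F(0) = 0 and F(1) = 1. It avoids the exponential runtime of the naive recursive version.

```python
def fibonacci(n):
    """Return the nth Fibonacci number with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print(fibonacci(10))  # 55
```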
2.3 Other Questions
Logic Question: (i) All cats are dogs and (ii) All red dogs are cats. Which of the following statements are true?
Position: Junior NLP Data Scientist.
How does a bank make money? What is on the asset side of a bank's balance sheet?
Position: Data Scientist in Quant.
What are the parts of a piece of software?
Position: Data Analytics & AI - Digital Operations.
Where can data be stored?
Position: Data Analytics & AI - Digital Operations.
If a human cannot distinguish between talking to a human and talking to a bot over the phone, is the bot human?
Position: Financial Data Scientist.