[Blog] Interview Journey: Collection of Questions

Table of Contents

  1. Introduction
  2. Interview Questions
    1. Machine Learning Questions
    2. Coding Questions
    3. Other Questions

1. Introduction

I started looking for jobs as a data scientist in January. This is a collection of the questions I was asked across all my interview processes.

Let’s start!

2. Interview Questions

The questions are separated into Machine Learning, Coding, and Other Questions.

2.1 Machine Learning Questions

Explain ROC-AUC Curve

Position: Financial Data Scientist in Deal Advisory.

Explain Perplexity, BLEU

Position: Data Scientist Artificial Intelligence.

If you use a pre-trained CNN and want to fine-tune it on your downstream task, would you choose a low or a high learning rate (LR)?

Position: Data Scientist Artificial Intelligence.

Which models would you choose if you have a small amount of data?

Position: Data Scientist Artificial Intelligence.

Which models would you choose if you have Time Series data?

Position: Data Scientist Artificial Intelligence.

How can you detect duplicates even when some features are not the same e.g. due to errors in merging two databases?

Position: Consultant AI & Data Science.

How can you use Machine Learning in Process Intelligence?

Position: Data Scientist in Process Intelligence.

Why are we not minimizing MAE in a regression task?

Position: Data Scientist for Automated Driving.

Explain Precision, Recall and F1.

Position: Data Scientist for Automated Driving, Junior NLP Data Scientist.

Explain Boosting and Bagging.

Position: Data Scientist for Automated Driving, Junior NLP Data Scientist.

Explain Random Forest.

Position: Data Scientist, Junior NLP Data Scientist.

What happens to the model if you have correlated features?

Position: Consultant AI & Data Science.

What can you do if you have missing values?

Position: Consultant AI & Data Science, Data Scientist for Automated Driving.

How can you transform categorical/ordinal features?

Position: Data Scientist.

How can you do multi-class classification with logistic regression?

Position: Data Scientist.

A customer wants a model with a high true positive rate and wants to disregard any false negatives. What is your opinion on the proposed model?

Position: Data Scientist.

How can you detect Outliers and how do you deal with them?

Position: Data Scientist.

How can you robustly normalize features with outliers?

Position: Data Scientist.

What happens if you train a linear regression model with outliers in a feature?

Position: Data Scientist.

Why do you need to normalize features for, e.g., logistic regression?

Position: Data Scientist.

Explain Random Forest. Which pre-processing steps can you then skip?

Position: Data Scientist.

What are disadvantages of transformers?

Position: Data Scientist.

Explain Attention in simple terms.

Position: Data Scientist.

Why did transformers take over NLP?

Position: Data Scientist.

Which model compression techniques do you know, besides Knowledge Distillation?

Position: Data Scientist.

Given an (imbalanced) multiclass classification task, justify your choice of metric.

Position: Data Scientist.

Draw a confusion matrix and calculate Precision, Recall and F1.

Position: Data Scientist for Automated Driving.
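For reference, here is a minimal sketch of that calculation for the binary case. The function name `precision_recall_f1` and the example counts are my own illustration, not something from the interview; it assumes you have already read the true positives, false positives and false negatives off the confusion matrix.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from binary confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for a metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With these made-up counts, precision is 8/10, recall is 8/12 and F1 is 16/22. True negatives do not appear in any of the three formulas, which is worth pointing out when you draw the matrix.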

Explain Convolutional Operation.

Position: Data Scientist.

Explain Backpropagation.

Position: NLP Junior Data Scientist, Data Scientist.

If you have geospatial data, how can you join two datasets based on the geospatial data (longitude, latitude)?

Position: Data Scientist.

Why is Deep Learning thriving now? What are the factors?

Position: Financial Data Scientist.

A chatbot has to classify a text as either “cancel contract” or “not cancel contract”. If the text is classified as “cancel contract”, the customer is given the option to cancel. Which metric do you use and why?

Position: NLP Junior Data Scientist.

Why is accuracy not a good metric for a covid test?

Position: Data Scientist for Automated Driving.

Why is an ensemble of linear models not good?

Position: Data Scientist.

Explain Dropout and its use during inference.

Position: Data Scientist, Data Scientist AI.

Explain Variance and Bias Tradeoff when using Boosting and Bagging.

Position: Data Scientist.

2.2 Coding Questions

Given a piece of code, what is its complexity?

Position: Data Scientist.

What is the complexity of the most efficient sorting algorithm?

Position: Data Scientist.

Code a function that gets rid of duplicates in a list.

Position: Data Scientist.
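One possible answer, as a sketch: the version below keeps the first occurrence of each element and preserves the original order. It assumes the list items are hashable; the name `remove_duplicates` is my own choice, not something prescribed in the interview.

```python
def remove_duplicates(items):
    """Return a new list without duplicates, keeping first occurrences in order.

    Assumes every item is hashable (e.g. numbers, strings, tuples).
    Runs in O(n) time on average thanks to the set lookup.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(remove_duplicates([3, 1, 3, 2, 1]))  # prints [3, 1, 2]
```

If order does not matter, `list(set(items))` is shorter, but the interviewer may well ask about ordering and complexity as a follow-up.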

Code a function that calculates the nth row of the Pascal Triangle.

Position: Data Scientist.
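A possible solution, sketched under the assumption that rows are counted from zero (row 0 is `[1]`). It builds the row from the binomial coefficient recurrence C(n, k+1) = C(n, k) * (n - k) / (k + 1) instead of constructing all previous rows; the name `pascal_row` is hypothetical.

```python
def pascal_row(n: int) -> list:
    """Return the n-th row of Pascal's triangle, with row 0 being [1].

    Each entry follows from the previous one via
    C(n, k + 1) = C(n, k) * (n - k) // (k + 1), so the row takes O(n) steps.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]
    for k in range(n):
        # The product is always divisible by (k + 1), so integer division is exact.
        row.append(row[-1] * (n - k) // (k + 1))
    return row


print(pascal_row(4))  # prints [1, 4, 6, 4, 1]
```

It is worth clarifying with the interviewer whether the first row is row 0 or row 1 before you start coding.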

Code review questions.

Position: Junior NLP Data Scientist, Data Scientist.

Given Keras Code, explain what each part is doing, what model is being built and how many parameters.

Position: Junior NLP Data Scientist.

Code a function that gives the nth Fibonacci number.

Position: Junior NLP Data Scientist.
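A minimal iterative sketch, assuming the convention F(0) = 0 and F(1) = 1 (the interviewer may use a different starting index). The function name `fibonacci` is my own.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1.

    Iterative version: O(n) time and O(1) extra memory, which avoids the
    exponential blow-up of the naive recursive solution.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(fibonacci(10))  # prints 55
```

A common follow-up is to compare this with the recursive version and discuss memoization.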

2.3 Other Questions

Logic Question: (i) All cats are dogs and (ii) All red dogs are cats. Which of the following statements are true?

Position: Junior NLP Data Scientist.

How does a bank make money? What is on the asset side of a bank's balance sheet?

Position: Data Scientist in Quant.

What are the parts of a piece of software?

Position: Data Analytics & AI - Digital Operations.

Where can data be stored?

Position: Data Analytics & AI - Digital Operations.

If a human cannot distinguish between talking to a human and talking to a bot over the phone, is the bot human?

Position: Financial Data Scientist.

updated_at 10-03-2022