Top Data Science Interview Questions

Varun Samarth

5 min read

Last Updated: 2 January 2026

Published On: 5 April 2022

In the fast-moving era of artificial intelligence, machine learning, and big data, data science is one of the most dynamic fields. Companies are rapidly expanding their data science teams to deliver stronger client services and business development strategies. Because the field offers strong growth prospects and well-defined roles, large numbers of young professionals are entering it. As they work toward becoming well-placed data scientists, their most pressing concern is how to crack the interviews.

In this article, we have shortlisted some of the most common questions frequently asked during data science interviews and their answers.

What do you understand by the term data science?

Data Science is about uncovering hidden patterns from raw data by doing exploratory data analysis, creating models using machine learning algorithms, and interpreting results using domain knowledge.

What are the core differences between supervised and unsupervised learning?

Supervised and unsupervised learning are two branches of machine learning. They differ substantially in the kind of data they use and the problems they are applied to.

DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED LEARNING

SUPERVISED LEARNING –

Supervised learning is a technique that trains a model on labeled data, where each input comes with the correct output. It is primarily used for classification and regression problems.

UNSUPERVISED LEARNING –

In unsupervised learning, the input data is not labeled. The aim is to discover relationships in the data without being given target outputs, so the model finds patterns in the input dataset on its own. Unsupervised learning is used for problems such as clustering and association. For instance, k-means is a common algorithm for clustering, and the Apriori algorithm is used for association rule learning.
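The clustering case can be made concrete with a minimal sketch of k-means (Lloyd's algorithm) in plain Python. This is an illustration under simple assumptions and is not meant for production use. The `kmeans` function, the toy points, and the choice to seed the centroids with the first k points are all hypothetical. Libraries typically use smarter initialization, such as k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; seeds centroids with the first k points so runs are reproducible."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical unlabeled 2-D data with two visible groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels were supplied, yet the algorithm separates the two groups purely from the distances between the points.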

What is a Decision Tree algorithm?

It is a supervised learning algorithm that splits the data at each node using a test on one feature. Each path from the root to a leaf forms a rule that predicts a class (or, for regression trees, a numeric value).
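The splitting process can be sketched in plain Python as a small CART-style classifier that chooses each split by Gini impurity. This is a hypothetical illustration: the function names, the toy data, and the `max_depth=3` default are assumptions. Real libraries add many refinements, such as pruning and other split criteria.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple, Union

Row = Sequence[float]
Tree = Union[str, Dict[str, object]]  # a leaf is a class label; an inner node is a dict


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: Sequence[Row], labels: Sequence[str]) -> Optional[Tuple[float, int, float]]:
    """Try every (feature, threshold) test and keep the one with the lowest weighted Gini."""
    best = None  # (weighted_gini, feature_index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a useful split must send rows both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth: int = 0, max_depth: int = 3) -> Tree:
    """Recursively split until nodes are pure, max_depth is reached, or no split is possible."""
    split = best_split(rows, labels) if depth < max_depth and gini(labels) > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf predicts the majority class
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree: Tree, row: Row) -> str:
    """Follow the if-then rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: two numeric features, two classes.
X = [(2.0, 1.0), (3.0, 1.5), (1.0, 2.0), (6.0, 5.0), (7.0, 6.5), (8.0, 5.5)]
y = ["A", "A", "A", "B", "B", "B"]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 3.0, 'left': 'A', 'right': 'B'}
print(predict(tree, (2.5, 1.2)), predict(tree, (7.5, 6.0)))  # A B
```

The learned tree reads as the rule "if feature 0 <= 3.0 then A, else B", which is the kind of rule list the answer above describes.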

What is a Random Forest algorithm?

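It is an ensemble of decision trees. Instead of one tree, many trees are built, each on a random bootstrap sample of the data and each considering a random subset of features when splitting. The final prediction combines the trees' outputs: a majority vote for classification or an average for regression.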
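A minimal sketch of the idea in plain Python, assuming very shallow trees. Each "tree" here is a decision stump (a single split) to keep the code short. Real random forests grow full trees and draw a random feature subset at every split. All names, the toy data, and the defaults (`n_trees=25`, one feature per tree) are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, str, str]  # (feature, threshold, label if <= threshold, label if >)


def fit_stump(rows: Sequence[Sequence[float]], labels: Sequence[str], features: Sequence[int]) -> Stump:
    """Fit a one-split tree using only the allowed feature indices, minimizing training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees: int = 25, n_features: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample (same size, drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random-forest twist: each tree may only use a random subset of the features.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest: List[Stump], row: Sequence[float]) -> str:
    """Ensemble step: every tree votes and the majority class wins."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two features, two well-separated classes.
X = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (7.0, 8.0), (8.0, 7.0), (9.0, 9.0)]
y = ["A", "A", "A", "B", "B", "B"]
forest = fit_forest(X, y)
print(predict_forest(forest, (0.0, 0.0)), predict_forest(forest, (10.0, 10.0)))  # A B
```

An individual stump can be misled by an unlucky bootstrap sample (for example, one containing only class A). The majority vote across many trees absorbs such mistakes.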

Explain the difference between Bagging and Boosting

In bagging (bootstrap aggregating), each tree is trained on a different random sample of the input data, drawn with replacement, so each tree learns a somewhat different set of rules. The final result combines the individual trees' predictions. In boosting, trees are trained one after another on the same data. Examples that earlier trees misclassified are given more importance, so later trees focus on correcting those errors.
Bagging mainly helps reduce variance error, while boosting mainly reduces bias error.
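For boosting, the key step is the reweighting. The sketch below shows one round of AdaBoost, one specific boosting algorithm. Gradient boosting takes a different approach and fits each new tree to the remaining errors. The sketch assumes labels coded as -1/+1. The function name and the toy predictions are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(weights: Sequence[float], y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Tuple[float, List[float]]:
    """One AdaBoost round for labels in {-1, +1}: returns (alpha, updated sample weights)."""
    # Weighted error rate of the current weak learner.
    eps = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p) / sum(weights)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # keep log() finite for perfect or hopeless learners
    alpha = 0.5 * math.log((1 - eps) / eps)  # how much this learner counts in the final vote
    # y * p is +1 when correct (weight shrinks) and -1 when wrong (weight grows).
    updated = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.2] * 5                  # start with uniform weights
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]          # the weak learner gets only the last example wrong
alpha, weights = adaboost_reweight(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in weights])
# 0.693 [0.125, 0.125, 0.125, 0.125, 0.5]
```

After a single round, the one misclassified example carries half of the total weight, so the next tree is pushed to get it right.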

What are bias and variance errors, and what is the bias-variance trade-off?

Bias is the difference between the model's average prediction and the actual values. It occurs when the model cannot capture the true relationship between the predictors and the dependent variable, often because of simplifying assumptions built into the modeling technique. High bias means the technique makes many assumptions, while low bias means it makes fewer.

Variance, on the other hand, refers to the model's sensitivity to fluctuations in the training data: a high-variance model changes substantially when it is trained on a different sample.

From the above, we can infer that low-complexity models tend to have high bias and low variance. As complexity increases, bias decreases but variance increases. Thus, we need to find a balance between bias and variance so that both stay reasonably low.
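For squared-error loss, the trade-off is usually summarized by the standard decomposition below. It assumes the data follow y = f(x) + ε, with noise of mean zero and variance σ². The expectations are taken over different training sets and the noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Making a model more complex usually shrinks the first term and grows the second. No model can remove the σ² term.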

What is overfitting?

Overfitting occurs when a model fits its training data so closely, including its noise, that it is highly accurate on the training data but loses accuracy on new, unseen data.
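A tiny, hypothetical experiment makes this concrete. The toy data, the two models, and all names below are illustrative assumptions. The first model is a 1-nearest-neighbor rule that memorizes every training label, including two deliberately flipped, noisy ones. The second is a single-threshold rule.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data: the true rule is "label is 1 when x >= 10",
# but the training labels at x=4 and x=14 were flipped to mimic noise.
train = [(0, 0), (2, 0), (4, 1), (6, 0), (8, 0), (10, 1), (12, 1), (14, 0), (16, 1), (18, 1)]
test = [(x, int(x >= 10)) for x in range(1, 20, 2)]  # unseen points follow the true rule


def nearest_neighbor(x: int) -> int:
    """Very flexible model: copies the label of the closest training point (memorizes noise)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_threshold(data: List[Tuple[int, int]]) -> int:
    """Very simple model: one cut-off t (predict 1 when x >= t), chosen for best training accuracy."""
    candidates = [x for x, _ in data]
    return max(candidates, key=lambda t: sum(int(x >= t) == y for x, y in data))


t_best = fit_threshold(train)


def threshold_model(x: int) -> int:
    return int(x >= t_best)


def accuracy(model: Callable[[int], int], data: List[Tuple[int, int]]) -> float:
    return sum(model(x) == y for x, y in data) / len(data)


for name, model in [("nearest neighbor", nearest_neighbor), ("threshold", threshold_model)]:
    print(f"{name}: train={accuracy(model, train):.1f} test={accuracy(model, test):.1f}")
# nearest neighbor: train=1.0 test=0.8
# threshold: train=0.8 test=1.0
```

The memorizing model is perfect on the training data but worse on unseen points, which is the overfitting pattern. The simpler model accepts two training errors and generalizes better.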

What is the difference between Accuracy, Recall, and Precision?


Whenever we make predictions in a two-class problem, there are four possible outcomes:

TP (True Positive) – Correct Positive Prediction
TN (True Negative) – Correct Negative Prediction
FP (False Positive) – Incorrect Positive Prediction
FN (False Negative) – Incorrect Negative Prediction

Accuracy = (TP + TN) / (TP + TN + FP + FN), the share of all predictions that are correct

Precision = TP / (TP + FP), the share of predicted positives that are actually positive

Recall = TP / (TP + FN), the share of actual positives that the model correctly identifies
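The three formulas translate directly into code. Below is a minimal Python sketch. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, and recall from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Of everything predicted positive, how much was actually positive (0.0 if nothing was).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything actually positive, how much the model found (0.0 if there were none).
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.85, 'precision': 0.889, 'recall': 0.8}
```

Precision and recall pull in different directions. Predicting "positive" more often tends to raise recall at the cost of precision.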

What are the assumptions of Linear Regression?

Linear relationship between the dependent and independent variables
No multicollinearity among the independent variables
Independence of errors: residuals are not correlated with one another
Homoscedasticity: residuals have constant variance at every level of the predictor variables
Normal distribution of error terms (residuals)
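Some of these assumptions can be checked using the residuals of a fitted model. The sketch below uses hypothetical names and toy data. It fits a one-predictor least-squares line in plain Python and computes a rough homoscedasticity heuristic: the residual variance for large x divided by the residual variance for small x. It is only an illustration, not a formal statistical test, and it does not check the other assumptions.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for y = intercept + slope * x; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x: Sequence[float], y: Sequence[float], slope: float, intercept: float) -> List[float]:
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_ratio(x: Sequence[float], resid: Sequence[float]) -> float:
    """Rough heuristic: residual variance for the larger half of x divided by that for the
    smaller half. Values far from 1 hint that the variance is not constant."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [resid[i] for i in order[:half]]
    high = [resid[i] for i in order[half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical data: y = 2x + 1 plus noise whose size grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.6, -1.6]
y = [2 * xi + 1 + e for xi, e in zip(x, noise)]
slope, intercept = fit_simple_ols(x, y)
r = residuals(x, y, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(spread_ratio(x, r), 1))
```

Here the line itself fits well, but the residuals fan out as x grows, which signals a likely violation of homoscedasticity.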

What is Collaborative filtering?

In collaborative filtering, the idea is to find users with similar interests and recommend items to a user based on what those similar users liked or rated highly.
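Below is a minimal sketch of user-based collaborative filtering. It assumes ratings are stored as nested dictionaries and measures user similarity with cosine similarity, which is one common choice among several. Unseen items are scored by summing the other users' ratings, each weighted by that user's similarity to the target user. The users, items, ratings, and function names are hypothetical.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two users' rating vectors (unrated items count as 0)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 2) -> List[str]:
    """Score items the user has not rated by similarity-weighted ratings from other users."""
    scores: Dict[str, float] = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:  # only suggest items the user has not rated yet
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "E": 5},
}
print(recommend(ratings, "alice"))  # ['D', 'E']
```

Alice's tastes match Bob's far more closely than Carol's, so Bob's item D ranks first even though Carol gave E a higher rating.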

What do you mean by Association Rules, and where are they used?

The idea behind association rules is that some items tend to be bought together. We find which items are frequently purchased together, so that when a user buys one of them, the others can be recommended.
Another application is store layout: items that are often bought together can be placed near each other in physical stores.
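Such rules are usually judged by two quantities. Support measures how often the items appear together, and confidence measures how often the second item is bought when the first one is. The sketch below computes both for item pairs by brute-force counting. It is not the Apriori algorithm, which prunes candidate itemsets so that it scales to larger baskets. The baskets, thresholds, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

Rule = Tuple[str, str, float, float]  # (if bought, then also bought, support, confidence)


def pair_rules(transactions: Sequence[Sequence[str]], min_support: float = 0.3,
               min_confidence: float = 0.6) -> List[Rule]:
    n = len(transactions)
    # How many baskets contain each item, and each unordered pair of items.
    item_counts = Counter(item for basket in transactions for item in set(basket))
    pair_counts = Counter(pair for basket in transactions
                          for pair in combinations(sorted(set(basket)), 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
    ["bread", "butter", "jam"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule like "milk -> butter" with high confidence could drive a recommendation or a shelf placement. A rule below the thresholds, such as "butter -> milk", is filtered out.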

What do you mean by cross-validation?

Cross-validation is used to estimate how well a model will perform on data it has not seen during training. It helps detect overfitting and choose models or settings that generalize well.

In k-fold cross-validation, the total dataset is divided into k subsets (folds). The model is trained on k-1 folds and evaluated on the remaining fold. This step is repeated k times, with a different fold kept for testing each time, and the k evaluation scores are averaged.
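Below is a minimal plain-Python sketch of k-fold cross-validation. It uses hypothetical names and a deliberately trivial model that always predicts the mean of its training targets. The folds here are contiguous. In practice the data are often shuffled first; for example, scikit-learn's `KFold` accepts a `shuffle` option.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds; each index is tested once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate_mean_baseline(y: Sequence[float], k: int) -> float:
    """Average test MSE over k folds for a baseline model that predicts the training mean."""
    fold_errors = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)  # "train" the model on k-1 folds
        fold_errors.append(sum((y[i] - prediction) ** 2 for i in test) / len(test))
    return sum(fold_errors) / k


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
print(cross_validate_mean_baseline([3.0, 5.0, 7.0, 9.0, 11.0, 13.0], k=3))
```

Each index lands in a test fold exactly once, so every observation contributes to the final averaged score.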

Apart from these questions, interviewers generally ask about the projects you have worked on and the results you achieved. Data science is an evolving field and vast in its scope. I hope this article helps aspiring and experienced data scientists land a high-growth job that sets them apart from their peers.

~ Kapil Mahajan, Data Science Leader

——————————————–

Data Science is one of the hottest jobs right now and transitioning to data science jobs can lead to an average salary growth of 37%. If you are looking to step into this in-demand profession, upskill yourself with the most popular data science courses from Emeritus taught by faculty from leading business schools.


About the Author

Varun Samarth

Senior Content Contributor, Emeritus Blog
Varun, a seasoned content creator with over 8 years of diverse experience, excels in crafting engaging content for various geographies and categories. Leveraging this expertise, he seamlessly translates complex concepts into enriching educational content for the EdTech domain. His keen understanding of research and life experiences helps him resonate with students and create fact-based content. He finds solace and inspiration in music, nurturing his creativity for content creation.
