Course Preview: Applied Data Science at Thayer School of Engineering at Dartmouth
12 September 2022
[Video Transcript] Applied Data Science | Thayer School of Engineering at Dartmouth
Hi, my name is Geoff Parker, and I would like to welcome you to this course. Thank you for joining what I hope you will find to be a tremendous learning adventure. Let's walk through some of the main topics we'll cover along the way and then lay out the roadmap. First, I'm going to discuss practical applications of analytics and some of the reasons why analytics has become so important in the world today. Then we'll introduce a problem-solving framework that will give you a systematic way to approach data analytics projects. Along the way, you'll have many opportunities to work through coding demonstrations and exercises. From the very first week, you'll start to learn the fundamentals of programming in a data environment.
We'll move into data structures and graphing so that you'll understand the sources, types, and requirements of data for modeling, ways of understanding your data, and formatting and graphing. You'll get an introduction to understanding your data through exploratory data analysis, and you'll gain strategies for using simple plotting tools to render data in ways that are clear and meaningful to your audience. You'll also get practice with data cleaning and assembling complex datasets, tasks that make up the majority of any data scientist's work.
Although this course is not a deep dive into statistics and probability theory, it's still important that we cover some of the basics: means and distributions, quantile-quantile plots, probability, hypothesis testing, and then some thinking about what constitutes a model and the various types of models you're likely to encounter. That will set the stage for one of the most robust models you'll see: linear regression. We'll build an ordinary least squares model, introduce the univariate and multivariate cases, and work to interpret the output from these models. We'll then dig a little deeper and introduce the idea of model diagnostics.
We'll examine the assumptions of these models, specifically the normality of the error terms, which we assess through the residuals. We'll then work through model comparison, model fitting, and selection criteria. Throughout all of this, we'll look at our output, talk about strategies for interpreting that output, and discuss how to communicate those results to our audience.
In addition to ordinary least squares regression, we'll introduce another common model, logistic regression, which will set the stage for classification later in the course. We'll talk about logistic functions, plots, and density functions. Then we'll introduce the logit link function, which connects the zero-one outcome we're modeling to the explanatory variables we use to make zero-one predictions by modeling the log-odds of the outcome as a linear function of those variables. At that point, we'll be ready to examine data visualization strategies. What are the criteria for effective visualization? What are some of the considerations? Then we'll work through some examples of visualizations that are effective and some that, quite frankly, could use some work.
Visualization is both an art and a science, and we'll discuss how the two work together and the types of visualization tools that are now available. At this point, we're ready to consider experiments and causal research. We'll do this in a couple of important ways. First, we'll discuss some of the theory, and then we'll introduce some simple experiments. We'll discuss how to think about interactions and the effects plots that show those interactions graphically. We'll then look at natural experiments, one of which is the Great Mississippi Flood of 1927. At this point, we'll be ready to discuss targeting analysis, introduce a classic recency, frequency, and monetary value (RFM) analysis, and then extend that to the general case. Through all of this, you'll also get practice with examples and coding exercises that involve A/B testing on a number of datasets.
Once we've mastered the fundamentals, we'll move to the exciting area of machine learning. We'll talk about what it is and why it has become so important, its critical applications in industry, the distinction between supervised and unsupervised learning, and a number of the algorithms you're likely to encounter in practice. You'll work through exercises using some of the simpler algorithms, such as K-means clustering, as well as a number of more complicated algorithms that you're actually likely to apply in real life. In the final module, it's time to put it all together. We'll ask you to develop a project that demonstrates your mastery of the various tools that are now at your disposal.
Projects are most powerful when they're used as an opportunity to learn new tools and techniques in addition to those we introduce in class. The best projects come from things that interest you deeply and that might be useful to show to potential employers or collaborators. Throughout the course, I encourage you to take what you learn and then go deeper. There are so many resources available now that anything that interests you is likely to have both a community around it and a number of tools you can weave together to create solutions. Along the way, you will have experts and mentors who can help you develop your skills and offer career guidance so that you can make the most of this experience. Thank you again, and I wish you the very best.
[End of Video Transcript]