Course Preview | Professional Certificate in Data Engineering from MIT xPRO

Hi, my name is John Williams. I'm Professor of Information Engineering at MIT, and it's my pleasure to introduce you to this course on Data Engineering. Data engineering is one of the most important fields in industry today; it reaches into all areas of a corporation. How did we get to this state? I'd like to give you a brief history. So, in the 1990s, the World Wide Web was invented, and data started flowing from machine to machine. And today we're in a highly distributed computing world. Most of our applications are remote and involve programs on multiple machines communicating via messages.

Now, in the early 2000s, the Internet of Things was invented. And again, we had lots of sensors and tags, devices talking to each other, and a tremendous amount of data started to be generated. And this data, if analyzed, could give insights into the operations of the company. Now, that tsunami of data needed another innovation to be able to handle it, and machine learning came along. And machine learning allows the machine to do a lot of the work for us, so many of the processes that we've done ourselves in the past can now be automated by machines. At the same time, cloud computing came along. And this allowed us to develop in a more stable environment in which, in some sense, there is a separation of concerns. Now, the cloud provides the machines, it patches those machines, and it provides the stable environment into which we can inject our application images, usually into containers. So, today's applications are running in containers on multiple machines, distributed widely and communicating via messages. So, in this course, we're going to introduce you to the ideas of distributed computing, of machines talking to other machines via messages. We're going to need to master security because, in the cloud, it's a zero-tolerance environment.

What that means is a machine will not talk to any other machine unless the message it receives has security tokens embedded in it. So, we're going to master this communication. We're going to master the pipelines that are now set up to handle data, and they handle data not only in databases; we now have brokers that can handle billions of events a day. We're going to take a look at Kafka, which again has a separation of concerns: we have producers that push data into Kafka, and we have subscribers that subscribe to that data, but the two are separated, and the broker handles this separation.
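To make those two ideas concrete, here is a minimal sketch in Python. It is not Kafka's API and not how any particular cloud checks credentials; `ToyBroker`, `sign`, and `SHARED_SECRET` are hypothetical names invented for this illustration. It assumes a single shared secret and uses an HMAC as a stand-in for the "security token" embedded in each message. The broker refuses any message whose token does not verify, and producers and subscribers only ever talk to the broker, never to each other.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical shared secret for the demo; real systems use managed credentials.
SHARED_SECRET = b"demo-secret"


def sign(payload: str, secret: bytes = SHARED_SECRET) -> str:
    """Return a hex HMAC-SHA256 token for the payload."""
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


class ToyBroker:
    """In-memory stand-in for a message broker (an illustration, not Kafka)."""

    def __init__(self, secret: bytes = SHARED_SECRET):
        self._secret = secret
        self._logs = defaultdict(list)    # topic -> append-only list of payloads
        self._offsets = defaultdict(int)  # (topic, subscriber) -> next index to read

    def publish(self, topic: str, payload: str, token: str) -> bool:
        """Append the payload only if its token verifies; return whether it was accepted."""
        expected = sign(payload, self._secret)
        if not hmac.compare_digest(expected, token):
            return False  # no valid token, so the broker will not accept the message
        self._logs[topic].append(payload)
        return True

    def poll(self, topic: str, subscriber: str) -> list:
        """Return the payloads this subscriber has not seen yet and advance its offset."""
        key = (topic, subscriber)
        log = self._logs[topic]
        unseen = log[self._offsets[key]:]
        self._offsets[key] = len(log)
        return unseen


if __name__ == "__main__":
    broker = ToyBroker()
    print(broker.publish("orders", "order-1", sign("order-1")))  # True: token verifies
    print(broker.publish("orders", "order-2", "bad-token"))      # False: rejected
    print(broker.poll("orders", "billing"))    # ['order-1']
    print(broker.poll("orders", "shipping"))   # ['order-1']
    print(broker.poll("orders", "billing"))    # []
```

Each subscriber keeps its own read position, so "billing" and "shipping" both receive the same order without the producer knowing that either of them exists. That is the loose coupling the broker provides in this sketch.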

So, we're going to do a number of projects. The final one is going to be in real-time data, how we handle real-time streaming data. We're going to be handling big data. We're going to be handling multiple applications that communicate via messages. So, in one case, we're going to split the work across nine containers running different applications that all communicate to provide these real-time data analytics. So, welcome to data engineering. We think this is one of the most important fields today in industry; it really forms the nervous system of a company. Think of your own nervous system: when external events occur, you have to react quickly, and your nervous system allows you to do that.

And it's the same in companies today. Companies are driven by these events, and they are often cyber events: messages saying that they've sold a product, and now they need to produce that product and get it shipped to the right address. And so, we have these cascading events that are handled by multiple pipelines that are usually separated and loosely coupled. So, you can imagine the architectures that we are dealing with as pipelines that send messages to each other, and that's the loose coupling. So, today's architectures are very different from those of the past. Running an application on your own machine is quite easy compared with what we're dealing with today in data engineering. So, welcome to the course. We think it's a great course. You're going to enjoy it, and you're going to get a sense of fulfillment when you complete it. Not only that, you're going to be a valuable asset to any corporation. So, I look forward to seeing you on the course. Bye for now.
