Sail(Data Engineer)
Data engineers lay the data foundation for the cloud, designing and building the data warehouses and pipelines that power it.
Course Description:
Learners will gain knowledge and hands-on experience solving real-world data engineering problems, including ingesting, egressing, and transforming data from multiple sources using a variety of technologies, services, and tools. Learners will develop the skills needed to identify and meet the data requirements of an organization by designing and implementing systems and data pipelines that manage, monitor, and secure data, using the full stack of cloud services to satisfy business needs.
Learners will explore and experiment with storage abstractions such as SQL and NoSQL databases, data lakes, and data warehouses to store, transform, and draw insights from data. They will utilize the MapReduce and Spark frameworks, provisioning clusters on public cloud infrastructure to clean, prepare, and analyze large datasets. In addition, learners will architect and develop analytics applications and pipelines using batch, iterative, and stream processing frameworks.
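As a taste of the MapReduce programming model the course builds on, here is a minimal word-count sketch in plain Python (the phase names and sample lines are illustrative; a framework such as Hadoop or Spark would distribute these phases across a cluster):

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in a line of text.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "data pipelines move data"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map_phase(l) for l in lines)))
print(counts["data"])  # → 3
```

The same map/shuffle/reduce decomposition scales from this toy loop to cluster-sized datasets, which is why it anchors the course's large-scale data processing units.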
All projects in the course use existing data storage and data processing technologies and services available in the cloud. Learners will be exposed to real-world scenarios, infrastructure, and data, with the goal of developing the skills needed to become a data engineer.
Prerequisites:
- Cloud Administrator
- Introduction to Computing
Duration:
- 8 weeks (quarter system)
- 15 weeks (semester system)
Learning Objectives:
Learners who complete the Data Engineer course should be able to:
- Identify and suggest appropriate storage abstractions to store, transform, and draw insights from data;
- Architect, orchestrate, and deploy a variety of analytics applications and pipelines using batch and stream processing frameworks;
- Identify and construct useful features using well-known feature engineering methods to improve the accuracy of ML-based predictors;
- Implement and automate data pipelines using cloud-based data lakes, data warehouses, and workflow orchestration services to manage, monitor, and secure data.
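The feature engineering objective above can be illustrated with a small, standard-library-only sketch; the record shape, field names, and bucket boundaries are hypothetical, and in the course learners would apply the same ideas at scale with tools such as Spark:

```python
from datetime import datetime

def engineer_features(record):
    # Hypothetical raw record: a transaction with a timestamp and an amount.
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        # Derived categorical features from the timestamp.
        "hour": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),
        # Binning: bucket the continuous amount into coarse ranges of 100,
        # capped at bucket 4 (boundaries chosen purely for illustration).
        "amount_bucket": min(int(record["amount"] // 100), 4),
    }

features = engineer_features({"timestamp": "2024-06-01 14:30:00", "amount": 250.0})
print(features)  # → {'hour': 14, 'is_weekend': 1, 'amount_bucket': 2}
```

Simple derived features like these often let a model capture patterns (rush hours, weekend behavior, spend tiers) that the raw fields alone would hide.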