The major focus of the course is to develop and improve industry-level skill sets for working in the data science and data engineering domains. Less theory, more exercises and projects.
Program Outline
Data preparation
Working with data pipelines and preparing data using technologies like Spark, Spark Streaming, Kafka, Hadoop, HDFS, HQL, and SQL, and cloud technologies like Dataproc, Cloud Storage, and BigQuery
Dealing with various data formats, from unstructured to structured, such as JSON, XML, and logs
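As a rough illustration of moving from semi-structured and unstructured inputs to structured records, the sketch below parses a JSON string, an XML fragment, and a log line into the same flat record shape using only the Python standard library. The sample inputs, field names (user, action, status), and the log layout are hypothetical and chosen only for this example; they are not taken from the course material.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; the field names are illustrative only.
JSON_TEXT = '{"user": "alice", "action": "login", "status": 200}'
XML_TEXT = "<event><user>bob</user><action>logout</action><status>200</status></event>"
LOG_LINE = "2024-01-15 10:32:07 INFO user=carol action=upload status=201"

# Assumed log layout: key=value pairs for user, action, and status.
LOG_PATTERN = re.compile(
    r"user=(?P<user>\S+) action=(?P<action>\S+) status=(?P<status>\d+)"
)


def from_json(text):
    """Parse a JSON object into a flat record."""
    obj = json.loads(text)
    return {"user": obj["user"], "action": obj["action"], "status": int(obj["status"])}


def from_xml(text):
    """Parse an XML <event> element into a flat record."""
    root = ET.fromstring(text)
    return {
        "user": root.findtext("user"),
        "action": root.findtext("action"),
        "status": int(root.findtext("status")),
    }


def from_log(line):
    """Extract key=value fields from an unstructured log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "user": match["user"],
        "action": match["action"],
        "status": int(match["status"]),
    }


# All three sources end up in the same structured shape.
records = [from_json(JSON_TEXT), from_xml(XML_TEXT), from_log(LOG_LINE)]
for record in records:
    print(record)
```

In a real pipeline the same idea would typically be expressed with a distributed engine such as Spark; the standard-library version here only shows the normalization step itself.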
Applied Data Science
Build the skill set to handle data science use cases with widely used programming languages, data structures, tools, libraries, and algorithms, including cloud technologies
Session 1 [1]: HDFS, MapReduce, and Hive
Introduction to Big Data and the Hadoop Ecosystem
Why does industry need big data? Advantages of Big Data over traditional RDBMS
Introduction to Big Data Ecosystems
Understanding various data formats and transformation techniques
HDFS, YARN architecture
Understanding HDFS, Hadoop and Hive
Hive advanced features for performance
Project-1
Session 2 [.5]: Impala, Oozie, Shell Scripting, Linux
Uses of shell scripting in big data projects
Shell for combining various Hadoop technologies
Architecture of Impala
Uses of Hive and Impala in real-life projects
Understanding Oozie as a scheduler; Oozie Coordinator
Introduction to Sqoop
Understanding capabilities of Sqoop and underlying MapReduce
Use Sqoop to ingest data from a traditional database into HDFS or Hive
Project-2
Session 3 [.5]: Spark, Scala
Introduction to Scala programming language
Scala from a functional perspective
Scala features for big data transformations
Spark, a fast in-memory data processing engine
Spark architecture
Deep dive into Spark data transformation capabilities
Spark SQL with HDFS, Hive and Impala
Dealing with various data formats: JSON, XML, CSV, Parquet, text
Project-3
Session 4 [.5]: Streaming: Kafka and Spark Streaming
Introduction to streaming and the new era of data analytics
Introduction to Kafka
Deep dive into Kafka architecture
Setting up Kafka for message generation
Spark Streaming with Kafka
Kafka performance tuning considerations
Considerations for zero-data-loss streaming pipelines
Dealing with the small-files issue and compaction
Project-4
Session 5 [1.5]: Big Data in the Cloud: GCP (Google Cloud Platform)
GCP Orchestration
GCP tools and services used in Canadian companies to develop data pipelines
GCP tools and services used in Canadian companies to prepare data for ML models
Hadoop on GCP
Cloud Migration: Lift and shift strategy with minimum code changes
Streaming on the cloud
Project-5
Session 6 [1.5]: Introduction to Data Science and Python
Basics of Data Science
How data science differs from other use cases
Python for data science
Using Anaconda, Jupyter Notebook, Docker, and VS Code
Apply Python libraries for
Plotting
Data Science: scikit-learn, pandas, NumPy
Applying data science to real-life use cases
Going over various data science algorithms (supervised and unsupervised)
Understanding feature engineering, and training and testing models
Data preparation for model training
Project-6
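As an illustration of the "training and testing models" topic in this session, the sketch below splits a small synthetic dataset into training and test sets, fits a 1-nearest-neighbour classifier on the training portion, and measures accuracy on the held-out portion. It uses only the Python standard library so it runs anywhere; in practice a library such as scikit-learn would normally be used instead. The dataset, the helper names (make_dataset, train_test_split, predict, accuracy), the 80/20 split, and the random seed are all assumptions made for this example.

```python
import random


def make_dataset(rng, n_per_class=20):
    """Build two well-separated 2-D clusters labelled 0 and 1 (synthetic data)."""
    data = []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            point = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
            data.append((point, label))
    return data


def train_test_split(data, test_fraction, rng):
    """Shuffle a copy of the data and hold out a fraction of it for testing."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, point):
    """1-nearest-neighbour: return the label of the closest training point."""
    nearest = min(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of held-out points whose predicted label matches the true label."""
    correct = sum(1 for point, label in test if predict(train, point) == label)
    return correct / len(test)


rng = random.Random(42)  # fixed seed so the split is reproducible
dataset = make_dataset(rng)
train, test = train_test_split(dataset, 0.2, rng)
score = accuracy(train, test)
print(len(train), len(test), score)
```

The key design point is that the model never sees the test points during training, so the reported accuracy reflects performance on unseen data. The clusters here are deliberately far apart, which is why such a simple classifier is enough.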
Session 7 [2]: Google Cloud for Data Science
GCP tools and services for data science
Vertex AI, Workbench
Model Garden
BQML
AutoML
Generative AI
Project-7
Session 8 [.5]: Docker
Understanding Microservices
Introduction to Docker and its uses
Docker installation and configuration
Understanding and working with containers
Inter-container communication and exposing services through ports
Understanding the Dockerfile
Container-based deployment
Docker Compose
Introduction to Kubernetes
Introduction to Helm charts
Deployment of Docker images to Kubernetes using Helm charts
Managing Pods
Project-8: Create a data science environment using microservices
Final Project
Pre-requisites for the program:
Familiarity with a programming language such as Python, C, or C++
Familiarity with RDBMS and SQL
Linux and shell scripting knowledge is nice to have
Must be available for 8 hours of class per week and at least 2 hours a day for learning and projects outside class hours
Sazan Consulting helps you find a career that fits your passion. Building a successful career is a process, and it does not happen overnight.
Sazan has been delivering job-oriented, comprehensive training programs to candidates for more than ten years. Sazan possesses the expertise and resources to deliver training as per the highest quality standards.
These training programs run from 6 to 8 weeks and are conducted by industry experts who work for reputable North American companies and have excellent knowledge of different industry domains. Because the trainers are working professionals, classes are held only on weekends and focus on real-world projects.
The different training programs that are offered by Sazan are:
Business Analysis
Business Intelligence
Big Data/Data Science
Full Stack
AWS DevOps
Cyber Security
System Administration and AWS Infrastructure
Features
Industry Expert Trainers
Weekend Classes Only
Resume and Interview Preparation
Interactive Sessions
On-the-Job Support
Our guidance and training help job seekers sharpen their skills and build great careers by choosing the programs that suit their interests and skill sets.
Recently, Sazan Consulting was recognized by the Canadian Business Review Board (CBRB) and named one of Canada’s best businesses for its customer satisfaction, outstanding service, business leadership, and strong vision.
Currently, data analytics is one of the most promising and in-demand professions, with applications in a multitude of industries. Businesses are recognizing the many advantages of collecting and analyzing data (such as business forecasting, optimized customer experience, and mitigating risk).