Data Science

by Sazan Consulting Inc.

The course focuses on developing and improving industry-level skills for working in the data science and data engineering domains. Less theory, more exercises and projects.


Duration: 40 Hours

Course Details


 

Program Outline

  • Data preparation

    • Working with data pipelines and preparing data using technologies such as Spark, Streaming, Kafka, Hadoop, HDFS, HQL, and SQL, and cloud technologies such as Dataproc, Cloud Storage, and BigQuery

    • Dealing with various data formats, from unstructured to structured, and data types like JSON, XML and logs

  • Applied Data Science

    • Build the skill set to handle data science use cases using widely used programming languages, data structures, tools, libraries, and algorithms, including cloud technologies

 

Session 1 [1]: HDFS, MapReduce, and Hive

  • Introduction to Big data and Hadoop Ecosystem

  • Why does industry need big data? Advantages of Big Data over traditional RDBMS

  • Introduction to Big Data ecosystems

  • Understanding various data formats and transformation techniques

  • HDFS, YARN architecture

  • Understanding HDFS, Hadoop and Hive

  • Hive advanced features for performance

  • Project-1

 

Session 2 [.5]: Impala, Oozie, Shell Scripting, Linux

  • Uses of shell scripting in big data projects

  • Using the shell to combine various Hadoop technologies

  • Architecture of Impala

  • Uses of Hive and Impala in real-life projects

  • Understanding Oozie as a scheduler; the Oozie Coordinator

  • Introduction to Sqoop

  • Understanding the capabilities of Sqoop and its underlying MapReduce jobs

  • Using Sqoop to ingest data from a traditional database into HDFS or Hive

  • Project-2

 

Session 3 [.5]: Spark, Scala

  • Introduction to Scala programming language

  • Scala from a functional perspective

  • Scala features for big data transformations

  • Spark, a fast in-memory data processing engine

  • Spark architecture

  • Deep dive into Spark data transformation capabilities

  • Spark SQL with HDFS, Hive and Impala

  • Dealing with various data formats: JSON, XML, CSV, Parquet, and text

  • Project-3

 

Session 4 [.5]: Streaming: Kafka and Spark streaming

  • Introduction to streaming and the new era of data analytics

  • Introduction to Kafka

  • Deep dive into Kafka architecture

  • Setting up Kafka for message generation

  • Spark Streaming with Kafka

  • Kafka performance tuning considerations

  • Considerations for zero-data-loss streaming pipelines

  • Dealing with the small-files issue and compaction

  • Project-4

 

Session 5 [1.5]: Big Data in the Cloud: GCP (Google Cloud Platform)

  • GCP Orchestration

  • GCP tools and services used in Canadian companies to develop data pipelines

  • GCP tools and services used in Canadian companies to prepare data for ML models

  • Hadoop on GCP

  • Cloud migration: lift-and-shift strategy with minimal code changes

  • Streaming on the cloud

  • Project-5

 

Session 6 [1.5]: Introduction to Data Science and Python

  • Basics of Data Science

    • How data science differs from other use cases

  • Python for data science

    • Using Anaconda, Jupyter Notebook, Docker, and VS Code

  • Apply Python libraries for

    • Plotting

    • Data science: scikit-learn, pandas, NumPy

  • Applying data science to real-life use cases

  • Overview of various data science algorithms (supervised and unsupervised)

  • Understanding feature engineering and the training and testing of models

  • Data preparation for model training

  • Project-6

 

Session 7 [2]: Google Cloud for Data Science

  • GCP tools and services for data science

  • Vertex AI, Workbench

  • Model Garden

  • BQML

  • AutoML

  • Generative AI

  • Project-7

 

Session 8 [.5]: Docker

  • Understanding Microservices

  • Introduction to Docker and its uses

  • Docker installation and configuration

  • Understanding and working with containers

  • Inter-container communication and exposing services through ports

  • Understanding the Dockerfile

  • Container-based deployment

  • Docker Compose

  • Introduction to Kubernetes

  • Introduction to Helm charts

  • Deploying Docker images to Kubernetes using Helm charts

  • Managing Pods

Project-8: Create a data science environment using microservices

 

Final Project

Pre-requisites for the program:

  • Familiarity with programming languages such as Python, C, and C++

  • Familiarity with RDBMS and SQL

  • Linux and shell scripting knowledge is nice to have

  • Must be available for 8 hours of class per week and at least 2 hours a day for learning and projects outside class hours

  • Toronto Branch

    421 Nugget Av., Unit 4, Toronto, ON M1S 4L8
