Data Science Online Training

SequelGate Technologies offers Data Science training built on a structured syllabus, curriculum and course plan. The instructors for this course are well versed in Data Science and Big Data Analytics, and they keep the course proceedings streamlined to support the learning process. We are aware of industry needs and hence offer the best

This course includes: Hadoop, R Studio, Python, Scala, Machine Learning, Tableau, Excel and more.

Regular Schedules for Data Science Training: Mon - Fri

No.  Timing               Demo Date   Start Date
1    7 AM to 8:30 AM      Nov 1st     Nov 2nd
2    10:30 AM to 12 PM    Nov 15th    Nov 16th
3    5 PM to 6:30 PM      Nov 1st     Nov 2nd

Weekend Schedules for Data Science Training: Sat, Sun

No.  Timing               Demo Date   Start Date
1    7 AM to 10 AM        Oct 14th    Oct 15th

Trainer : Mr G Mishra (8+ Yrs Exp)

Total Course Fee : 40,000/- [USD 700]

Course Fee payable in two equal installments


  • Daily Tasks
  • Weekly Interviews
  • Real-time Case Studies
  • Resume Guidance
  • Certification Guidance
  • Placement Services

Data Science Training Course Contents:


  • What is Cloud Computing
  • What is Grid Computing
  • What is Virtualization
  • How the above three are interrelated
  • What is Big Data
  • Introduction to Analytics and the need for big data analytics
  • Hadoop Solutions - Big Picture
  • Hadoop distributions
  • Comparing Hadoop Vs. Traditional systems
  • Volunteer Computing
  • Data Retrieval - Random Access Vs. Sequential Access
  • NoSQL Databases

The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Data Storage literature survey
  • Data Processing literature Survey
  • Network Constraints
  • Requirements for a new approach

Hadoop: Basic Concepts

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How MapReduce Works
  • Anatomy of a Hadoop Cluster

Hadoop Daemons

  • Master Daemons
    • Name node
    • Job tracker
    • Secondary name node
  • Slave Daemons
    • Data node
    • Task tracker

HDFS (Hadoop Distributed File System)

  • Blocks and Splits
    • Input Splits
    • HDFS Splits
  • Data Replication
  • Hadoop Rack Awareness
  • Data high availability
  • Data Integrity
  • Cluster architecture and block placement
  • Accessing HDFS
    • Java Approach
    • CLI Approach

Programming Practices & Performance Tuning

  • Developing MapReduce Programs in
    • Local Mode: running without HDFS and MapReduce
    • Pseudo-distributed Mode: running all daemons in a single node
    • Fully distributed Mode: running daemons on dedicated nodes

Hadoop Administrative Tasks:

Setup Hadoop cluster of Apache, Cloudera and HortonWorks

  • Install and configure Apache Hadoop
  • Make a fully distributed Hadoop cluster on a single laptop/desktop (Pseudo Mode)
  • Install and configure Cloudera Hadoop distribution in fully distributed mode
  • Install and configure HortonWorks Hadoop distribution in fully distributed mode
  • Monitoring the cluster
  • Getting used to management console of Cloudera and HortonWorks

  • Name Node in Safe mode
  • Meta Data Backup
  • Integrating Kerberos security in Hadoop
  • Ganglia and Nagios – Cluster monitoring
  • Benchmarking the Cluster
  • Commissioning/Decommissioning Nodes

Hadoop Developer Tasks:

Writing a MapReduce Program

  • Examining a Sample MapReduce Program With Several Examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API

Performing several Hadoop Jobs

  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing video files and audio files
  • Processing image files
  • Processing XML files
  • Processing Zip files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency - Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise: Creating an Inverted Index
  • Identity Mapper
  • Identity Reducer
  • Exploring well known problems using MapReduce applications

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • Custom Writables and WritableComparables
  • The Secondary Sort
  • Creating InputFormats and OutputFormats
  • Pipelining Jobs With Oozie
  • Map-Side Joins
  • Reduce-Side Joins

Monitoring and debugging on a Production Cluster

  • Counters
  • Skipping Bad Records
  • Rerunning failed tasks with Isolation Runner

Tuning for Performance

  • Reducing network traffic with combiner
  • Reducing the amount of input data
  • Using Compression
  • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting Performance
  • Other Performance Aspects

Hadoop Ecosystem


Hive

  • Hive concepts
  • Hive architecture
  • Install and configure Hive on cluster
  • Create database, access it from console
  • Buckets, Partitions
  • Joins in Hive
    • Inner joins
    • Outer joins
  • Hive UDF
  • Hive UDAF
  • Hive UDTF
  • Develop and run sample applications in Java to access Hive
  • Load Data into Hive and process it using Hive


Pig

  • Pig basics
  • Install and configure Pig on a cluster
  • Pig Vs MapReduce and SQL
  • Pig Vs Hive
  • Write sample Pig Latin scripts
  • Modes of running Pig
    • Running in Grunt shell
    • Programming in Eclipse
    • Running as Java program
  • Pig UDFs
  • Pig Macros
  • Load data into Pig and process it using Pig


Sqoop

  • Install and configure Sqoop on cluster
  • Connecting to RDBMS
  • Installing MySQL
  • Import data from Oracle/MySQL to Hive
  • Export data to Oracle/MySQL
  • Internal mechanism of import/export
  • Import millions of records into HDFS from RDBMS using Sqoop


HBase

  • HBase concepts
  • HBase architecture
  • Region server architecture
  • File storage architecture
  • HBase basics
  • Column access
  • Scans
  • HBase Use Cases
  • Install and configure HBase on cluster
  • Create database, develop and run sample applications
  • Access data stored in HBase using clients like Java
  • MapReduce client to access the HBase data
  • HBase and Hive Integration
  • HBase admin tasks
  • Defining schema and basic operations


Cassandra

  • Cassandra core concepts
  • Install and configure Cassandra on cluster
  • Create database and tables, and access them from console
  • Developing applications to access data in Cassandra through Java
  • Install and configure OpsCenter to access Cassandra data using browser


Oozie

  • Oozie architecture
  • XML file specifications
  • Install and configure Oozie on cluster
  • Specifying workflow
  • Action nodes
  • Control nodes
  • Oozie job coordinator
  • Accessing Oozie jobs from the command line and the web console
  • Create sample workflows in Oozie and run them on cluster

ZooKeeper, Flume, Chukwa, Avro, Scribe, Thrift, HCatalog

  • Flume and Chukwa Concepts
  • Use cases of Thrift, Avro and Scribe
  • Install and configure Flume on cluster
  • Create a sample application to capture logs from Apache using Flume

Analytics Basics

  • Analytics and big data analytics
  • Commonly used analytics algorithms
  • Analytics tools like R and Weka
  • R language basics
  • Mahout

CDH4 Enhancements

  • Name Node High Availability
  • Name Node federation
  • Fencing
  • YARN

Scala Introduction & Environment Setup:

  • Scala is object-oriented, Scala is functional, Scala runs on the JVM
  • Installing Scala

Scala Basic Syntax

  • First Scala Program
  • Interactive Mode Programming
  • Script Mode Programming

Scala Data Types:

  • Literals
  • Strings
  • Escape Sequences

Scala Variables:

  • Declaration
  • Data Types
  • Type Inference
  • Multiple assignments
  • Variable Types

Scala Operators:

  • Arithmetic
  • Relational
  • Logical
  • Operator Precedence in Scala
  • Scala Conditions
  • Scala Loops
  • Scala Strings

Scala Regular Expressions:

  • Forming regular expressions
  • Matching Literals and Constants
  • Matching Tuples and Lists
  • Matching with Types and Guards
  • Pattern Variables and Constants in case Expressions
  • Regular-expression Examples
  • Pattern matching with Extractors

Scala Functions:

  • Declarations
  • Definitions
  • Calling 
  • Function Literals
  • Anonymous
  • Currying

Scala Arrays

  • Declaring
  • Processing
  • Multi-Dimensional
  • Create Array with Range
  • Scala Arrays Methods

Scala Collections

  • Basic Operations on List
  • Concatenating Lists
  • Creating Uniform Lists
  • Tabulating a Function
  • Scala List Methods
  • Concatenating Sets, Find max, min elements in Set
  • Find common values in Sets
  • Scala Set Methods
  • Basic Operations on Map
  • Check for a Key in Map

Scala Classes & Objects:

  • OOP Basics
  • Defining Fields, Methods, Constructors

Introduction to Apache Spark:

  • What is Spark?
  • Spark Ecosystem & modes of Spark
  • Overview of Spark on a cluster
  • Spark Standalone cluster
  • Spark Web UI & Spark Common Operations

Spark Core

  • Performing basic operations on files in Spark Shell and overview of SBT
  • Building a Spark project with SBT
  • Running a Spark project with SBT
  • Playing with RDDs
    • RDDs, transformations in RDD, actions in RDD
    • Loading data in RDD
    • Saving data through RDD
    • Key-Value Pair RDD
    • MapReduce and Pair RDD Operations
  • Spark and Hadoop Integration - YARN

Spark SQL

  • Spark SQL and performance tuning in Spark
  • Analyze Hive and Spark SQL architecture, SQLContext in Spark SQL
  • Working with Data Frames
  • Implementing an example for Spark SQL
  • Integrating Hive and Spark SQL
  • Support for JSON and Parquet file formats
  • Implement data visualization in Spark
  • Loading of data
  • Hive queries through Spark
  • Performance tuning tips in Spark

Spark Streaming

  • A Simple Example
  • Architecture and Abstraction
  • Transformations
  • Stateless Transformations
  • Stateful Transformations
  • Output Operations
  • Input Sources
  • Additional Sources
  • Multiple Sources and Cluster Sizing
  • Worker Fault Tolerance
  • Receiver Fault Tolerance
  • Processing Guarantees
  • Streaming UI
  • Batch and Window Sizes
  • Level of Parallelism

Spark GraphX

  • Edges
  • Vertices
  • Types of Graphs
  • Usages
  • Simple Program


Spark MLlib

  • Vectors
  • Labeled Points
  • Labels
  • Features
  • RDD with Vectors
  • Matrices, Stats, Maths
  • Algorithms with Spark MLlib


  • Introduction to Advanced Data Analytics
  • Descriptive and inferential statistics for various business problems
  • Types of Variables
  • Measures of central tendency
  • Dispersion
  • Variable Distributions
  • Probability Distributions
  • Normal Distribution and Properties
  • Skewness and Kurtosis
  • Five-Number Summary Analysis


Hypothesis Testing

  • Null/Alternative Hypothesis formulation
  • Type I and Type II errors
  • One-Sample t-test
  • Independent-Samples t-test
  • Analysis of Variance (ANOVA)
  • Chi-Square Test (Non-Parametric Tests)

Data quality and outlier treatment

  • Outlier treatment with robust measurements
  • Outlier treatment with central tendency Mean
  • Outlier treatment with Min-Max methods
  • Imputation with series means or median values
  • Z score Calculation
  • Sampling and estimation

Data Visualization

  • Stem and leaf
  • Dot Plot
  • Histogram
  • Density Plot
  • Frequency Plot and Cumulative Frequency Plots
  • Box and Whisker Plot
  • Scatter Plot
  • Line Graph
  • Bar Graph
  • Pie Chart
  • Tree Map
  • Cross Tabulation
  • Case Study for Visualization

Data Quality checking

  • Z score Calculation
  • Measure of position (Percentiles and Quartiles)
  • Measure of asymmetry: Skewness
  • Measure of peakedness: Kurtosis
  • Q-Q probability plots
  • Kolmogorov-Smirnov test
  • Shapiro-Wilk test
  • Data Normalization
  • Handling missing Values
  • Case Studies for Data Quality Checking

Getting Started with R

  • R Basics
  • Variables and Class
  • Vectors, List, Factors, Matrix
  • Data Frames
  • Missing Values
  • Reading and Writing Data
  • Data Visualization using ggplot2
  • If-Else Conditions
  • Functions
  • Loops
  • Data manipulation

Python

  • Python Basics
  • Python Lists
  • Functions and Packages
  • NumPy
  • Control flow and Pandas


  • Counting Combinations, Generating Combinations
  • Generating Random Numbers
  • Generating Reproducible Random Numbers
  • Generating a Random Sample
  • Generating Random Sequences
  • Randomly Permuting a Vector
  • Probabilities for Discrete Distributions
  • Probabilities for Continuous Distributions
  • Converting Probabilities to Quantiles
  • Plotting a Density Function


  • Edges
  • Vertices
  • Graphs
  • Programs

Machine Learning

  • Introduction to Machine Learning
  • Types Of Machine Learning
  • Real time use cases in Machine Learning
  • Types of Algorithms and Types of Problems –
    • Regression
    • Classification
    • Clustering
    • Collaborative Filtering
    • Optimization
    • Prediction
  • Regression –
    • Linear Regression
    • Logistic Regression
  • Classification –
    • Logistic Regression
    • Decision Tree, Random Forest
    • KNN, SVM
    • Naive Bayes
  • Clustering –
    • K-means Clustering

Complete Practical Training with Real-time Databases. Course includes Real-time Case Studies.
All Classes are Instructor-Led & LIVE. Completely Practical and Real-time with Study Material, Session Notes, Tasks and 24x7 LIVE Server.

Job-Oriented Real-time Training @ SQL School Training Institute - Trainer: Mr. Sai Phanindra T