Big Data Training

Branta Group offers comprehensive Big Data training for Hadoop and NoSQL databases, both at our training centers and on-site with clients. We help candidates and clients update their skills and master open-source technologies. Course content is drawn from real-world experience, and graduates of our developer programs learn techniques for solving real-world challenges using the latest versions of Apache Software Foundation projects, including Hadoop, along with Java and related web technologies.

Our classes are small in order to provide targeted and effective training. We provide daytime and weekend classes.

If you have a small number of people who need to be trained, our trainers will set up and conduct the training program on-site.

Hadoop Developer Training in the SF Bay Area

Intensive Developer Training Classes (most popular 2-weekend, 4-day program)

Please note that the course can be tailored to fit each client's schedule and budget.

1. Introduction

  • Why Hadoop?
  • Problems with traditional large-scale systems
  • Requirements for a new approach
  • What is Hadoop?
  • History of Hadoop
  • Building Blocks – Hadoop Architecture and Eco-System
  • Who is behind Hadoop?
  • What is Hadoop good for, and why?

2. HDFS

  • An Overview of HDFS
  • HDFS Blocks, HDFS Layers, NameNode, DataNode, Secondary NameNode
  • HDFS High Availability, HDFS Federation, HDFS Write Path, HDFS Read Path
  • HDFS shell and commands
  • Configuring HDFS
  • Interacting With HDFS
  • HDFS Permissions and Security
  • Additional HDFS Tasks
  • HDFS Overview and Architecture
  • HDFS Installation
  • Hadoop File System Shell
  • File System Java API

3. MapReduce

  • MapReduce Overview and Architecture
  • Installation
  • Developing MapReduce Jobs
  • MapReduce Daemons
  • MRv2: YARN
  • The Mapper
  • Reusable Mappers
  • The Reducer
  • Reusable Reducers
  • Running a MapReduce Job
  • Getting Data to the Mapper
  • Input and Output Formats
  • Job Configuration
  • Job Submission
  • Practicing MapReduce Programs (at least 10 MapReduce algorithms)

4. Getting Started With Eclipse IDE

  • Configuring the Hadoop API in the Eclipse IDE
  • Connecting the Eclipse IDE to HDFS

5. Hadoop Streaming

6. Advanced MapReduce Features

  • The Driver
  • ToolRunner
  • Creating a new Job object
  • Mapper Code
  • Reducer Code
  • Number of Reducers
  • Scheduling a Job
  • Custom Data Types
  • Input Formats
  • Output Formats
  • Partitioning Data
  • Reporting Custom Metrics
  • Distributing Auxiliary Job Data

7. Features and Optimizations

  • Combiner
  • Shuffle and Sort
  • Speculative Execution
  • Partitioner
  • Counters
  • Map-only Job
  • Joining Datasets: Map-side joins
  • Joining Datasets: Reduce-side joins

8. Distributing Debug Scripts

9. Using Yahoo Web Services

10. Pig

  • Pig Overview
  • Installation
  • Pig Latin
  • Pig with HDFS

11. Hive

  • Hive Overview
  • Installation
  • HiveQL
  • Analyzing Unstructured Data with Hive
  • Analyzing Semi-structured Data with Hive

12. HBase

  • HBase Overview and Architecture
  • HBase Installation
  • HBase Shell
  • CRUD operations
  • Scanning and Batching
  • Filters
  • HBase Key Design

13. ZooKeeper

  • ZooKeeper Overview
  • Installation
  • Server Maintenance

14. Sqoop

  • Sqoop Overview
  • Installation
  • Imports and Exports

15. Configuration

  • Basic Setup
  • Important Directories
  • Selecting Machines
  • Cluster Configurations
  • Small Clusters: 2-10 Nodes
  • Medium Clusters: 10-40 Nodes
  • Large Clusters: Multiple Racks

16. Integrations

17. Putting It All Together

  • Distributed Installations
  • Best Practices

Standard Hadoop Training:

  • Intro To Hadoop and NoSQL Technologies (Course 101)
  • Hadoop Essential System Administration (Course 102)
  • Hadoop Training for Developers (Course 103)
  • Hive and Pig for Developers (Course 104)

NoSQL Training:

  • Introduction to HBase Administration and Development (Course 105)
  • Essential Cassandra Training for Admins and Developers (Course 106)
  • Riak Training (Course 1007)
  • MongoDB Administration and Development (Course 107)
  • Neo4j: An Introduction to Graph Databases (Course 108)