Call +60 3-2711 7241

HRDF Approved Training Provider in Malaysia - Learn New Skills to Enhance Your Employability from our HRDF Approved Courses

Apache Hadoop Big Data Training

Hadoop is indispensable when it comes to processing big data, as necessary to understanding your information as servers are to storing it. This 2-day crash course on Apache Hadoop Big Data aims to give a good overview of, and familiarity with, Big Data tool sets such as Hadoop, MapReduce, Pig, Hive, Impala, Sqoop, Oozie, ZooKeeper, and Apache Spark. It explains Hadoop, its file system (HDFS), and its processing engine (MapReduce).

Topics include:

  • Understanding Hadoop core components: HDFS and MapReduce
  • Setting up your Hadoop development environment
  • Working with the Hadoop file system
  • Running and tracking Hadoop jobs
  • Tuning MapReduce
  • Understanding Hive and HBase
  • Exploring Pig tools
  • Building workflows
  • Using other libraries, such as Impala, Mahout, and Storm
  • Understanding Spark
  • Visualizing Hadoop output

HRDF SBL Claimable for Employers Registered with HRDF

Course Code: M448

Course Cancellation/Reschedule Policy

We reserve the right to cancel or reschedule the course due to unforeseen circumstances. If the course is cancelled, participants will receive a 100% refund.
Note that the training venue is subject to change depending on class size and classroom availability.
Note that the minimum class size to start a class is 3 pax.

Course Details

Day 1

Module 1: Get Started on Apache Hadoop

  • Why Hadoop?
  • Difference between HBase and Hadoop

Module 2: Hadoop Core Components

  • Java Virtual Machine (JVM)
  • HDFS
  • Hadoop Cluster Components
  • Exploring Hadoop Platforms

Module 3: Setup Hadoop Development Environment

  • Setup Cloudera Hadoop VM
  • Adding Hadoop Libraries
  • Programming Languages

Module 4: MapReduce 2.0/YARN

  • What is MapReduce?
  • MapReduce Components
  • MapReduce on HDFS
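
To give a flavour of what this module covers, the map, shuffle, and reduce steps can be sketched in plain Python as a word count. This is only an illustrative, single-machine sketch and not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and a real job would run on a cluster through the Hadoop MapReduce API with input read from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big tools", "hadoop processes big data"]
    print(word_count(sample))
```

In Hadoop itself, the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; here all three phases run in one process purely for illustration.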

Module 5: Hive

  • What is Hive?
  • Hive Queries
  • Analyzing data with Hive

Day 2

Module 6: Pig

  • What is Pig?
  • Pig Data types
  • Pig Commands

Module 7: Connectors and Workflows

  • Introducing Sqoop
  • Importing Data with Sqoop
  • Introducing Flume
  • Importing Data with Flume
  • Introducing Zookeeper
  • Using Zookeeper to coordinate workflows
  • Introducing Oozie
  • Scheduling jobs using Oozie

Module 8: Exploring Other Hadoop Libraries

  • Introducing Impala
  • Introducing Mahout
  • Introducing Storm

Module 9: Apache Spark Basics

  • Why Apache Spark?
  • Apache Spark Components
  • Apache Spark Commands

Who Should Attend

  • Data Scientists
  • Data Analysts
  • Hadoop Administrators
  • Big Data Analysts




Project Manager and Big Data Trainer

Tarun Sukhani is an IT executive, educator, author, speaker, data scientist, security expert, agile coach, polyglot coder, and entrepreneur with over 19 years of combined professional experience both in the U.S. and internationally. As a seasoned veteran, his expertise lies in leading teams in the design and delivery of highly scalable, concurrent, and performant enterprise software solutions with budgets of up to $100 million. He is particularly adept at building productive, self-managing agile teams with predictable velocities and delivery timeframes.

Tarun Sukhani is skilled in all phases of the SDLC/ALM, with a solid foundation in Agile (XP, SAFe, Lean, Scrum, Kanban, and Scrumban) and traditional (PMI and PRINCE2) project management frameworks and methodologies.

He is proficient in Big Data/Data Science: Hadoop, Pig, Hive, HBase, Spark, R/Rattle, Cassandra, YARN, Zookeeper, Mahout, SimpleCV, and OpenCV.

Big Data Trainer

Ajit is a certified Big Data architect with 13 years of experience in Business Data Analytics, leading functions such as Enterprise Data Warehouse design, development of BI solutions around leading BI and Big Data Analytics platforms, and IT project and service management. He has provided thought leadership in the architecture design of Business Data Analytics solutions, leveraging best practices and methodologies to implement Business Intelligence and Big Data solutions in corporate environments. He is credited with delivering breakthrough solutions in BI and Big Data Analytics (In-Memory Computing and Analytics) to transform the business performance of Fortune 500 enterprises. He has gained comprehensive hands-on implementation experience in Big Data, SAP Analytics (BW and SAP HANA), and Business Objects reporting tools.

He is actively involved in architecting and implementing high-performance, large-volume data integration processes, databases, storage, and other back-end services in fully virtualized environments. He is a Certified Project Manager, a Lead Auditor for ISO 22301, ISO 27001, ISO 20000, and ISO 9001, and a certified Agile Scrum Master practitioner, skilled in managing engineering resources to get the best output with minimum resources using the Agile Scrum methodology. He possesses in-depth knowledge and experience in data modeling and business intelligence systems (dimensional modeling, data mining, predictive analytics). He strongly believes in a facilitator approach to leading global cross-cultural teams and practices a consultative approach in managing projects focused on implementing data warehousing and business intelligence solutions effectively and efficiently to meet today’s dynamic business environment.

Jason is a native of Kuala Lumpur, Malaysia, and studied for a Bachelor’s Degree in Accounting and Finance in the London School of Economics Program, University of London. He was raised in a typical Chinese family with an entrepreneurial business background in manufacturing and real estate development. He worked as an Executive in the Asset and License Management Department at Standard Chartered, Malaysia, and was promoted to Data Analyst six months later. He later joined Tune Hotels Regional Services, a hotel management and hotel chain operator, as Senior Revenue Executive, and then served as Research Analyst with Wealth-X, a company that provides prospecting, intelligence, and wealth due diligence on ultra-high net worth individuals. Thereafter he served as Senior Data Analyst with Xchanging Malaysia, a joint venture between Xchanging and YTL Communications to develop and deliver enhanced mobile internet and cloud-based hosting offerings in Malaysia. He currently works as a Data Analyst with GoQuO, a full-service e-commerce solutions provider to airlines and OTAs. He is a Community Organizer of Big Data Malaysia, a professional network for individuals with an interest in all aspects of Big Data, and a Member of the Malaysian Chapter of the Founder Institute, the world’s largest entrepreneur training and startup launch program. He occasionally participates in marathons and is an avid off-road cyclist. He is passionate about technology and economics and enjoys social events.
