Call +60 3-2711 7241 Email:

HRDF Approved Training Provider in Malaysia - Modular Fast Track Skill-Based Trainings

Machine Learning with Apache Spark

Machine learning is a type of artificial intelligence (AI) that gives computers the ability to learn without being explicitly programmed. Machine learning algorithms comb through data and identify patterns that are often too complex for people to discern manually. These patterns can then be used for decision making and action.

Apache Spark is a powerful platform for running machine learning at scale. This course shows you how to perform various machine learning tasks using Apache Spark's built-in MLlib component.

Topics include:

  • Overview of Apache Spark
  • Clustering
  • Regression
  • Classification
  • Recommendation


HRDF SBL Claimable for Employers Registered with HRDF


Course Code: M537


Course Cancellation/Reschedule Policy

We reserve the right to cancel or reschedule the course due to unforeseen circumstances. If the course is cancelled, we will refund participants in full (100%).
Note that the training venue is subject to change depending on class size and classroom availability.
Note that the minimum class size to start a class is 3 pax.

Course Details

Module 1: Apache Spark Basics

  • Recap of Apache Spark Basics 
  • Install Apache Spark on Local Computer
  • Read CSV Data
  • Manipulating DataFrames
  • ML Libraries

Module 2: Preprocessing

  • Normalizer
  • Standardizer
  • Tokenizer
  • TF-IDF
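The preprocessing steps above can be illustrated without a Spark cluster. The plain-Python sketch below approximates what the corresponding `pyspark.ml.feature` stages compute, under simplifying assumptions:

- `normalize` scales each row to unit p-norm, as MLlib's `Normalizer` does (p = 2 by default).
- `standardize` rescales each column to zero mean and unit sample standard deviation. MLlib's `StandardScaler` only divides by the standard deviation unless `withMean=True` is set.
- `tokenize` lowercases and splits on whitespace, like MLlib's `Tokenizer`.
- `tf_idf` multiplies raw term counts by the smoothed IDF, log((N + 1) / (df + 1)), given in the MLlib documentation.

All function names are hypothetical helpers, not Spark APIs.

```python
import math
from collections import Counter


def normalize(row, p=2.0):
    """Scale one feature vector to unit p-norm (the idea behind MLlib's Normalizer)."""
    norm = sum(abs(x) ** p for x in row) ** (1.0 / p)
    return [x / norm for x in row] if norm > 0 else list(row)


def standardize(rows):
    """Rescale each column to zero mean and unit sample std (needs at least 2 rows)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document using the smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((n_docs + 1) / (df[t] + 1)) for t in df}
    # TF is the raw count of a term within its document.
    return [{t: count * idf[t] for t, count in Counter(tokens).items()}
            for tokens in tokenized]


if __name__ == "__main__":
    print(normalize([3.0, 4.0]))               # [0.6, 0.8]
    print(standardize([[1.0], [2.0], [3.0]]))  # [[-1.0], [0.0], [1.0]]
    print(tf_idf(["Spark is fast", "Spark is scalable"])[0])
```

With this IDF, a term that appears in every document (here "spark" and "is") gets weight 0, so only distinguishing words such as "fast" contribute.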

Module 3: Clustering

  • What is Clustering
  • Clustering Algorithms
  • KMeans Clustering
  • Hierarchical Clustering
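K-means alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is a plain-Python version of that loop (Lloyd's algorithm). The caller supplies the starting centroids so the result is deterministic.

This is not Spark's implementation. MLlib's `KMeans` initializes with k-means|| by default. For hierarchical clustering, MLlib offers `BisectingKMeans`, a divisive method that repeatedly splits clusters using k-means.

```python
import math


def kmeans(points, centroids, max_iter=20):
    """Lloyd's k-means; returns (centroids, labels) for n-dimensional points."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (8, 8), (9, 8.5)]
    print(kmeans(pts, [(0, 0), (10, 10)]))  # ([[1.25, 1.5], [8.5, 8.25]], [0, 0, 1, 1])
```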

Module 4: Classification

  • What is Classification
  • Naive Bayes Classifier
  • Decision Tree Classifier
  • Multilayer Perceptron
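Naive Bayes scores each class by its prior probability multiplied by the probability of each observed feature given that class, treating features as independent. MLlib's `NaiveBayes` defaults to the multinomial model with additive (Laplace) smoothing of 1.0.

The plain-Python sketch below applies the same idea to tokenized text, working in log space to avoid numeric underflow. The function names and the tiny training set are illustrative only, not Spark APIs.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, smoothing=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    vocab = {t for doc in docs for t in doc}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> count
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    # Log prior: fraction of training documents in each class.
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        # Additive smoothing gives unseen (class, term) pairs a small non-zero probability.
        total = sum(term_counts[c].values()) + smoothing * len(vocab)
        likelihoods[c] = {t: math.log((term_counts[c][t] + smoothing) / total)
                          for t in vocab}
    return priors, likelihoods


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unknown terms are ignored."""
    priors, likelihoods = model
    scores = {c: priors[c] + sum(likelihoods[c][t] for t in doc if t in likelihoods[c])
              for c in priors}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["great", "movie"], ["great", "acting"], ["boring", "movie"], ["boring", "plot"]]
    model = train_nb(docs, ["pos", "pos", "neg", "neg"])
    print(predict_nb(model, ["great", "acting"]))  # pos
```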

Module 5: Regression

  • What is Regression
  • Regression Algorithms
  • Linear Regression
  • Decision Tree Regression
  • Gradient Boosted Tree Regression
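For a single feature, ordinary least squares linear regression has a closed form. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the fitted line pass through the two means. The plain-Python sketch below computes exactly that, plus R² as a goodness-of-fit measure.

MLlib's `LinearRegression` solves the multi-feature (optionally regularized) version of this problem at scale. The sketch covers only the textbook one-variable case and is not Spark's solver.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
    b0, b1 = fit_line(xs, ys)
    print(b0, b1, r_squared(xs, ys, b0, b1))  # 1.0 2.0 1.0
```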

Module 6: ML Pipeline

  • What is a Pipeline
  • Creating a Pipeline for Movie Review Classification
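A pipeline chains processing stages so that the same sequence of transformations runs at both training and prediction time. In MLlib, a `Pipeline` of Transformers and Estimators is fit into a `PipelineModel`.

The plain-Python sketch below imitates that fit/transform contract for movie reviews. It uses hypothetical `Tokenize` and `KeywordScorer` stages and a `SimplePipeline` class. The class names and the keyword-scoring rule are illustrative assumptions, not Spark classes. Unlike MLlib, `SimplePipeline.fit` returns itself with its stages fitted rather than a separate model object.

```python
class Tokenize:
    """Transformer: lowercases and splits each review into tokens."""

    def fit(self, rows):
        return self  # nothing to learn

    def transform(self, rows):
        return [dict(row, tokens=row["text"].lower().split()) for row in rows]


class KeywordScorer:
    """Estimator: learns words seen only in positive or only in negative reviews."""

    def fit(self, rows):
        pos = {t for r in rows if r["label"] == 1 for t in r["tokens"]}
        neg = {t for r in rows if r["label"] == 0 for t in r["tokens"]}
        self.pos_words, self.neg_words = pos - neg, neg - pos
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            pos_hits = sum(t in self.pos_words for t in r["tokens"])
            neg_hits = sum(t in self.neg_words for t in r["tokens"])
            out.append(dict(r, prediction=1 if pos_hits > neg_hits else 0))
        return out


class SimplePipeline:
    """Fits stages in order, feeding each stage the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        for stage in self.stages:
            rows = stage.fit(rows).transform(rows)
        return self

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


if __name__ == "__main__":
    train = [{"text": "A great movie", "label": 1}, {"text": "Great acting", "label": 1},
             {"text": "A boring movie", "label": 0}, {"text": "Boring plot", "label": 0}]
    model = SimplePipeline([Tokenize(), KeywordScorer()]).fit(train)
    print([r["prediction"] for r in model.transform([{"text": "Great fun"}])])  # [1]
```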

Module 7: Recommendation (Optional)

  • Recommendation Systems
  • Collaborative Filtering
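Collaborative filtering predicts a user's rating for an item from the ratings of other users. MLlib implements it with alternating least squares (`pyspark.ml.recommendation.ALS`), a matrix-factorization method.

The plain-Python sketch below swaps in a simpler user-based neighbourhood approach, purely to show the idea. It computes cosine similarity between users on co-rated items, then takes a similarity-weighted average of the other users' ratings. The ratings data is made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody comparable has rated the item


if __name__ == "__main__":
    ratings = {"alice": {"m1": 5, "m2": 3},
               "bob": {"m1": 5, "m2": 3, "m3": 4},
               "carol": {"m1": 1, "m2": 5, "m3": 2}}
    print(predict_rating(ratings, "alice", "m3"))
```

Because Alice's ratings match Bob's more closely than Carol's, the prediction for "m3" lands closer to Bob's 4 than to Carol's 2.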

Who Should Attend

  • Big Data Analysts
  • Data Scientists
  • Data Analysts



This is an intermediate course. Participants should have basic knowledge of the following subjects:

  • Python
  • Apache Spark

Software Requirements

Download and unzip Apache Spark


Apache Spark Trainer

Dr. Aanand is a Full Stack Data Scientist who once had a torrid love affair with Physics. He has consulted and published in the areas of Public Health, Electricity Markets, Telecom, BFSI, Advertising & Communication Strategies and Digital & Social Media Technologies. He has worked on assignments with international agencies such as the International Monetary Fund, the World Bank and the Royal Netherlands Embassy. He has also worked with MNCs such as Tata Consultancy Services and Kie Square Consulting, and with several government organizations of national importance. He regularly conducts general training programs in Python (Pandas, NumPy, SciPy, Matplotlib, Bokeh), R (dplyr, rstanarm, knitr, ggplot2), Data Visualization (Tableau, D3.js) and Machine Learning (Reinforcement Learning, scikit-learn). He also runs specialized training programs on Structural Equation Modeling and SAP HANA. He holds a doctorate in Operations Research from the Indian Institute of Management Ahmedabad and a postgraduate degree in Physics from the University of Mumbai. He has advanced training in mathematical programming (including optimization), advanced multivariate data analysis and simulation techniques. When he is not teaching or consulting, he can be found meditating or heading out on an adventurous trek.

Apache Spark Trainer

Syed Muhammad Farrukh Akhtar has more than 15 years of experience analyzing, designing, developing, integrating and managing large applications for diverse industries. He has worked in Dubai, Pakistan, Germany and Malaysia. He has strong hands-on experience in software design, development and integration on platforms such as IBM J2EE, Oracle and Microsoft .NET, and with Big Data technologies including Hadoop, Spark, HBase, Hive, Sqoop, Flume and NoSQL. He also has expertise in Machine Learning and Deep Learning with TensorFlow, Keras and Python. He has excellent skills in React, Ionic 2, Angular 2, React Native mobile apps and Node.js. He is highly knowledgeable in object-oriented software development, requirements analysis and database design, and has a deep understanding of how open source technologies apply to emerging business areas. He has excellent knowledge of the Rational Unified Process (RUP), Rational Software Architect, data modeling and mapping, and extensible system design using UML and Visio. His professional experience covers J2EE, JMS, WebSphere, Oracle, Spring, Hibernate, Struts and 3-tier web-based application development.
