Course Details
Module 2: Exploring Data
- Data Interface
- RDD Basic Operations
- Import Data
- Actions and Transformations
- Saving Results
Module 3: Analyzing Data
- Select and Filter Data
- Aggregate Data
- Save Data
Module 4: SparkSQL
- Creating Tables
- Querying Data
- Visualizing Data
Module 5: Machine Learning
- ML or MLlib Module
- Preprocessing Data
- Linear Regression
- Classification
Module 6: Spark Streaming
- Streaming Setup
- Querying Streaming Data
Course Admin
Prerequisite
This is an intermediate course. Participants should have basic knowledge of the following subjects:
- Python
- Apache Spark
Software Requirement
Download and unzip Apache Spark from https://spark.apache.org/downloads.html.
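One way to confirm the unzipped download is usable before class is to check for Spark's launcher scripts. The following is a minimal sketch, not an official setup step: it assumes a Unix-like system, assumes the SPARK_HOME environment variable points at the unzipped folder, and the helper name find_spark_launchers is hypothetical.

```python
import os
from pathlib import Path


def find_spark_launchers(spark_home):
    """Return the launcher scripts found under <spark_home>/bin.

    Hypothetical helper: checks only for the two launchers a Python
    course would typically use, spark-submit and pyspark.
    """
    bin_dir = Path(spark_home) / "bin"
    expected = ["spark-submit", "pyspark"]
    return [name for name in expected if (bin_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: SPARK_HOME was set by hand to the unzipped Spark folder.
    spark_home = os.environ.get("SPARK_HOME", "")
    found = find_spark_launchers(spark_home) if spark_home else []
    print("Launchers found:", found or "none (is SPARK_HOME set?)")
```

If both launchers are listed, the folder was most likely unpacked correctly. On Windows the launchers carry a .cmd suffix, so the expected names would need adjusting.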
Who Should Attend
- Data Scientists
- Data Analysts
- Apache Spark developers who want to use Apache Spark for Hadoop Big Data analysis
Trainers
Dr Atabak has 15 years of experience in software development, software architecture, and system integration, including broad experience in commercial software architecture and development. This experience covers all stages of the software development lifecycle and the building of high-performance, high-availability, secure, and reliable systems. Dr Atabak has led teams and projects of 10 to 15 people through SDLC iterations, practices Agile and eXtreme Programming, has mentored fellow developers on various aspects of application architecture and development, and works well with customers. Dr Atabak architects solutions in agile and scrum development environments, combining a deep understanding of technology with a focus on delivering business solutions, and externalizes configuration and business logic to ease client software implementations. Areas of expertise include full project life cycle development, including implementation and integration. Dr Atabak also has a successful background working with stakeholders to develop architecture frameworks that align strategy, processes, and IT assets with business goals, and works closely with project managers, developers, and focus groups to avoid redundancy, minimize expenditures, and improve overall synergy within the organization.