Call +60 3-2711 7241

HRDF Approved Training Provider in Malaysia - Modular Fast Track Skill-Based Trainings

Text Mining with R

It is estimated that over 70% of potentially usable business information is unstructured, often in the form of text data. Text mining provides a collection of techniques for deriving actionable insights from this data.

This course introduces the main tools and techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches to making sense of unstructured data. You will work through a live example of extracting data from the web and carry out each stage of text mining in R.

The topics include:

  • Sentiment analysis
  • Word clouds
  • N-grams
  • Topic modeling
  • Latent Dirichlet Allocation (LDA)
  • Extracting text from social media

HRDF SBL Claimable for Employers Registered with HRDF

Course Code: M587

Course Booking

MYR880.00 (GST-exclusive)

Course Cancellation/Reschedule Policy

We reserve the right to cancel or reschedule the course due to unforeseen circumstances. If the course is cancelled, participants will receive a full refund.
Note that the training venue is subject to change depending on class size and classroom availability.
Note that the minimum class size to start a class is 3 participants.

Course Details

Module 1: Introduction

  • What is text mining
  • Applications of text mining

Module 2: Basic Text Functions

  • Text manipulation functions
  • Working with strings
  • Working with gsub
  • Advanced methods
  • Convert to corpus

Module 3: Importing Data

  • Converting docx into corpus
  • Converting pdf into corpus
  • Converting html to corpus
  • Web scraping

Module 4: Tidytext Package

  • Tidying text objects
  • Tidying document-term matrix objects
  • Tidying document-feature matrix objects
  • Tidying corpus objects
  • Mining literary works

Module 5: Word Frequencies & Relationships

  • Pre-processing text
  • Word clouds
  • Frequency analysis
  • N-grams & bigrams
  • Bigrams for sentiment analysis
  • Visualizing bigram networks

Module 6: Sentiment Analysis

  • Sentiment libraries
  • Analyzing positive & negative words
  • Comparing 3 sentiment libraries
  • Common positive & negative words

Module 7: Topic Modelling

  • Latent Semantic Indexing (LSI)
  • Latent Dirichlet Allocation (LDA)
  • Word-topic probabilities
  • Document-topic probabilities
  • Per-chapter probabilities
  • Per-document classification

Module 8: Document Similarity & Classifier

  • Text alignment & pairwise comparison
  • Minhashing and locality-sensitive hashing
  • Extracting keywords
  • Classifying by location, language, and topic

Module 9: Working with Internet and Social Media Data (Optional)

  • Extracting data from Amazon
  • Extracting data from Twitter
  • Extracting YouTube comments
  • Extracting Facebook comments

Who Should Attend

  • Data Scientists
  • Data Analysts
  • Finance Analysts
  • Marketers


Basic knowledge of R is assumed.


R Trainer

Dr. Zahra Nazemi holds a PhD in mathematical statistics from Universiti Putra Malaysia. Her research interests are applied statistics, medical statistics, Bayesian statistics, statistical inference, and the R software. She has lectured at several universities for more than 4 years. She has also consulted and worked on assignments involving parametric and non-parametric analysis and univariate and multivariate regression analysis in areas such as medicine, economics, and psychology. Her other skills include knowledge of research methodology, extensive experience with SPSS, AMOS, R, and MINITAB, and writing and presenting reports. She has also conducted a special training program in mathematical programming, including optimization, advanced multivariate data analysis, and simulation techniques.

Data Science Trainer

Dr. Aanand Verma is a Full Stack Data Scientist who once had a torrid love affair with Physics. He has consulted and published in the areas of Public Health, Electricity Markets, Telecom, BFSI, Advertising & Communication Strategies, and Digital & Social Media Technologies. He has worked on assignments with international agencies such as the International Monetary Fund, the World Bank, and the Royal Netherlands Embassy, as well as with companies such as Tata Consultancy Services and Kie Square Consulting and several government organizations of national importance.

He regularly conducts general training programs in Python (Pandas, NumPy, SciPy, Matplotlib, Bokeh), R (dplyr, rstanarm, knitr, ggplot2), Data Visualization (Tableau, D3.js), and Machine Learning (Reinforcement Learning, scikit-learn), as well as specialized training programs on Structural Equation Modeling and SAP HANA.

He holds a doctorate in Operations Research from the Indian Institute of Management Ahmedabad and a postgraduate degree in Physics from the University of Mumbai. He has advanced training in mathematical programming, including optimization, advanced multivariate data analysis, and simulation techniques. When he is not teaching or consulting, he can be found meditating or heading out on an adventurous trek.

Rattle Trainer

Tarun Sukhani is an IT executive, educator, author, speaker, data scientist, security expert, agile coach, polyglot coder, and entrepreneur with over 19 years of combined professional experience in the U.S. and internationally. His expertise lies in leading teams in the design and delivery of highly scalable, concurrent, and performant enterprise software solutions with budgets of up to $100 million. He is particularly adept at building productive, self-managing agile teams with predictable velocities and delivery timeframes.

Tarun Sukhani is skilled in all phases of the SDLC/ALM, with a solid foundation in Agile (XP, SAFe, Lean, Scrum, Kanban, and Scrumban) and traditional (PMI and PRINCE2) project management frameworks and methodologies.

He is proficient in Big Data and Data Science tools, including Hadoop, Pig, Hive, HBase, Spark, R/Rattle, Cassandra, YARN, ZooKeeper, Mahout, SimpleCV, and OpenCV.

You May Be Interested In These Courses

  • Natural Language Processing with Python NLTK Training: MYR880.00 (GST-exclusive)
  • Data Mining with Orange: MYR880.00 (GST-exclusive)