Text classification – Recommending products based on their reviews

Objective

A leading retail company wanted to understand, from the reviews of a given product, what prompts users to recommend it.

Approach

The reviews for the product, the satisfaction scores, and whether each reviewer would recommend the product to others were scraped. This input data was then preprocessed as described below.

Tokenization: The text was split into sentences and the sentences into words. The words were lowercased and punctuation was removed. Words with fewer than 3 characters and all stop words were removed.

Lemmatization: Words were lemmatized, i.e., words in the third person were changed to the first person, and verbs in past and future tenses were changed to the present tense.

Stemming: Words were stemmed, i.e., reduced to their root form.
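The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the project: the stop-word list, the lemma lookup table, and the suffix-stripping stemmer are all toy stand-ins (assumptions) for what a full NLP library would provide.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real project would use a fuller list).
STOP_WORDS = {"the", "and", "was", "for", "this", "that", "with", "are", "but", "very", "have", "has"}

# Toy lemma lookup standing in for a real lemmatizer (hypothetical entries).
LEMMAS = {"bought": "buy", "loved": "love", "loves": "love", "recommended": "recommend"}

# Suffixes stripped by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into sentences, then into lowercase words without punctuation."""
    tokens = []
    for sentence in re.split(r"[.!?]+", text):
        tokens.extend(re.findall(r"[a-z]+", sentence.lower()))
    return tokens


def lemmatize(word):
    """Map a word to its dictionary form if the toy table knows it."""
    return LEMMAS.get(word, word)


def stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop short words and stop words, then lemmatize and stem."""
    kept = [t for t in tokenize(text) if len(t) >= 3 and t not in STOP_WORDS]
    return [stem(lemmatize(t)) for t in kept]


print(preprocess("I loved this product. It works great!"))
```

In this sketch, lemmatization runs before stemming so that irregular forms in the lookup table are resolved before suffixes are stripped.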

The TF-IDF approach was then used to convert the cleaned text into features.
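To make the TF-IDF step concrete, here is a small standard-library sketch. It assumes one common variant of the weighting (term frequency normalized by document length, multiplied by a smoothed inverse document frequency); the exact variant used in the project is not stated, and the function name `tfidf` is chosen only for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length,
    idf = log((1 + N) / (1 + df)) + 1, where N is the number of documents
    and df is the number of documents containing the term.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Two already-cleaned toy reviews (made up for illustration).
docs = [["love", "product"], ["hate", "product"]]
for row in tfidf(docs):
    print(row)
```

A term that appears in every review, such as "product" here, receives a lower weight than a term that appears in only one, which is what lets the classifier focus on the distinguishing words.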

Modeling

A pipeline first identified the key features using feature selection methods (such as LASSO regression and some filter methods). A series of supervised machine learning models was then applied to these shortlisted features, and the final model was chosen on the basis of accuracy scores.
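The modeling flow could look roughly like the following scikit-learn sketch. It is an assumption-laden illustration, not the project's pipeline: the reviews and labels are made up, the chi-squared filter (`SelectKBest`) stands in for the unspecified filter methods (an L1-penalized selector could take its place to mirror the LASSO step), and the two candidate classifiers are chosen only as examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy reviews; 1 = would recommend, 0 = would not.
reviews = [
    "great product works well would recommend",
    "excellent quality love this product",
    "terrible product broke after one week",
    "poor quality would not recommend",
    "love the design works great every day",
    "happy with the purchase excellent value",
    "awful experience product stopped working",
    "bad quality waste of money",
]
labels = [1, 1, 0, 0, 1, 1, 0, 0]


def build_pipeline(classifier):
    """TF-IDF features, then a chi-squared filter, then the given classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("select", SelectKBest(chi2, k=5)),  # keep the 5 highest-scoring terms
        ("clf", classifier),
    ])


# Candidate models (illustrative choices only).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}

# Mean cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(
        build_pipeline(model), reviews, labels, cv=2, scoring="accuracy"
    ).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Placing the vectorizer and the selector inside the pipeline means both are refit on each training fold, so the accuracy used to pick the final model is not inflated by information from the held-out reviews.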

Impact

The derived classification algorithm will be used to predict, based on the review, whether a customer will recommend the product. This eliminates the need to capture the recommendation score.


Venugopal Manneni

I hold a doctorate in statistics from Osmania University. I have been working in analytics and research for the last 15 years. My expertise is in architecting solutions for data-driven problems using statistical methods, machine learning, and deep learning algorithms for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.
