Objective
A leading retail company wanted to understand, from customers' reviews of a given product, what prompts them to recommend it.
Approach
Reviews of the product were scraped together with each reviewer's satisfaction score and whether they recommended the product to others. This formed the input data, which was then preprocessed as described below.
Tokenization: The text was split into sentences and the sentences into words. The words were lowercased and punctuation was removed. Words with fewer than 3 characters and all stop words were then removed.
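A minimal sketch of this tokenization step using only the Python standard library. The tiny `STOP_WORDS` set is a hypothetical stand-in for a full stop-word list, the sentence splitter is a naive regex, and keeping only alphabetic runs (which also drops digits) is an assumption, not necessarily what the original pipeline did.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would use a complete list (e.g. from NLTK or spaCy).
STOP_WORDS = {"the", "and", "for", "this", "that", "was", "with", "are", "but"}


def tokenize(review: str) -> list[list[str]]:
    """Split a review into sentences, then into cleaned word tokens."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", review.strip())
    result = []
    for sentence in sentences:
        # Lowercase, then keep only alphabetic runs (drops punctuation and digits).
        words = re.findall(r"[a-z]+", sentence.lower())
        # Drop words shorter than 3 characters and stop words.
        words = [w for w in words if len(w) >= 3 and w not in STOP_WORDS]
        if words:
            result.append(words)
    return result


print(tokenize("I loved this dress! The fabric is soft, and it fits well."))
```

Note that a generic stop-word list may contain negations such as "not", which carry signal for a recommend/not-recommend task, so the list is worth reviewing rather than using blindly.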
Lemmatization: Words were lemmatized, i.e., reduced to their dictionary form (lemma). For example, the inflected verb forms "recommends" and "recommended" both become "recommend", and an irregular form such as "bought" becomes "buy".
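Real lemmatizers (for example WordNet-based ones) look each word up in a lexicon, usually guided by its part-of-speech tag, rather than applying string rules. The sketch below imitates that lookup with a small hypothetical `LEMMAS` table; it is an illustration of the idea, not a usable lemmatizer.

```python
# Hypothetical lookup table standing in for a real lemmatizer's lexicon;
# only a handful of entries, for illustration.
LEMMAS = {
    "loved": "love",
    "loves": "love",
    "fits": "fit",
    "fitted": "fit",
    "was": "be",
    "were": "be",
    "bought": "buy",
    "dresses": "dress",
}


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its dictionary form; unknown tokens are kept unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]


print(lemmatize(["loved", "dress", "bought", "fits"]))
```

Because the mapping comes from a dictionary, irregular forms such as "bought" resolve to "buy", which no suffix-stripping rule could produce.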
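Stemming: Words were stemmed, i.e., reduced to a stem by stripping suffixes with heuristic rules (e.g., "recommending" becomes "recommend"). Unlike a lemma, a stem need not be a valid dictionary word.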
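A minimal sketch of rule-based suffix stripping, assuming a short hypothetical list of suffixes. This is a simplified illustration, not the Porter algorithm (which applies several ordered steps with extra conditions on the remaining stem).

```python
# Simplified, hypothetical suffix rules, checked longest-first.
# NOT the Porter stemmer; just an illustration of suffix stripping.
SUFFIXES = ("ingly", "edly", "ing", "ies", "ed", "ly", "es", "s")


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least `min_stem` characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["recommended", "recommending", "dresses", "comfortably", "bought"]:
    print(w, "->", stem(w))
```

The output shows both properties of stemming: related forms collapse to one stem ("recommended", "recommending" to "recommend"), but some stems are not words ("comfortably" to "comfortab"), and irregular forms such as "bought" are left untouched.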
TF-IDF (term frequency-inverse document frequency) weighting was then used to convert the cleaned text into numerical features.
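A minimal from-scratch sketch of TF-IDF over already-tokenized reviews. It assumes one common variant, tf(t, d) = count of t in d divided by the length of d and idf(t) = ln(N / df(t)); libraries differ (scikit-learn's `TfidfVectorizer`, for instance, smooths the idf and L2-normalizes each row by default), so exact weights will not match them.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each tokenized document by TF-IDF.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where N is the number of documents and
               df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


reviews = [["love", "dress", "love"], ["dress", "tight"], ["return", "dress"]]
for vector in tfidf(reviews):
    print(vector)
```

A term that appears in every review (here "dress") gets weight 0, while terms concentrated in a few reviews, such as "love", are up-weighted; this is what lets distinctive words stand out as features.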
Modeling
A pipeline first identified the key features using LASSO regression and some filter methods, then applied a series of supervised machine learning models to the shortlisted features; the final model was selected on the basis of accuracy.
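One way such a pipeline could be sketched with scikit-learn, under several assumptions: a chi-squared `SelectKBest` stands in for "some filter methods", L1-penalized logistic regression stands in for LASSO (the target, recommend or not, is binary), and the three candidate classifiers, `k=10`, `C=10.0` and the toy reviews are all hypothetical choices; the source does not say which models were compared.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy reviews (1 = recommends, 0 = does not); not real data.
reviews = [
    "love this dress, fabric is soft and fits perfectly",
    "beautiful color, love it, would buy again",
    "great fit and soft fabric, love the length",
    "perfect for summer, soft and comfortable",
    "love the pattern, fits true to size",
    "comfortable and beautiful, great purchase",
    "fabric is cheap and scratchy, returning it",
    "runs small, too tight, returning",
    "poor quality, seams came apart after one wash",
    "color faded, cheap material, disappointed",
    "too tight across the chest, poor fit",
    "disappointed, cheap and scratchy fabric",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def build_pipeline(classifier) -> Pipeline:
    """TF-IDF features -> filter method -> L1 (LASSO-style) selection -> classifier."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        # Filter method: keep the k terms with the highest chi-squared score.
        ("filter", SelectKBest(chi2, k=10)),
        # Embedded method: the L1 penalty drives weak features' coefficients
        # to zero; SelectFromModel keeps the features with non-zero weights.
        ("lasso", SelectFromModel(
            LogisticRegression(penalty="l1", solver="liblinear", C=10.0))),
        ("clf", classifier),
    ])


# Hypothetical candidate models, compared by mean cross-validated accuracy.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {
    name: cross_val_score(build_pipeline(clf), reviews, labels,
                          cv=3, scoring="accuracy").mean()
    for name, clf in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = build_pipeline(candidates[best_name]).fit(reviews, labels)
print(scores, best_name)
```

Fitting the selectors inside the pipeline means each cross-validation fold picks its features from its own training split only, so the accuracy estimate is not inflated by features chosen using the held-out reviews. Once fitted, `best_model.predict(["soft fabric, love it"])` returns the predicted recommendation for an unseen review.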
Impact
The resulting classifier will be used to predict, from the review text alone, whether a customer will recommend the product. This eliminates the need to capture a separate recommendation score.