Need: In any research study or model, identifying the key interactions (combinations of different variables and their levels) that explain the most variance in the dependent variable is really important. Identifying these interactions is a tedious task, however, because of the number of variables and levels involved. When it comes to interpreting the model or deploying it in real time, interaction effects also give decision makers clearer action plans than main effects do.

 

Approach: Random Forest method.

Random forest is an ensemble method that fits multiple decision trees in parallel.

 

 

Decision trees (DTs) are a powerful greedy, sequential data-splitting algorithm for both classification and regression. The splitting process ends at the terminal nodes, the last nodes of the tree. Each terminal node is defined by a combination of variables and levels (which can be read as an interaction) that explains part of the variance in the dependent variable. The only problem with a single DT is that the complete tree depends on the first variable chosen for splitting.

To remove this dependence, we use the random forest (RF) concept: we randomise the variables, build multiple DTs, take all the terminal nodes from these DTs, and check which of their interactions explain the variance in the dependent variable.
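
The idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used for this post: the toy records, the helper names (`gini`, `best_split`, `leaf_rules`, `forest_rules`), the Gini criterion and the depth limit of 2 are all chosen for the example. Each tree sees a random subset of the variables, and every terminal node is read back as a rule, i.e. a candidate interaction.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Greedy search for the (feature, threshold) with the lowest weighted impurity."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def leaf_rules(rows, labels, features, depth=0, max_depth=2, path=()):
    """Grow one tree; return (conditions, majority_class, n_rows) per terminal node."""
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, features)
    if split is None:  # terminal node: pure, too deep, or no valid split
        majority = Counter(labels).most_common(1)[0][0]
        return [(path, majority, len(labels))]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    rules = []
    for side, op in ((left, "<="), (right, ">")):
        sub_rows = [r for r, _ in side]
        sub_labels = [y for _, y in side]
        rules += leaf_rules(sub_rows, sub_labels, features,
                            depth + 1, max_depth, path + ((f, op, t),))
    return rules

def forest_rules(rows, labels, n_trees=20, n_features=2, seed=0):
    """Keep the data fixed, randomise the variables per tree, count the leaf rules."""
    rng = random.Random(seed)
    all_features = sorted(rows[0])
    counts = Counter()
    for _ in range(n_trees):
        features = rng.sample(all_features, n_features)
        for conditions, majority, _ in leaf_rules(rows, labels, features):
            counts[(conditions, majority)] += 1
    return counts

# Hypothetical toy records, made up for illustration (not the real data).
rows = [
    {"glucose": 150, "bmi": 35, "age": 50},
    {"glucose": 160, "bmi": 33, "age": 45},
    {"glucose": 155, "bmi": 22, "age": 30},
    {"glucose": 90, "bmi": 34, "age": 52},
    {"glucose": 85, "bmi": 21, "age": 25},
    {"glucose": 95, "bmi": 23, "age": 28},
]
labels = ["Yes", "Yes", "No", "No", "No", "No"]

counts = forest_rules(rows, labels)
for (conditions, majority), n in counts.most_common(5):
    text = " AND ".join(f"{f} {op} {t}" for f, op, t in conditions)
    print(f"{n:2d} trees: {text} -> {majority}")
```

Rules that recur across many trees are the candidate interactions to carry forward. In practice a library random forest would replace the hand-written tree, but the step of reading terminal nodes back as rules stays the same.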

We then take these interactions and build the model on them. This gives the joint effect of multiple variables and levels on the dependent variable, which provides a richer interpretation as well as an action plan.
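
A minimal sketch of this modelling step, assuming the interactions are already available as lists of conditions: each interaction becomes a 0/1 indicator column, and a small logistic regression is fitted on those columns only. The records, the two example interactions and the helper names (`satisfies`, `fit_logistic`) are hypothetical, and the plain gradient-descent fit simply stands in for whatever modelling tool is actually used.

```python
import math

OPS = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b}

def satisfies(row, conditions):
    """True when the row meets every condition of an interaction."""
    return all(OPS[op](row[f], t) for f, op, t in conditions)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent for logistic regression; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical toy records, made up for illustration (not the real data).
rows = [
    {"glucose": 150, "bmi": 35, "age": 50},
    {"glucose": 160, "bmi": 33, "age": 45},
    {"glucose": 155, "bmi": 22, "age": 30},
    {"glucose": 90, "bmi": 34, "age": 52},
    {"glucose": 85, "bmi": 21, "age": 25},
    {"glucose": 95, "bmi": 23, "age": 28},
    {"glucose": 170, "bmi": 36, "age": 60},
    {"glucose": 140, "bmi": 30, "age": 40},
]
y = [1, 1, 0, 0, 0, 0, 0, 1]  # 1 = Yes, 0 = No

# Hypothetical interactions, written as conjunctions of (variable, operator, level).
interactions = {
    "high glucose & high bmi": [("glucose", ">", 95), ("bmi", ">", 22)],
    "low glucose & young": [("glucose", "<=", 95), ("age", "<=", 30)],
}

# One indicator column per interaction: 1 if the row falls inside it, else 0.
X = [[int(satisfies(r, c)) for c in interactions.values()] for r in rows]

w = fit_logistic(X, y)
for name, coef in zip(interactions, w[1:]):
    print(f"{name}: coefficient {coef:+.2f}")
```

The sign and size of each coefficient describe the joint effect of that combination of variables and levels: a positive coefficient pushes towards Yes, a negative one towards No.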

Case study:

Let’s take the publicly available diabetes data set, in which we have diabetic status along with No, Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function and Age.

Now we want to find, from a 360-degree view, which interaction effects emerge from this data and can explain a diabetic status of either Yes or No.

We applied the random forest by keeping the data fixed and randomising the variables, built multiple trees, and obtained the interactions below for the Yes/No status.

Possible interactions identified for explaining Yes

Possible interactions identified for explaining No

 

 

 

 

 

Now let’s focus on these interactions and build the model with these only.

Identifying the synergetic interactions for better modelling

Venugopal Manneni


I hold a doctorate in statistics from Osmania University. I have been working in the fields of analytics and research for the last 15 years. My expertise is architecting solutions for data-driven problems using statistical methods, machine learning and deep learning algorithms for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.

