Need: In any research study or model, it is really important to identify the key interactions (combinations of different variables and their levels) that explain the most variance in the dependent variable. But identifying these interactions is a tedious task because of the number of variables and levels involved. And when it comes to interpreting the model or deploying it in real time, interaction effects give decision makers clearer action plans than main effects do.
Approach: Random Forest method.
Random forest (RF) is an ensemble method that fits multiple decision trees independently of one another, so they can be built in parallel.
A decision tree (DT) is itself a powerful greedy algorithm that splits the data sequentially, and it works for both classification and regression. Following the splits, we end up at a terminal node, the last node on a branch of the DT. It is defined by a combination of variables and levels (which can be read as an interaction) that explains the variance in the dependent variable well. The only problem with a single DT is that the complete tree depends on the first variable chosen for splitting.
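A minimal Python sketch of how each terminal node reads as an interaction. The tree here is written by hand, and its variables and cut-offs are made-up illustrations, not values fitted to any data.

```python
# Hypothetical hand-written tree: the cut-offs are illustrative, not fitted.
tree = {
    "feature": "Glucose", "threshold": 120,
    "left": {"leaf": "No"},
    "right": {
        "feature": "BMI", "threshold": 30,
        "left": {"leaf": "No"},
        "right": {"leaf": "Yes"},
    },
}

def leaf_rules(node, path=()):
    """Yield (conditions, label) for every terminal node under `node`."""
    if "leaf" in node:
        yield list(path), node["leaf"]
        return
    name, cut = node["feature"], node["threshold"]
    yield from leaf_rules(node["left"], path + ((name, "<=", cut),))
    yield from leaf_rules(node["right"], path + ((name, ">", cut),))

rules = list(leaf_rules(tree))
for conditions, label in rules:
    text = " AND ".join(f"{n} {op} {c}" for n, op, c in conditions)
    print(f"{text} -> {label}")
```

Every rule printed starts with a condition on the root variable (Glucose in this toy tree), which is exactly the dependence on the first variable that a single DT suffers from.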
To remove this dependence, we use the RF idea: randomise the variables, build multiple DTs, take all the terminal nodes from these DTs, and check which of their interactions explain the variance in the dependent variable.
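The idea can be sketched in plain Python, under some loud assumptions: the data are a synthetic grid of four variables (a, b, c, d) with three levels each, the outcome is 1 only when a and b are both at their top level, and the tree builder is a simplified greedy Gini splitter written for illustration rather than a library implementation. The data stay fixed, and each tree sees a random subset of three variables.

```python
import itertools
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Return the (feature, cut) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels) - 1e-9  # require a real improvement
    for f in features:
        levels = sorted({r[f] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= cut]
            right = [y for r, y in zip(rows, labels) if r[f] > cut]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, cut), score
    return best

def yes_leaves(rows, labels, features, depth, path=()):
    """Grow a greedy tree; yield the path of each leaf whose majority label is 1."""
    split = best_split(rows, labels, features) if depth > 0 else None
    if split is None:
        if 2 * sum(labels) > len(labels):
            yield path
        return
    f, cut = split
    for op, keep in (("<=", lambda v: v <= cut), (">", lambda v: v > cut)):
        idx = [i for i, r in enumerate(rows) if keep(r[f])]
        yield from yes_leaves([rows[i] for i in idx], [labels[i] for i in idx],
                              features, depth - 1, path + ((f, op, cut),))

# Synthetic data (an assumption): every combination of 4 variables x 3 levels.
names = ["a", "b", "c", "d"]
rows = [dict(zip(names, combo)) for combo in itertools.product(range(3), repeat=4)]
labels = [1 if r["a"] == 2 and r["b"] == 2 else 0 for r in rows]

random.seed(0)
counts = Counter()
for _ in range(30):                      # 30 trees, data kept fixed
    subset = random.sample(names, 3)     # randomise the variables per tree
    for path in yes_leaves(rows, labels, subset, depth=2):
        counts[tuple(sorted({f for f, _, _ in path}))] += 1
print(counts)
```

Counting which variable combinations keep reappearing in the "Yes" terminal nodes across trees points to the candidate interactions. In this toy setup only the pair (a, b) shows up, because trees that miss either variable cannot isolate a majority-Yes node.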
Now take these interactions and build the model on them. This gives the joint effect of multiple variables and levels on the dependent variable, which provides a richer interpretation as well as an action plan.
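One way to feed an interaction into a model is to turn its rule into a 0/1 indicator column. The sketch below assumes a hypothetical rule and four made-up rows (illustrative numbers only, not taken from any real data set) and compares the outcome rate inside and outside the rule.

```python
def matches(row, rule):
    """True when `row` satisfies every (variable, operator, cut) condition."""
    return all(row[v] <= cut if op == "<=" else row[v] > cut
               for v, op, cut in rule)

# Hypothetical rule and rows: illustrative values only.
rule = [("Glucose", ">", 120), ("BMI", ">", 30)]
rows = [
    {"Glucose": 150, "BMI": 35, "status": 1},
    {"Glucose": 140, "BMI": 25, "status": 0},
    {"Glucose": 100, "BMI": 33, "status": 0},
    {"Glucose": 160, "BMI": 32, "status": 1},
]

# The interaction becomes a single binary predictor.
indicator = [int(matches(r, rule)) for r in rows]

def rate(flag):
    """Share of status == 1 among rows whose indicator equals `flag`."""
    picked = [r["status"] for r, i in zip(rows, indicator) if i == flag]
    return sum(picked) / len(picked) if picked else float("nan")

print("indicator:", indicator)
print("rate inside rule:", rate(1), "| rate outside rule:", rate(0))
```

The indicator column can then enter whichever model is being built (for example a logistic regression) as a predictor, so its coefficient describes the joint effect of the variables in the rule.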
Case study:
Let’s take the publicly available diabetes data set, in which we have diabetic status along with number of Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function and Age.
Now we want to find, from a 360-degree view, which interaction effects emerge from this data and can explain a diabetic status of either Yes or No.
We applied Random Forest, keeping the data fixed and randomising the variables, built multiple trees, and obtained the interactions below for Yes/No status.
Possible interactions identified for explaining Yes
Possible interactions identified for explaining No
Now let’s focus on these interactions and build the model with these only.