Predicting the student Grade using PyCaret Auto ML Pipeline

PyCaret ( (https://pycaret.org/) is an open-source machine learning library that automates the entire process of training a machine learning model. When using it, you just need to have an idea of the best features you need to train your machine learning model, then you can use PyCaret from model selection to training and testing. Simply put, it automates the entire machine learning process, from choosing which model to select to training and testing your model.

The best feature of PyCaret is that it helps you know which is the best machine learning model that you should use on a particular dataset. It simply shows you the best performing models by ranking the models based on the performance measurement metrics of machine learning models. The best part about this feature is that it does everything with a few lines of code.

first we need to import the required libraries

import pandas as pd
import numpy as np
from pycaret.classification import *

Read  data file (This data has the 30 independent variables and one dependent variables
 which is Grade in this case)

student_df = pd.read_csv('student.csv', sep =',')
student_df_df.head(10)



Removing the slno variable

school_df = heart_disease_df.drop(['Slno'], axis=1)

Splinting the data for training and validation sets

hd_data = school_df.sample(frac=0.7, random_state=786).reset_index(drop=True)
hd_data_unseen = school_df.drop(hd_data.index).reset_index(drop=True)
print(‘Data for Modeling: ‘ + str(hd_data.shape))
print(‘Unseen Data For Predictions: ‘ + str(hd_data_unseen.shape))

Data for Modeling: (354, 31)
Unseen Data For Predictions: (152, 31)

Now let’s set up the model. As this is the problem of classification 
so I will set up the model for classification. 
While setting up this model we need to declare the data and the target labels. 
We also need to declare the features that need to be ignored while training the model. 

Below is how to set up the PyCaret model for classification:

school_clf = setup(data = hd_data, target = ‘Grade’,
log_experiment = True, session_id=111)

Now I am going to use the most important feature provided by this library which compares models. In machine learning, this is called model selection. If you don’t understand much about model selection, you can use this feature for model selection. Here’s how to compare machine learning models using PyCaret:

school_clf_models =compare_models(n_select = 10)

According to the compare models Gradient Boosting classier performing very well on this data

Now we can even find he feature importance

plot_model(school_clf_models[8], plot=’feature’)

Even we can get the global as well as local level interpratabilirty using Shap

and also performance matrix like Confusion Matrix, Roc Curve –etc

and we save the Model and apply this on the new data

save_model(final_GradientBoostingClassifier,’Final GBC Model 04April2020′)

new_prediction = predict_model(final_GradientBoostingClassifier, data=hd_data_unseen)

new_prediction.head()

Summary

PyCaret is a great machine learning library to automate the complete process of training a machine learning model as it helps in from model selection to training and testing. You can use it for at least model selection if you don’t like shortcuts while training machine learning models.

Predicting the student Grade using PyCaret Auto ML Pipeline

Venugopal Manneni

A doctor in statistics from Osmania University. I have been working in the fields of Analytics and research for the last 15 years. My expertise is to architecting the solutions for the data driven problems using statistical methods, Machine Learning and deep learning algorithms for both structured and unstructured data. In these fields I’ve also published papers. I love to play cricket and badminton.

Summary

Venugopal Manneni

Post navigation