Need: Feature selection, the process of finding and selecting the most useful features in a dataset, is a crucial step of the machine learning pipeline. Unnecessary features slow down training, reduce model interpretability, and, most importantly, hurt generalization performance on the test set.

Approach: Feature selection techniques fall into a few main families: supervised methods (filter-based, wrapper-based, and embedded), feature importance methods, and unsupervised methods based on dimensionality reduction.

 

Filter-based: Filter methods select or discard entire features (columns, essentially) based on whether they pass a threshold on some statistical metric, computed without training any model. For example, if the variance of a column is too low (meaning its values are nearly constant), we have good reason to leave that column out of the model, since it carries almost no information. The same logic applies with other metrics; a sketch follows the list below.

  • t-test / ANOVA
  • Fisher’s Score
  • Correlation Coefficient
  • Chi-Square Test
  • Information Gain, etc.
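
As a minimal sketch of the filter approach, the snippet below uses scikit-learn on the built-in iris dataset (an illustrative choice, not from the original post): a variance threshold drops near-constant columns, and a chi-square score keeps the top-scoring features.

    # Filter-based selection: score each feature independently of any model.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

    X, y = load_iris(return_X_y=True)

    # Drop near-constant columns (variance below 0.2) before scoring.
    X_var = VarianceThreshold(threshold=0.2).fit_transform(X)

    # Score the remaining features with the chi-square statistic; keep the best 2.
    selector = SelectKBest(score_func=chi2, k=2)
    X_selected = selector.fit_transform(X_var, y)

    print(selector.scores_)    # per-feature chi-square scores
    print(X_selected.shape)    # (150, 2)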

Wrapper-based: Wrapper methods cover several techniques that fit a model on candidate subsets of features and use its performance score (for example, R-squared in regression) to decide whether a feature should be kept. They work by iteratively adding or removing features while monitoring the change in score, which is why many of them run recursively (see the sketch after the list):

  • Exhaustive Feature Selection
  • Forward Selection
  • Backward Elimination / Recursive Feature Elimination (RFE)
  • Stepwise Regression
  • Bi-directional Elimination
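
A minimal sketch of a wrapper method, assuming a generic synthetic dataset: Recursive Feature Elimination (RFE) in scikit-learn repeatedly fits a model and discards the weakest feature until the requested number remains.

    # Wrapper-based selection: let a model's fit decide which features survive.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10,
                               n_informative=4, random_state=0)

    # Iteratively fit the model and eliminate the weakest feature until 4 remain.
    rfe = RFE(estimator=LogisticRegression(max_iter=1000),
              n_features_to_select=4)
    rfe.fit(X, y)

    print(rfe.support_)    # boolean mask of the kept features
    print(rfe.ranking_)    # 1 = selected; higher = eliminated earlier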

Embedded: Embedded methods perform feature selection as part of training the model itself. This is common in linear models and neural networks, where regularization automatically shrinks or zeroes out the weights of uninformative features (see the sketch after the list):

  • L1 Regularization (Lasso)
  • L2 Regularization (Ridge)
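
A minimal sketch of the embedded approach, again on an assumed synthetic dataset: with L1 regularization (Lasso), training itself drives the coefficients of uninformative features exactly to zero, which amounts to deselecting them.

    # Embedded selection: L1 regularization zeroes out weak coefficients during training.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    X, y = make_regression(n_samples=300, n_features=10,
                           n_informative=3, noise=5.0, random_state=0)

    lasso = Lasso(alpha=1.0)
    lasso.fit(X, y)

    kept = np.flatnonzero(lasso.coef_)    # indices of features with nonzero weight
    print(kept, lasso.coef_[kept])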

Feature importance methods: Feature importance refers to techniques that assign a score to each input feature based on how useful it is for predicting the target variable. Common examples include (a sketch follows the list):

  • Linear Regression Feature Importance
  • Logistic Regression Feature Importance
  • Decision Tree Feature Importance
  • Random Forest Feature Importance, etc.
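
As one example from the list, the sketch below fits a random forest on a synthetic dataset (an assumed stand-in, not data from the post) and reads off its impurity-based importance scores; linear and logistic regression expose a comparable signal through their coefficient magnitudes.

    # Feature importance: a fitted random forest scores every input feature.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8,
                               n_informative=3, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    forest.fit(X, y)

    # One impurity-based importance score per input feature.
    for i, score in enumerate(forest.feature_importances_):
        print(f"feature {i}: {score:.3f}")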

Dimensionality Reduction (Unsupervised Methods)

One can describe Principal Components Regression (PCR) as an approach for deriving a low-dimensional set of features from a large set of variables. The idea is that the principal components capture most of the variance in the data through linear combinations of the original variables along successive orthogonal directions. In this way, we can also combine the effects of correlated variables and extract more information from the available data. A sketch follows.
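
A minimal sketch of Principal Components Regression under the same assumption of a generic synthetic dataset: standardize the predictors, project them onto a few orthogonal principal components, and regress the target on those components.

    # Principal Components Regression: PCA for features, then an ordinary regression.
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=300, n_features=20,
                           n_informative=5, noise=10.0, random_state=0)

    # Keep 5 orthogonal components, then fit the regression on them.
    pcr = make_pipeline(StandardScaler(), PCA(n_components=5),
                        LinearRegression())
    pcr.fit(X, y)

    print(pcr.named_steps["pca"].explained_variance_ratio_)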

 

For code, see: https://github.com/drstatsvenu/feature-selection

 


Venugopal Manneni


I hold a doctorate in statistics from Osmania University and have been working in the fields of analytics and research for the last 15 years. My expertise is in architecting solutions to data-driven problems using statistical methods, machine learning, and deep learning algorithms, for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.

