Need: Outliers distort statistical analysis, so once they have been detected with univariate or multivariate outlier detection methods, we need to treat them properly before using the data in the analysis.

Approach: The following are some of the univariate outlier treatment approaches.

Deleting observations: If we have the luxury of deleting observations, the most direct approach is to remove the outlier cases from the data. This approach is best suited to large samples, where dropping a few rows costs little information.
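
As a minimal sketch of deletion, the snippet below drops every value outside the box plot whiskers. The 1.5 × IQR whisker rule, the function name `drop_outliers_iqr`, and the sample data are assumptions made for illustration, not something taken from the linked repository.

```python
import numpy as np

def drop_outliers_iqr(values, k=1.5):
    """Keep only values inside the fences Q1 - k*IQR and Q3 + k*IQR."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    keep = (values >= lower) & (values <= upper)  # True for non-outliers
    return values[keep]

# Hypothetical sample with one obvious outlier (95)
sample = np.array([10, 12, 11, 13, 12, 11, 14, 95])
cleaned = drop_outliers_iqr(sample)
print(cleaned)
```

With a small sample like this one, every deleted row is a noticeable share of the data, which is why deletion is usually reserved for large samples.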

Capping the observations: This method is suitable when the outliers have been identified from a box plot. Any value above the box plot maximum (the upper whisker) is brought back to the 95th percentile, and any value below the box plot minimum (the lower whisker) is raised to the 5th percentile.
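
The capping rule described above can be sketched as follows. The whiskers are assumed to sit at 1.5 × IQR beyond the quartiles, and `cap_outliers` and the sample data are hypothetical names chosen for this example.

```python
import numpy as np

def cap_outliers(values, k=1.5, low_pct=5, high_pct=95):
    """Replace values beyond the box plot whiskers with the 5th/95th percentile."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    low_cap, high_cap = np.percentile(values, [low_pct, high_pct])

    capped = values.copy()
    capped[values > upper_fence] = high_cap  # pull high outliers down
    capped[values < lower_fence] = low_cap   # pull low outliers up
    return capped

# Hypothetical sample: the integers 1..99 plus one extreme value
sample = np.append(np.arange(1, 100), 500)
capped = cap_outliers(sample)
print(capped[-1])  # the extreme value is now the 95th percentile of the sample
```

Only the flagged outliers are changed; values that lie between the 95th percentile and the upper whisker are left as they are.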

 

Transforming variables: When the outliers come from skewed data, we can apply log, square, or quantile transformations to bring the distribution closer to a normal distribution.
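
To illustrate the effect of a transformation, the sketch below applies a log transform to a right-skewed sample and compares the skewness before and after. It assumes non-negative data (so `np.log1p` is defined), and the `skewness` helper and the sample values are made up for this example.

```python
import numpy as np

def skewness(x):
    """Population skewness: mean cubed deviation divided by std cubed."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.std(x) ** 3

# Hypothetical right-skewed sample with one extreme value
raw = np.array([1, 2, 2, 3, 3, 4, 5, 8, 20, 150], dtype=float)
logged = np.log1p(raw)  # log(1 + x), safe for zeros

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

The log compresses large values far more than small ones, so the extreme value ends up much closer to the rest of the data and the skewness drops.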

Binning: Binning is also a form of variable transformation. It groups a number of more or less continuous values into a smaller number of “bins” (groups), which essentially turns the continuous variable into a factor variable and reduces the variance due to extreme values.
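
One way to bin is by quantiles, so that each bin holds roughly the same number of observations. The sketch below uses quartile edges, which is an assumption; the sample data is hypothetical.

```python
import numpy as np

# Hypothetical sample with one extreme value
values = np.array([1, 2, 3, 4, 5, 6, 7, 1000], dtype=float)

# Quartile cut points give four equal-frequency bins
edges = np.quantile(values, [0.25, 0.5, 0.75])

# np.digitize returns the bin index (0..3) for each value
bin_ids = np.digitize(values, edges)
print(bin_ids)
```

After binning, the extreme value 1000 simply falls into the top bin alongside 7, so its magnitude no longer dominates the variable.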

 

Imputing: As with missing values, we can also impute outliers, using mean, median, or mode imputation. Before imputing, we should analyze whether the outlier is natural or artificial. If it is artificial, we can go ahead and impute it. We can also use a statistical model to predict the value of the outlier observation and then impute it with the predicted value.
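
A minimal sketch of median imputation is shown below. It assumes the outliers have already been judged artificial, flags them with the 1.5 × IQR rule (an assumption), and replaces them with the median of the remaining values. The function name `impute_outliers_median` and the sample data are hypothetical.

```python
import numpy as np

def impute_outliers_median(values, k=1.5):
    """Replace values outside the IQR fences with the median of the inliers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    is_outlier = (values < q1 - k * iqr) | (values > q3 + k * iqr)

    imputed = values.copy()
    # Median is computed from non-outliers so the extreme values do not bias it
    imputed[is_outlier] = np.median(values[~is_outlier])
    return imputed

# Hypothetical sample with one artificial outlier (95)
sample = np.array([10, 12, 11, 13, 12, 11, 14, 95])
imputed = impute_outliers_median(sample)
print(imputed)
```

The median is used here rather than the mean because it is less sensitive to any extreme values that remain in the data.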

 

Code link: https://github.com/drstatsvenu/Outliers-detection

 

 

 

How to handle the outliers

Venugopal Manneni


A doctor in statistics from Osmania University. I have been working in the fields of analytics and research for the last 15 years. My expertise is in architecting solutions for data-driven problems using statistical methods, machine learning, and deep learning algorithms for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.

