Need: Before starting any research, we need to review the existing scientific literature and past research on the topic (i.e., existing publications and articles). This is a tedious task: searching for these articles manually, one by one, consumes a lot of time. Hence there is a need to extract the information relevant to a given text, word, or topic in one go.

Solution: Thanks to advances in NLP and other machine learning algorithms, we can quickly search for and extract the relevant articles from the web (based on the given keywords), process them using NLP techniques, and create a dashboard through which we can quickly go through each paper and review the information.

When we build the dashboard, we can also differentiate the articles by number of words, keywords, length, and even custom topics.

Case study:

Objective: Search for Covid- and diabetes-related papers on PubMed and access them quickly.

Steps:   

  • First, scrape all the papers from PubMed using the keyword “Covid+diabetic”
  • Second, pre-process or clean the text using NLP techniques
  • Prepare the data with the required columns (e.g., # of characters, # of words, any custom topics)
  • Create the dashboard from the pre-processed text
  • Click a circle in the dashboard to navigate to the article link on the web
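
The first three steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for the case study: it assumes the public NCBI E-utilities ESearch endpoint is queried instead of scraping PubMed's HTML pages, and the function names (`build_search_url`, `fetch_pmids`, `clean_text`, `make_row`) and the topic markers `"covid"` and `"diabet"` are hypothetical choices made for the example. Fetching the abstract text for each ID is left out.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: NCBI E-utilities ESearch is used in place of HTML scraping.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(keywords, retmax=20):
    """Build an ESearch URL for PubMed; keywords are combined with AND."""
    params = {
        "db": "pubmed",
        "term": " AND ".join(keywords),
        "retmode": "json",
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_pmids(keywords, retmax=20):
    """Network call: return the PubMed IDs that match the keywords."""
    with urllib.request.urlopen(build_search_url(keywords, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]


def clean_text(text):
    """Strip HTML tags, lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def make_row(pmid, abstract, topics=("covid", "diabet")):
    """Build one dashboard row: link, size columns, and custom topic flags."""
    cleaned = clean_text(abstract)
    row = {
        "pmid": pmid,
        "url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        "n_chars": len(cleaned),
        "n_words": len(cleaned.split()),
    }
    for topic in topics:
        # Hypothetical custom-topic column: True if the marker appears.
        row[f"has_{topic}"] = topic in cleaned
    return row
```

Calling `fetch_pmids(["covid", "diabetic"])` would return a list of ID strings, and each ID plus its abstract can then be passed to `make_row` to produce the columns the dashboard needs. A real pipeline would likely add stop-word removal or stemming in `clean_text`; the regex-only cleaning here is kept deliberately simple.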

 

 

How to use: In the dashboard, each circle is a link, and its size reflects the article's # of characters and # of words. Clicking a circle takes you to the actual article link on the web.
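
To make the circle-as-link idea concrete, here is a small sketch that renders a static HTML page with one clickable SVG circle per article. It is only an illustration under stated assumptions: the original dashboard tool is not named in this post, so plain SVG is swapped in; the helper names `circle_radius` and `render_dashboard` are hypothetical; and the radius is driven by word count alone, with the character count shown in the tooltip.

```python
import html
import math


def circle_radius(n_words, min_r=8.0, max_r=40.0, max_words=500):
    """Map a word count to a radius in [min_r, max_r].

    The radius grows with the square root of the word count so that
    long articles do not dominate the view. max_words is an assumed cap.
    """
    frac = min(max(n_words, 0), max_words) / max_words
    return min_r + (max_r - min_r) * math.sqrt(frac)


def render_dashboard(rows):
    """Return an HTML string with one clickable circle per article row.

    Each row is a dict with "url", "n_words", and "n_chars" keys.
    """
    parts = []
    x = 50.0
    for row in rows:
        r = circle_radius(row["n_words"])
        tip = html.escape(f'{row["n_words"]} words, {row["n_chars"]} chars')
        url = html.escape(row["url"], quote=True)
        # Wrapping the circle in <a> makes a click open the article link.
        parts.append(
            f'<a href="{url}" target="_blank">'
            f'<circle cx="{x:.0f}" cy="50" r="{r:.1f}" fill="steelblue">'
            f"<title>{tip}</title></circle></a>"
        )
        x += 100
    width = max(100, int(x))
    return (
        f'<html><body><svg width="{width}" height="100" '
        f'xmlns="http://www.w3.org/2000/svg">'
        + "".join(parts)
        + "</svg></body></html>"
    )
```

Writing the returned string to a file such as `dashboard.html` and opening it in a browser gives a row of circles, each linking to its article. An interactive charting library would be the more usual choice for filtering by keyword or topic; the static page is used here only because it needs nothing beyond the standard library.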

 

 

 

 

 

 

Data extraction for quick literature review

Venugopal Manneni


I hold a doctorate in statistics from Osmania University and have been working in analytics and research for the last 15 years. My expertise is in architecting solutions for data-driven problems using statistical methods, machine learning, and deep learning algorithms for both structured and unstructured data. I have also published papers in these fields. I love to play cricket and badminton.

