Deep Learning Based Framework For Useful Insights From The News

Pranav Gupta

Pune, Maharashtra

A Deep Learning and NLP based framework for generating useful insights from disease news articles.

Project status: Published/In Market

Artificial Intelligence

Groups
Student Developers for AI

Overview / Usage

Created a Deep Learning and NLP based framework for generating useful insights from disease news articles. It performs tasks such as:

● Scraping news articles from various sources to provide information on HIV-AIDS.

● Using an Artificial Neural Network model to classify news articles into categories such as deaths, research/development, social injustice, support, negative impact on society, and outbreak/surge.

● Creating reports on aspects such as the identification of at-risk groups for HIV-AIDS and locations showing a possible outbreak of HIV-AIDS.

Methodology / Approach

First Phase: The project starts with the collection of news articles related to HIV/AIDS, across different categories, from different news publication websites. After collection, the articles are labelled manually. The categories are as follows:

  1. Educational / Awareness Efforts
  2. Research and Development
  3. Discrimination / Social Stigma
  4. Deaths / Suicides
  5. Outbreaks / Positive Cases

Second Phase: The data was then cleaned using text pre-processing techniques. The framework utilizes two NLP techniques. Some articles were not really about HIV-AIDS but were collected by the web scraping algorithm merely because they mention HIV-AIDS; these were filtered out using Term Frequency - Inverse Document Frequency (TF-IDF). Applying TF-IDF to a text document gives each word a score that indicates the relative importance of that word to that particular document. The higher the score, the more important the word is from the document’s perspective. Documents that showed a low TF-IDF score for HIV-AIDS were eliminated.

The next step in the pre-processing pipeline was the elimination of punctuation. Punctuation marks often don’t provide any unique value in document classification, so they were removed to improve the computational efficiency of the Neural Network models. Next, the words in the text were all converted to lowercase. Finally, the words were stemmed. Stemming is the process of reducing morphological variants of a word to their base or root form. For example, ‘likes’, ‘liked’, ‘likely’ and ‘liking’ are all reduced to the root word ‘like’. The data is thus pre-processed by the framework and ready for further use.
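The relevance filter and the basic cleaning steps described above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the function names, the keyword `"hiv"`, the threshold of `0.05`, and the smoothed IDF formula are all assumptions made for the example. Punctuation is replaced by spaces here so that "HIV/AIDS" and "HIV-AIDS" both yield the token "hiv".

```python
import math
import string
from collections import Counter

# Map every punctuation character to a space (assumption: simple ASCII punctuation).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Lowercase the text, drop punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TO_SPACE).split()


def tfidf_scores(docs):
    """Return one {term: tf-idf score} dictionary per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return scores


def filter_relevant(docs, keyword="hiv", threshold=0.05):
    """Keep only documents whose TF-IDF score for `keyword` reaches `threshold`.

    Both `keyword` and `threshold` are hypothetical values for illustration.
    """
    return [
        doc
        for doc, doc_scores in zip(docs, tfidf_scores(docs))
        if doc_scores.get(keyword, 0.0) >= threshold
    ]
```

An article that mentions HIV only once in a long, unrelated text gets a low score for the keyword and is dropped, while an article that is mostly about HIV is kept. Stemming is left out of this sketch; since NLTK is among the technologies listed, something like its `PorterStemmer` could be applied to the tokens afterwards.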

Third Phase: The framework then trains Neural Network models on the manually labelled and pre-processed data. To make this possible, each document is first represented as a numerical vector using word embedding, with the vectors for all documents having the same size. The unstructured text data is thus ready for the Neural Network architectures to be trained on.
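The text does not say how the fixed-size document vectors are built. One common preparation step, shown here only as an assumption, is to map each word to an integer id and pad or truncate every document to the same length, so that the result can be fed to an embedding layer. The names `build_vocab` and `encode`, the reserved ids, and the default sizes are hypothetical.

```python
from collections import Counter

PAD_ID = 0  # reserved id used to pad short documents
UNK_ID = 1  # reserved id for words missing from the vocabulary


def build_vocab(token_lists, max_words=10000):
    """Assign ids (starting at 2) to the most frequent words."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    return {
        word: idx
        for idx, (word, _) in enumerate(counts.most_common(max_words), start=2)
    }


def encode(tokens, vocab, max_len=200):
    """Turn a token list into exactly `max_len` integer ids."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:max_len]]  # truncate
    return ids + [PAD_ID] * (max_len - len(ids))                # pad


if __name__ == "__main__":
    docs = [["hiv", "outbreak", "report"], ["research", "on", "hiv", "treatment"]]
    vocab = build_vocab(docs)
    for doc in docs:
        print(encode(doc, vocab, max_len=6))
```

Every encoded document has the same length regardless of how long the article was. In a Keras model, such sequences would typically pass through an `Embedding` layer before the classification layers; the exact architecture used by the project is not described here.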

Technologies Used

  • Python
  • NLTK
  • Keras
  • Matplotlib (for visualization)
  • spaCy (for extracting place names from the articles)

Repository

https://github.com/pranavmicro7/Web-Scraper-And-Classifier-For-HIV-Articles-
