Evaluation of News Articles

Introduction and Motivation

India Data Portal (IDP) is an open-source data repository covering four domains: agriculture, the general economy, rural development and financial inclusion.
Its purpose is to promote data-driven journalism, so that journalists can use this data to ground their news stories in evidence (data) and make them more interesting.
We have to create a dashboard that displays information gathered by scraping news articles from various sources in six languages (English, Hindi, Bengali, Marathi, Oriya, Telugu). The dashboard has to show how many articles are data related, how many are related to IDP, and how many contain graphics related to IDP.
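
The three dashboard counts could be computed from the scraped articles roughly as follows. This is only a minimal sketch: the `Article` record, its field names, and the `DATA_TERMS` keyword list are assumptions made for illustration, not the project's actual schema or its rule for deciding that an article is data related. The keyword check also covers English text only.

```python
from dataclasses import dataclass

# Hypothetical keyword list; the real criteria for "data related"
# are not specified in this document.
DATA_TERMS = ("per cent", "percent", "%", "crore", "lakh", "survey", "census")


@dataclass
class Article:
    language: str
    text: str
    mentions_idp: bool = False      # assumed flag: article refers to India Data Portal
    has_idp_graphic: bool = False   # assumed flag: article contains an IDP graphic


def is_data_related(article: Article) -> bool:
    """Return True if the text contains any of the illustrative data terms."""
    text = article.text.lower()
    return any(term in text for term in DATA_TERMS)


def summarize(articles):
    """Compute the counts the dashboard is meant to display."""
    return {
        "total": len(articles),
        "data_related": sum(is_data_related(a) for a in articles),
        "idp_related": sum(a.mentions_idp for a in articles),
        "idp_graphics": sum(a.has_idp_graphic for a in articles),
    }
```

The resulting dictionary could then be passed to whatever dashboard component renders the totals.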
Topic modelling is also to be performed to find subtopics that further divide the four domains and to find the most common keywords. Authors are extracted as well, so that this information can be used to tag authors along with their extracted Twitter handles.
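
As a rough illustration of the keyword and Twitter handle steps, the sketch below counts the most frequent words across article texts and pulls handles out of a byline. It is plain word-frequency counting, not topic modelling, and the function names, the stop-word list, and the handle pattern are all assumptions for illustration (English text only).

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real pipeline would need a fuller
# list for each of the six languages.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}

# Assumed handle pattern: "@" followed by up to 15 letters, digits or
# underscores, and not preceded by a word character (which skips e-mail addresses).
HANDLE_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def top_keywords(texts, n=5):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]


def extract_handles(text):
    """Return Twitter handles (without the leading '@') found in the text."""
    return HANDLE_RE.findall(text)
```

For example, `extract_handles("By Asha Rao (@asha_rao)")` would yield the handle to attach to that author when tagging.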


Technologies Used

  • BeautifulSoup, Selenium and newspaper3k libraries in Python for web scraping
  • K-means clustering and topic modelling
  • Plotly Dash
  • AWS S3 bucket
  • AWS
  • Azure