To get the news about a particular company's stock, search for it using the search bar.
For example, the following figure shows the news headlines section for Amazon:
Instead of going through each headline for every stock that you're interested in, you can use Python to parse the website data, assign a sentiment score to every headline, and then average the scores over a period of time.
So, let's see how you can do this in Python:
We will first import the libraries that we need.
We will use the BeautifulSoup library to parse data from the website, and urlopen and Request from the urllib.request module to fetch it.
The Pandas library will be used to store the data in its DataFrame objects while the Matplotlib library will be used to visualize the data.
After getting the data, we will use the nltk.sentiment.vader module to perform sentiment analysis on the news headlines.
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import pandas as pd
from urllib.request import urlopen, Request
from nltk.sentiment.vader import SentimentIntensityAnalyzer
To parse the website, you should add the stock ticker to the end of the following URL:
First, we created a Python dictionary named news_tables.
We then specified the tickers for the three stocks we are interested in.
The for loop will make 3 iterations, extracting news data for one of the stocks per iteration.
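The request-building part of the loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the original code: BASE_URL is a placeholder for the URL referred to above, the tickers other than AMZN are illustrative, and the User-Agent header and the helper name build_requests are hypothetical additions.

```python
from urllib.request import Request

# Placeholder: substitute the quote-page URL that the ticker is appended to.
BASE_URL = "https://example.com/quote?t="

# AMZN comes from the text; the other two tickers are illustrative.
tickers = ["AMZN", "TSLA", "GOOG"]


def build_requests(tickers, base_url=BASE_URL):
    """Return a dict mapping each ticker to a Request for its news page."""
    requests_by_ticker = {}
    for ticker in tickers:
        # The ticker is appended to the end of the base URL.
        url = base_url + ticker
        # Assumed header: a browser-like User-Agent is often needed.
        requests_by_ticker[ticker] = Request(
            url=url, headers={"User-Agent": "Mozilla/5.0"}
        )
    return requests_by_ticker


ticker_requests = build_requests(tickers)
```

Each Request can then be passed to urlopen, and the response handed to BeautifulSoup, to fill news_tables with one entry per ticker.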
Print the Data
We now have a dictionary named news_tables that holds the news data for each ticker.
If you execute the code and type the name of the dictionary on the Python terminal, you'll see that it contains a large amount of text:
Let's display the contents of news_tables for AMZN.
The code will iterate through the <tr></tr> tags to extract the date and time from <td></td> tags and the headlines from the <a></a> tags.
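amazon = news_tables['AMZN']
# Get every table row (<tr>) in the AMZN news table
amazon_tr = amazon.findAll('tr')
for x, table_row in enumerate(amazon_tr):
    # The headline is in the <a> tag; the date and time are in the <td> tag
    a_text = table_row.a.text
    td_text = table_row.td.text
    print(a_text)
    print(td_text)
    # Stop after the first four rows (x runs from 0 to 3)
    if x == 3:
        break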
The code should return something like this:
Notice the use of an if condition that helps us extract only the first 4 rows from the data.
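To see what table_row.a.text and table_row.td.text return without fetching a live page, the same extraction can be tried on a small hand-written table. The HTML below is invented for illustration and only mimics the row structure described above; the headlines and timestamps are not real data.

```python
from bs4 import BeautifulSoup

# Invented sample markup: one <tr> per headline, with the date/time in the
# first <td> and the headline text inside an <a> tag.
SAMPLE_HTML = """
<table>
  <tr><td>Jan-04-21 10:30AM</td><td><a href="#">Sample headline one</a></td></tr>
  <tr><td>09:15AM</td><td><a href="#">Sample headline two</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# row.td is the first <td> in the row; row.a is the first <a> in the row.
rows = [(row.td.text, row.a.text) for row in soup.find_all("tr")]

for td_text, a_text in rows:
    print(td_text, "|", a_text)
```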
Parse the Data into a Python List
We now want to write some code to parse the date, the time, and the headlines into a Python list called news_list.
A closer look at the news headlines reveals that only the first headline of each day has the date label.
To account for this, we will use an if...else statement.
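news_list = []
for file_name, news_table in news_tables.items():
    for i in news_table.findAll('tr'):
        # Headline text from the row's <a> tag
        text = i.a.get_text()
        # Split the <td> text into [time] or [date, time]
        date_scrape = i.td.text.split()
        if len(date_scrape) == 1:
            # Only a time is present; the date carries over from an earlier row
            time = date_scrape[0]
        else:
            date = date_scrape[0]
            time = date_scrape[1]
        # The ticker is the part of the key before any underscore
        tick = file_name.split('_')[0]
        news_list.append([tick, date, time, text])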
The outer for loop will iterate over each ticker and its news table in news_tables, while the inner for loop will iterate over all <tr> tags in that news_table.
Because .get_text() is called on i.a, it will extract only the text placed within the row's <a> tag, that is, the headline, rather than all the text in the <tr> tag.
The .split() function will split the text placed in the <td> tag into a list.
If the length of the split data is 1, time will be loaded as the only element.
Otherwise, date will be loaded as the first element and time as the second.
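The date carry-over described above can be checked in isolation on plain strings. The following is a minimal sketch: the helper name split_timestamps and the sample cell strings (including their date format) are assumptions made for illustration, not values taken from the site.

```python
def split_timestamps(cells):
    """Yield (date, time) pairs from raw <td> strings, carrying the date forward."""
    date = None
    for cell in cells:
        parts = cell.split()
        if len(parts) == 1:
            # Only the time is present, so keep the most recent date.
            time = parts[0]
        else:
            # Both are present: the first element is the date, the second the time.
            date, time = parts[0], parts[1]
        yield date, time


# Invented sample cells: the second one has no date label.
sample_cells = ["Jan-04-21 10:30AM", "09:15AM", "Jan-03-21 08:00PM"]
pairs = list(split_timestamps(sample_cells))
print(pairs)
```

The second pair reuses the date of the first, which is the behavior the if...else branch relies on.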
You can execute the code and type news_list on the Python terminal.
The result will be a list of lists with part of it looking as follows:
We can now use nltk.sentiment.vader to perform sentiment analysis.
We will first store the ticker, the date, the time, and the headlines in a DataFrame.
Next, we will perform sentiment analysis on the headlines and then add an additional column to our DataFrame to store the sentiment scores per headline:
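One way these two steps could look is sketched below. It is a sketch under assumptions rather than the original code: the column names and the helper names add_sentiment and mean_compound_by_day are hypothetical, and the analyzer is passed in as a parameter so that any object with a polarity_scores method can be used. The second helper averages the compound score per ticker and day, in line with the goal stated at the start.

```python
import pandas as pd

# Column names are an assumption; the text only lists what is stored.
COLUMNS = ["ticker", "date", "time", "headline"]


def add_sentiment(news_list, analyzer):
    """Build a DataFrame from news_list and append one column per score."""
    df = pd.DataFrame(news_list, columns=COLUMNS)
    # polarity_scores returns a dict of scores for one piece of text.
    scores = df["headline"].apply(analyzer.polarity_scores).tolist()
    # Both frames share the default integer index, so join aligns row by row.
    return df.join(pd.DataFrame(scores))


def mean_compound_by_day(scored):
    """Average the 'compound' score per ticker and date."""
    return scored.groupby(["ticker", "date"])["compound"].mean()


# Intended use (needs the VADER lexicon, e.g. nltk.download('vader_lexicon')):
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   scored = add_sentiment(news_list, SentimentIntensityAnalyzer())
#   daily = mean_compound_by_day(scored)
```

With VADER, polarity_scores returns the keys neg, neu, pos, and compound, so those become the added columns.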