Using Pandas to get stock data

In this blog post we explore how to download financial data from Yahoo Finance with Python. The easiest way is to use Pandas, the data analysis package for Python. Pandas is a high-quality package that offers a Yahoo Finance data reader as well as a set of useful data structures for analyzing this data. Check out Pandas at http://pandas.pydata.org. Installing NumPy and Pandas is easy: after downloading the appropriate wheel (*.whl) file for your operating system, you can use pip to install the package.

station:~ user$ pip install <package>.whl

For a system-wide installation you will need administrator privileges. Once the installer finishes without an error, you can use the packages in any Python project.
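A quick way to confirm that the installation worked is to import both packages and print their versions. This is only a minimal check and not part of the script we build below; the variable names are illustrative, and the versions printed depend on what you installed.

```python
# Minimal installation check: both imports must succeed.
import numpy
import pandas

numpy_version = numpy.__version__
pandas_version = pandas.__version__

print("NumPy version: {}".format(numpy_version))
print("Pandas version: {}".format(pandas_version))
```

If either import fails with an ImportError, the corresponding package is not installed for the interpreter you are running.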
Let’s start using Pandas to get stock data. We create a new file stockdata.py and start by importing the necessary packages.

import pandas
import pandas.io.data as web
from datetime import datetime

Next we have to define the ticker symbols of the stocks we want to retrieve as well as the period for which we want stock data. Several tickers can be retrieved in a single call when they are passed as a list. The period is defined by a start and an end date, both given as datetime objects. The datetime constructor takes year, month and day, in this order, as arguments. In our example we want to retrieve stock prices for Oracle (ORCL), Tesla (TSLA), IBM (IBM) and Microsoft (MSFT) for the year 2014.

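# Ticker symbols to retrieve in one call
tickers = ['ORCL', 'TSLA', 'IBM', 'MSFT']
# Period: the full year 2014, given as (year, month, day)
start = datetime(2014, 1, 1)
end = datetime(2014, 12, 31)
stockRawData = web.DataReader(tickers, 'yahoo', start, end)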
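To make the date handling concrete, the following sketch builds the same start and end dates and checks the length of the period. It needs no network access; the variable `period` is introduced here only for illustration.

```python
from datetime import datetime

# datetime takes year, month and day, in this order.
start = datetime(2014, 1, 1)
end = datetime(2014, 12, 31)

# Subtracting two datetime objects gives a timedelta.
period = end - start
print(period.days)

# Datetime objects can be compared directly, which is handy for sanity checks.
print(start < end)
```

Swapping the arguments by accident, for example passing the day before the month, either raises a ValueError or silently selects a different date, so a check like this can save some debugging.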

The last line calls the Pandas DataReader, which retrieves the defined tickers from Yahoo Finance for the period from start to end and returns a Pandas Panel object. The Panel object is a 3D data cube (array) with the dimensions time, ticker and field (Open [Price], High, Low, Close, Volume and Adjusted Close). In the next step we want to print the retrieved data to our terminal. A Panel object is not directly printable but can be flattened to a printable 2D data frame object.

print(stockRawData.to_frame())
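The flattened layout can be illustrated without a network call. The sketch below uses made-up numbers and builds by hand a data frame shaped like the output of to_frame(): one row per (date, ticker) pair and one column per field. The names `flat` and `ibm_first_day` are illustrative and not part of the script above.

```python
import pandas as pd

# Made-up sample values, not real quotes.
dates = pd.to_datetime(["2014-01-02", "2014-01-03"])
tickers = ["IBM", "MSFT"]

# One row per (date, ticker) pair, one column per field.
index = pd.MultiIndex.from_product([dates, tickers], names=["Date", "Ticker"])
flat = pd.DataFrame(
    {"Close": [10.0, 20.0, 11.0, 21.0], "Volume": [100, 200, 110, 210]},
    index=index,
)
print(flat)

# A single (date, ticker) row can be looked up with a tuple.
ibm_first_day = flat.loc[(dates[0], "IBM")]
print(ibm_first_day)
```

Because both dimensions live in the row index, this layout prints well but is less convenient when you only care about one field, which is where slicing comes in.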

Alternatively we can use the slice operation to pick one field to display. Let’s say we are only interested in the adjusted close price. We can then get a 2D data frame that contains all tickers and dates, but only the adjusted close price as values, with the following commands:

sliceKey = 'Adj Close'
adjCloseData = stockRawData.ix[sliceKey]
print(adjCloseData)
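The effect of picking one field can be tried on a small hand-made data set. Later Pandas releases no longer ship the Panel class, so this sketch assumes the data is held in a data frame with two-level columns (field, then ticker) instead; the numbers are made up and the names `wide` and `adj_close` are illustrative only.

```python
import pandas as pd

# Made-up sample values, not real quotes.
dates = pd.to_datetime(["2014-01-02", "2014-01-03"])
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["IBM", "MSFT"]], names=["Field", "Ticker"]
)
wide = pd.DataFrame(
    [[9.5, 19.5, 10.0, 20.0], [10.5, 20.5, 11.0, 21.0]],
    index=dates,
    columns=columns,
)

# Selecting the outer column level keeps dates as rows and tickers as columns.
adj_close = wide["Adj Close"]
print(adj_close)
```

The result has the same shape as adjCloseData in the script: one row per date and one column per ticker.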

The variable adjCloseData is a data frame object that can be used like a 2D array. If, for example, we want the adjusted close prices only for IBM in this period, we can access the data frame as follows:

ibmAdjCloseData = adjCloseData['IBM']
print(ibmAdjCloseData)
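Selecting a single column returns a one-dimensional series indexed by date. The short sketch below shows this on made-up prices (not real quotes); the names `adj_close` and `ibm` are illustrative.

```python
import pandas as pd

# Made-up adjusted close prices, not real quotes.
dates = pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"])
adj_close = pd.DataFrame(
    {"IBM": [9.5, 10.5, 10.0], "MSFT": [19.5, 20.5, 21.0]}, index=dates
)

# Selecting one column returns a Series indexed by date.
ibm = adj_close["IBM"]

# Look up the price on a single day and the highest price in the sample.
print(ibm.loc[dates[1]])
print(ibm.max())
```

From here the usual series methods, such as max() or mean(), apply directly to the prices of one stock.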


Get the full code:

import pandas
import pandas.io.data as web
from datetime import datetime

# Retrieve the 2014 stock data for all tickers in one call
tickers = ['ORCL', 'TSLA', 'IBM', 'MSFT']
start = datetime(2014, 1, 1)
end = datetime(2014, 12, 31)
stockRawData = web.DataReader(tickers, 'yahoo', start, end)
print(stockRawData.to_frame())

# Pick one field: adjusted close prices for all tickers
sliceKey = 'Adj Close'
adjCloseData = stockRawData.ix[sliceKey]
print(adjCloseData)

# Pick one ticker from the 2D data frame
ibmAdjCloseData = adjCloseData['IBM']
print(ibmAdjCloseData)
