This article will explain how to connect to SQL server, issue a Query and create a DataFrame using Pandas, My example queries a SQL server Database with the purpose of retreving the number of bookings that a small BnB made over a period of 3 years.
Connect to SQL code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
import pyodbc import pandas as pd sql_conn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server}; SERVER=SERVERNAME; DATABASE=DBNAME; UID=USERNAME;PWD=PASSWORD') query = '''with cte as ( select bo_from, bo_to, datename(month, BO_RESDATE) as ReservationMonth, year(bo_resdate) as ReservationYear from bookings where right(bo_cdrn, 4) = 0202 and bo_year in (2018, 2019) and bo_status = 'B' ) select ReservationMonth, ReservationYear, count(*) as NumberOfBookings from cte group by ReservationMonth, ReservationYear''' df = pd.read_sql(query, sql_conn) df.head() |
This returned the following sample.
1 2 3 4 5 6 |
ReservationMonth ReservationYear NumberOfBookings 0 August 2017 2 1 July 2017 1 2 June 2017 9 3 November 2017 2 4 October 2017 3 |
next bit is to make the data look more presentable.
1 |
all_bookings = df3.set_index(['ReservationMonth', 'ReservationYear']).sort_index() |
issuing all_bookings results in the following sampleoutput.
1 2 3 4 5 6 7 8 9 |
NumberOfBookings ReservationMonth ReservationYear August 2017 2 2018 3 December 2018 1 February 2018 2 2019 1 January 2018 2 2019 4 |
you can see from the above output that in August 2017 we had 2 bookings but in August 2018 we had 3.
We can also pivot the results a lot easier using Python than using SQL.
1 |
pivotdf = df.pivot(index='ReservationYear', columns='ReservationMonth', values='NumberOfBookings') |
results in the sample output, where NaN means no bookings.
1 2 3 4 5 |
ReservationMonth August December February ... November October September ReservationYear ... 2017 2.0 NaN NaN ... 2.0 3.0 1.0 2018 3.0 1.0 2.0 ... 2.0 3.0 5.0 2019 NaN NaN 1.0 ... NaN NaN NaN |