HOW TO INTEGRATE PYTHON WITH AZURE SQL DATABASE FOR ETL PROCESSES

Recently, I was working on a data project where I needed to connect to an Azure SQL database using Python script and extract data from Azure SQL DB to a local repository. To do that, you can follow the below steps:
1. Create an SQL Server and Deploy an Azure SQL DB:
To create an SQL database on Azure, you need to first create an SQL Server which the SQL database will be dependent on for connection.
To do this using the Azure portal, navigate to create a new SQL Database and provide the requested information. You can create the resource group and server ahead of time or while creating the database itself.
For local development connections, make sure your logical server is configured to allow your local machine IP address and other Azure services to connect. You can do this,
- Navigate to the Networking page of your server.
- Toggle the Selected Networks radio button to show additional configuration options.
- Select Add your client IPv4 address(xx.xx.xx.xx) to add a firewall rule that will enable connections from your local machine IPv4 address.
- Alternatively, you can also select + Add a firewall rule to enter a specific IP address of your choice.
- Make sure the Allow Azure services and resources to access this server checkbox are selected.
You can now create an SQL database and select your server above as part of the parameters.
From the config page above, you can connect a sample database in the additional settings tab. In the Data source section, for Use existing data, select Sample. This creates an AdventureWorksLT sample database so there are some tables and data to query and experiment with. See below:
Once your database is created, you can use the Query editor (preview) in the Azure portal to connect to the database and query data. Alternatively, you can use the SSMS or Azure Data Studio to query the database
Reference — https://learn.microsoft.com/en-us/azure/azure-sql/database/free-offer?view=azuresql
2. Install Microsoft ODBC Driver for Windows/Linux:
The Microsoft ODBC (Open Database Connectivity) Driver is a database driver that enables applications to connect to and interact with databases using the ODBC interface. ODBC is a standardized API (Application Programming Interface) that allows applications to communicate with various database management systems.
To install the Microsoft ODBC Driver on Windows or Linux, you can follow these general steps:
Installing Microsoft ODBC Driver on Windows:
1. Download the ODBC Driver:
- Visit the [Microsoft ODBC Driver for SQL Server](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server) download page.
- Select the appropriate version of the driver for your system (32-bit or 64-bit).
2. Install the Driver:
- Run the downloaded installer and follow the on-screen instructions.
- During the installation process, you may be prompted to provide information such as the SQL Server you want to connect to, authentication details, etc.
3. Verify Installation:
- After installation, you can verify that the driver is installed by checking the ODBC Data Source Administrator. You can access this through the Control Panel or by searching for “ODBC” in the Start menu.
Installing Microsoft ODBC Driver on Linux:
1. Download the ODBC Driver:
- Visit the [Microsoft ODBC Driver for SQL Server on Linux](https://docs.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server#linux) documentation page.
- Follow the instructions to add the Microsoft ODBC Driver for the SQL Server repository and install the driver.
2. Install the Driver:
- Use the package manager for your Linux distribution to install the driver. For example, on Ubuntu, you might use the `apt` package manager:
sudo su
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
exit
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
3. Verify Installation:
- After installation, you can verify that the driver is installed by checking the installed ODBC drivers:
odbcinst -q -d -n "ODBC Driver 17 for SQL Server"
Reference — https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16
3. Create a Python Virtual Environment Using the VS Code Terminal and Install Python Dependencies:
1. Open VS Code:
Open Visual Studio Code on your computer.
2. Open a Terminal:
In VS Code, open a new terminal by clicking on “View” in the top menu, then selecting “Terminal.”
3. Navigate to Your Project Directory:
Change the working directory to your project folder. You can use the `cd` command to navigate to your project directory. For example:
cd path/to/your/project
4. Create a Virtual Environment:
Use the following command to create a Python virtual environment. Replace `venv` with the name you want for your virtual environment:
python -m venv venv
If you are using Python 3.x, you might need to use `python3` instead of `python`.
5. Activate the Virtual Environment:
On Windows:
.\venv\Scripts\activate
On macOS/Linux:
source venv/bin/activate
6. Install Dependencies:
There are dependencies listed in a `requirements.txt` file like pyodbc, pandas, etc., you can install them using:
pip install -r requirements.txt
7. Verify the Virtual Environment:
To verify that you are using the correct Python interpreter and that your virtual environment is active, you can use:
which python
This should point to the `venv` directory.
Reference:
https://code.visualstudio.com/docs/python/environments
4. Create a Python Connection String to Azure SQL Database Using VS Code with Python Extension:
To connect to an Azure SQL Database using Python in VS Code, you will need to use the `pyodbc` library along with the appropriate connection string. Follow these steps:
1. Import `pyodbc` in Your Python Script:
Create a Python script in VS Code or open an existing one. At the top of your script, import the `pyodbc` module:
import pyodbc
2. Create the Azure SQL Database Connection String:
Replace the placeholder values in the connection string with your Azure SQL Database information. The connection string follows this format:
# Configure SQL Server Connections Parameters
server = 'azure sql server name'
database = 'azure sql database name'
username = 'sql server username'
password = 'sql server password'
driver= '{ODBC Driver 18 for SQL Server}'
connection_string = (
f'DRIVER={driver};'
f'SERVER={server};'
f'DATABASE={database};'
f'UID={username};'
f'PWD={password};'
)
Ensure you replace the placeholders (`your_server`, `your_database`, `your_username`, and `your_password`) with your Azure SQL Database server details.
3. Establish a Connection:
Use the `pyodbc.connect` function to establish a connection to the Azure SQL Database:
# Connect to the SQL Server and test the connection
cnxn = pyodbc.connect(connection_string)
5. Execute SQL Query Using the Pyodbc Cursor Function:
Create a Cursor:
Once the connection is established, create a cursor to execute SQL queries:
cursor = cnxn.cursor()
Execute SQL Queries:
Now, you can execute SQL queries using the cursor:
# use the cursor function to query the top 5 records from the DB
cursor.execute("SELECT TOP 5 pc.Name as CategoryName, p.name as ProductName FROM [SalesLT].[ProductCategory] pc JOIN [SalesLT].[Product] p ON pc.productcategoryid = p.productcategoryid")
row = cursor.fetchone()
while row:
print (str(row[0]) + " : " + str(row[1]))
row = cursor.fetchone()
A sample output should look like this:
(.venv) PS C:\Users\DELL\Documents\Python_Azure_SQL> & c:/Users/DELL/Documents/Python_Azure_SQL/.venv/Scripts/python.exe c:/Users/DELL/Documents/Python_Azure_SQL/app.py
Road Frames : HL Road Frame - Black, 58
Road Frames : HL Road Frame - Red, 58
Helmets : Sport-100 Helmet, Red
Helmets : Sport-100 Helmet, Black
Socks : Mountain Bike Socks, M
6. Export the Query’s Output in a Directory as a CSV file Using the Pandas Library:
To convert the query result into a data frame and export it to a CSV file, you will need to use the `pandas` library along with the appropriate connection string. Follow these steps:
1. At the top of your script, import the `pandas` module:
import pandas as pd
2. Convert the query result into a data frame and export to a CSV file:
You can now use the existing connection string to query the database using the pandas function. See below:
# Query and export the query result from a Dataframe to a CSV file
queryOutput = pd.read_sql_query("SELECT * FROM [SalesLT].[Product]", cnxn)
df = pd.DataFrame(queryOutput)
df.to_csv (r'exported_sql_data3.csv', index = False)
Reference — https://datatofish.com/export-sql-table-to-csv-python/
7. Close Connection:
Using the close() function of the pyodbc, you can terminate the connection after executing the script and the records exported.
# Close the connection
cnxn.close()
Putting it all together, here’s a complete script:
# Import libraries
import pyodbc
import pandas as pd
# Configure SQL Server Connections Parameters
server = 'azure sql server name'
database = 'azure sql database name'
username = 'sql server username'
password = 'sql server password'
driver= '{ODBC Driver 18 for SQL Server}'
connection_string = (
f'DRIVER={driver};'
f'SERVER={server};'
f'DATABASE={database};'
f'UID={username};'
f'PWD={password};'
)
# Establish connection to the dtabase
cnxn = pyodbc.connect(connection_string)
# use the cursor function to query top 5 records from the DB
cursor = cnxn.cursor()
cursor.execute("SELECT TOP 5 pc.Name as CategoryName, p.name as ProductName FROM [SalesLT].[ProductCategory] pc JOIN [SalesLT].[Product] p ON pc.productcategoryid = p.productcategoryid")
row = cursor.fetchone()
while row:
print (str(row[0]) + " : " + str(row[1]))
row = cursor.fetchone()
# Query and export the query result from a dataframe to a CSV file
queryOutput = pd.read_sql_query("SELECT * FROM [SalesLT].[Product]", cnxn)
df = pd.DataFrame(queryOutput)
df.to_csv (r'exported_sql_data3.csv', index = False)
# Close the connection
cnxn.close()
You can clone the source code on my Github
Did I face some casual challenges? The answer is YES 😂
Some of the challenges you may face include:
✔ SQL Firewall Issue: At first, I was having firewall issues on the SQL Server even when I allowed external services to access the SQL Server. To fix this, I had to allow any public IP on this range: 0.0.0.0 to 0.0.0.254 to access the server.
✔ ODBC Driver Issue: I had to install an updated version of ODBC Driver Driver 18 for SQL Server on my PC when the Python script showed a TCP/ODBC Driver error on the terminal.
I hope this helps anyone interested in exploring this option.