As a professional in the field of data management and analysis, mastering the skill of retrieving data from a database in Python is essential. In today’s data-driven world, accessing and extracting information from databases efficiently using Python can significantly enhance productivity and decision-making processes.
In this article, I will guide you on how to fetch data from a database in Python, empowering you with the knowledge and skills necessary to harness the full potential of Python for data manipulation tasks. Whether you’re a seasoned data analyst or a beginner in the field, understanding how to interact with a database in Python opens a world of opportunities for data exploration, analysis, and reporting.
By the end of this tutorial, you’ll have a solid understanding of the fundamentals of database interaction in Python, enabling you to extract, manipulate, and analyze data with ease. Let’s dive in and explore the power of Python in database management and data extraction.
Data Extraction with Python Database
Data extraction entails retrieving data from various sources, sometimes processing it further, and moving it to repositories for further analysis, so some kind of data transformation usually happens along the way. Python is one of the leading programming languages for such data science tasks. This general-purpose scripting language has about 8.2 million users across the world.
In the following guide, we will discuss extraction methods using PostgreSQL, an open-source relational database system. PostgreSQL provides a ROW_TO_JSON function that returns each row of a result set as a JSON object, surrounded by curly braces {}. JSON makes query results more convenient to manipulate in Python. But before we begin, make sure that you have created a virtual environment and installed the psycopg2-binary package (the PostgreSQL adapter for Python) in it.
Python Database Basics
Suppose you have a PostgreSQL database of the American National Football League (NFL). It contains three tables: athletes (players), coaches, and teams. Note the following details to get clued up about the stored data:
- The athletes table houses details like athlete_id, which is the primary key, the players’ first and last names, jersey numbers, weight (in kg), height (in m), and country of origin. It also holds team_id, a foreign key indicating each athlete’s team.
- The coaches table has coach_id (primary key), along with first and last names, and team_id (a foreign key referencing the teams table).
- Finally, the teams table describes every football team with a name, city, conference, conference rank, and total wins and losses (split into ‘home’ and ‘away’). Its primary key is team_id, which the two tables above reference.
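To make the three tables concrete, here is a minimal sketch of one possible schema. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server, and every column name not mentioned above (first_name, jersey_number, weight_kg, home_losses, and so on) is an assumption rather than the actual layout of the NFL database.

```python
import sqlite3

# Hypothetical DDL for the NFL tables. Column names beyond athlete_id,
# coach_id, and team_id are assumptions made for illustration.
SCHEMA = """
CREATE TABLE teams (
    team_id         INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    city            TEXT,
    conference      TEXT,
    conference_rank INTEGER,
    home_wins       INTEGER,
    away_wins       INTEGER,
    home_losses     INTEGER,
    away_losses     INTEGER
);
CREATE TABLE athletes (
    athlete_id    INTEGER PRIMARY KEY,
    first_name    TEXT,
    last_name     TEXT,
    jersey_number INTEGER,
    weight_kg     REAL,
    height_m      REAL,
    country       TEXT,
    team_id       INTEGER REFERENCES teams(team_id)  -- foreign key
);
CREATE TABLE coaches (
    coach_id   INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    team_id    INTEGER REFERENCES teams(team_id)     -- foreign key
);
"""

# An in-memory database keeps the sketch from touching the disk.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['athletes', 'coaches', 'teams']
```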
Now that you are familiar with the dataset, let us explore how to write an SQL query to retrieve a list of teams. For example, you need football teams ordered according to their conference and rank. You also want to extract the number of athletes or players in each team along with the names of their coaches. You may also want to know the number of the teams’ wins and losses, both at home and away.
The following query does this:
SELECT
    f.name,
    f.city,
    f.conference,
    f.conference_rank,
    COUNT(a.athlete_id) AS number_of_athletes,
    CONCAT(c.first_name, ' ', c.last_name) AS coach,
    f.home_wins,
    f.away_wins
FROM athletes a, teams f, coaches c
WHERE a.team_id = f.team_id
  AND c.team_id = f.team_id
GROUP BY f.name, c.first_name, c.last_name, f.city, f.conference, f.conference_rank, f.home_wins, f.away_wins
ORDER BY f.conference, f.conference_rank
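If you don't have the NFL database at hand, you can try the query on a toy copy. The sketch below assumes an in-memory SQLite database filled with a few made-up rows, and it swaps CONCAT() for SQLite's || concatenation operator; the rest of the query is unchanged.

```python
import sqlite3

# Minimal stand-in for the NFL database: hypothetical toy rows, not real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT, city TEXT,
                    conference TEXT, conference_rank INTEGER,
                    home_wins INTEGER, away_wins INTEGER);
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE coaches (coach_id INTEGER PRIMARY KEY, first_name TEXT,
                      last_name TEXT, team_id INTEGER);
INSERT INTO teams VALUES (1, 'Team A', 'City A', 'AFC', 1, 6, 4),
                         (2, 'Team B', 'City B', 'NFC', 1, 5, 5);
INSERT INTO athletes (team_id) VALUES (1), (1), (1), (2), (2);
INSERT INTO coaches VALUES (1, 'Ann', 'Smith', 1), (2, 'Bob', 'Jones', 2);
""")

# The team query, with || in place of CONCAT() for SQLite.
QUERY = """
SELECT f.name, f.city, f.conference, f.conference_rank,
       COUNT(a.athlete_id) AS number_of_athletes,
       c.first_name || ' ' || c.last_name AS coach,
       f.home_wins, f.away_wins
FROM athletes a, teams f, coaches c
WHERE a.team_id = f.team_id
  AND c.team_id = f.team_id
GROUP BY f.name, c.first_name, c.last_name, f.city, f.conference,
         f.conference_rank, f.home_wins, f.away_wins
ORDER BY f.conference, f.conference_rank
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # one tuple per team, e.g. ('Team A', 'City A', 'AFC', 1, 3, ...)
```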
Next, wrap the query inside the ROW_TO_JSON function mentioned earlier, as shown below, and save the result to a file called query.sql in your current directory.
SELECT ROW_TO_JSON(team_info) FROM (
    SELECT
        f.name,
        f.city,
        f.conference,
        f.conference_rank,
        COUNT(a.athlete_id) AS number_of_athletes,
        CONCAT(c.first_name, ' ', c.last_name) AS coach,
        f.home_wins,
        f.away_wins
    FROM athletes a, teams f, coaches c
    WHERE a.team_id = f.team_id
      AND c.team_id = f.team_id
    GROUP BY f.name, c.first_name, c.last_name, f.city, f.conference, f.conference_rank, f.home_wins, f.away_wins
    ORDER BY f.conference, f.conference_rank
) AS team_info
You will observe that each row now has the structure of a Python dictionary, whose keys are the field names returned by your query.
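Because psycopg2 converts json values into Python objects automatically, you normally receive each team as a dict without extra work. The sketch below decodes one line of ROW_TO_JSON output by hand with the standard json module, only to show how the column aliases become dictionary keys; the values are hypothetical.

```python
import json

# One ROW_TO_JSON result as PostgreSQL would print it (hypothetical values).
raw = ('{"name":"Team A","city":"City A","conference":"AFC",'
       '"conference_rank":1,"number_of_athletes":3,"coach":"Ann Smith",'
       '"home_wins":6,"away_wins":4}')

team = json.loads(raw)   # JSON object -> Python dict
print(sorted(team))      # the keys are the column aliases from the query
print(team["coach"], team["home_wins"] + team["away_wins"])  # Ann Smith 10
```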
Moreover, to avoid exposing your database username and password in plain sight in your code, store them as environment variables. Choose either of the following methods, depending on your platform:
- For Windows: Control panel → System → Advanced System Settings → Advanced Tab → Environment variables.
- For a Unix-like environment: Append two lines that export your username and password to your shell initialization file (for example, ~/.bashrc).
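A minimal sketch of reading those credentials back in Python follows. The variable names NFL_DB_NAME, NFL_DB_USER, NFL_DB_PASSWORD, NFL_DB_HOST, and NFL_DB_PORT are hypothetical; use whatever names you defined. The defaults for the database name, host, and port are likewise assumptions.

```python
import os

# Example lines for ~/.bashrc (hypothetical names and values):
#   export NFL_DB_USER=your_user
#   export NFL_DB_PASSWORD=your_password

def db_settings(environ=os.environ):
    """Collect connection settings from environment variables."""
    return {
        "dbname": environ.get("NFL_DB_NAME", "nfl"),
        "user": environ["NFL_DB_USER"],          # required: KeyError if unset
        "password": environ["NFL_DB_PASSWORD"],  # required: never hard-code it
        "host": environ.get("NFL_DB_HOST", "localhost"),
        "port": int(environ.get("NFL_DB_PORT", "5432")),
    }
```

Passing a plain dict instead of os.environ, as in db_settings({"NFL_DB_USER": "u", "NFL_DB_PASSWORD": "p"}), makes the function easy to test without touching your real environment.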
With this, you are all set to write Python code. First, import the modules you need: os for reading environment variables, psycopg2 for the database connection, and its Error class for handling database errors:
import os
import psycopg2 as p
from psycopg2 import Error
Then, load the contents of query.sql and open the connection. Read the query file with the open and read functions, and connect to the NFL database with the connect function, specifying your database name, user, password, host, and port number.
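Putting those steps together, a sketch might look like the following. It assumes query.sql sits in the current directory and that the credentials live in hypothetical environment variables such as NFL_DB_USER and NFL_DB_PASSWORD. psycopg2 is imported inside connect_to_nfl (a helper name of our own) so the file-reading part can be tried even where psycopg2 isn't installed.

```python
import os
from pathlib import Path

def load_query(path="query.sql"):
    """Return the SQL text saved earlier in query.sql."""
    return Path(path).read_text(encoding="utf-8")

def connect_to_nfl():
    """Open a psycopg2 connection using credentials from the environment.

    The environment variable names are hypothetical placeholders.
    """
    import psycopg2  # imported lazily; requires psycopg2-binary
    return psycopg2.connect(
        dbname=os.environ.get("NFL_DB_NAME", "nfl"),
        user=os.environ["NFL_DB_USER"],
        password=os.environ["NFL_DB_PASSWORD"],
        host=os.environ.get("NFL_DB_HOST", "localhost"),
        port=os.environ.get("NFL_DB_PORT", "5432"),
    )
```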
How to Fetch Data From a Database in Python?
Once you have established the database connection, you can execute the query. For this you need a control structure called a cursor: create it with cursor = conn.cursor() and then run cursor.execute(query). Fetching the results returns a list of one-element tuples, each holding one team as a Python dictionary:
result = cursor.fetchall()
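The shape of that result is easiest to see by running the same cursor calls against SQLite. In this sketch, SQLite stands in for PostgreSQL and the JSON strings are hypothetical. With psycopg2 the one-element tuples would already contain dicts, so the json.loads step would be unnecessary.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL; json.dumps mimics ROW_TO_JSON output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team_info (doc TEXT)")
conn.executemany("INSERT INTO team_info VALUES (?)",
                 [(json.dumps({"name": "Team A", "home_wins": 6}),),
                  (json.dumps({"name": "Team B", "home_wins": 5}),)])

cursor = conn.cursor()
cursor.execute("SELECT doc FROM team_info ORDER BY rowid")
result = cursor.fetchall()  # list of one-element tuples

# Unpack each one-element tuple; SQLite returns text, so decode it.
teams = [json.loads(doc) for (doc,) in result]
for team in teams:
    print(team["name"], team["home_wins"])

cursor.close()
conn.close()
```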
At this stage, you can iterate over the result and manipulate the contents as you want, for example by feeding them into spreadsheets or HTML tables. When you finish, don't forget to clean up: wrap your code in a try-except block and close the cursor and connection in a finally clause.
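The cleanup described above could look like this sketch. It uses sqlite3 and sqlite3.Error so it runs anywhere; with psycopg2 you would call p.connect(...) instead and catch the Error class imported earlier. The function name fetch_all is our own.

```python
import sqlite3

def fetch_all(db_path, query):
    """Run `query` and return every row, always releasing resources."""
    conn = None
    cursor = None
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    except sqlite3.Error as exc:
        # With psycopg2, catch psycopg2.Error here instead.
        print(f"Database error: {exc}")
        return []
    finally:
        # Runs whether the query succeeded or failed.
        if cursor is not None:
            cursor.close()
        if conn is not None:
            conn.close()
```

For instance, fetch_all(":memory:", "SELECT 1 + 1") returns [(2,)], while a query against a missing table prints the error and returns an empty list.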
When you handle large datasets, relational or otherwise, you need some basic tools to query the tables, especially when you also want to manipulate the results. Python makes such data transformation easy.
For this reason, many postgraduate data science programs include these techniques in their curriculum, for example the Associate Diploma in Data Science (IIIT-Bangalore) and the Global Master Certificate in Business Analytics (Michigan State University).
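As an example of feeding results into a spreadsheet, the standard csv module can write the fetched dictionaries to a CSV file. The rows below are hypothetical and shaped like the ROW_TO_JSON output, and rows_to_csv is an illustrative helper rather than part of any library.

```python
import csv
import io

def rows_to_csv(teams, out):
    """Write a list of team dicts (one per ROW_TO_JSON row) as CSV."""
    if not teams:
        return
    writer = csv.DictWriter(out, fieldnames=list(teams[0]))
    writer.writeheader()
    writer.writerows(teams)

# Hypothetical rows shaped like the dicts returned by the JSON query.
teams = [
    {"name": "Team A", "conference": "AFC", "home_wins": 6, "away_wins": 4},
    {"name": "Team B", "conference": "NFC", "home_wins": 5, "away_wins": 5},
]

buffer = io.StringIO()
rows_to_csv(teams, buffer)
print(buffer.getvalue())
```

To write a real file, pass open("teams.csv", "w", newline="", encoding="utf-8") instead of the in-memory buffer; newline="" is what the csv module expects for file objects.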
Conclusion
Mastering data extraction with Python database basics is a valuable step for any professional in data management and analysis. Learning how to fetch data from a database in Python helps you streamline your workflow, boost efficiency, and make better-informed decisions based on data-driven insights.
Exploring the fundamentals of Python database basics and following the step-by-step guide on data extraction gives you a solid foundation for using Python’s capabilities in data manipulation tasks. With practice and application, I am confident that you can enhance your data-handling abilities and contribute more effectively to your organization’s success.
By implementing the knowledge gained from this tutorial, you will know how to fetch data from a database in Python, take your data analysis skills to the next level, and stay ahead in today’s competitive business environment.
How do you pull data from an API using Python requests?
When you wish to receive data from an API, you must make a request to the server, just as when you interact with conventional websites. To get data from an API using Python, we'll use the requests package. Requests is a widely used third-party library for making HTTP requests in Python; it is installed separately (pip install requests) rather than shipped with the standard library. Because of its abstractions, it's really easy to use, especially when working with APIs.
When we use the requests library to run a request, we get a response object that contains the data we want to extract as well as a status code. The status code tells us whether the request succeeded, and every response includes one. Status codes are grouped into five classes by their hundreds digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error, and 5xx server error.
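A minimal sketch of such a request follows. requests.get, raise_for_status, and json are part of the library's public API, but get_json and status_class are illustrative helpers of our own, and the URL you pass is yours to supply. requests is imported inside get_json so the status-code helper runs even where the package isn't installed.

```python
def status_class(code):
    """Map an HTTP status code to its class (the hundreds digit)."""
    classes = {1: "informational", 2: "success", 3: "redirection",
               4: "client error", 5: "server error"}
    return classes.get(code // 100, "unknown")

def get_json(url, params=None, timeout=10):
    """Fetch a JSON payload with requests (pip install requests)."""
    import requests  # imported lazily so status_class() works without it
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.json()       # decode the JSON body into Python objects
```

A call such as get_json("https://api.example.com/teams") would return the decoded body; the URL here is a placeholder, not a real endpoint.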
How to connect SQLite with Python?
a. We must import sqlite3 in order to use SQLite.
b. Then, make a connection with the connect method, passing the name of the database you would like to access; if a file with that name exists, it will be opened, and if it doesn't, Python will create it.
c. Next, create a cursor object that can send SQL commands. A cursor is a control structure for traversing and retrieving database records, and it is central to working with databases in Python: all commands are executed through the cursor object.
d. To create a table in the database, store the SQL statement as a string in a variable. Example: sql_comm = "<SQL statement>".
e. Running the command is then a breeze: call the cursor's execute method, passing sql_comm as the argument. To run several statements, execute each of them in turn. After you've completed all of your tasks, commit the changes to save them to the file, and then close the connection.
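Steps a to e translate into the following sketch. It connects to an in-memory database (":memory:") so nothing is written to disk; pass a file name such as nfl.db instead to persist the data. The coaches table here is illustrative.

```python
# a. sqlite3 ships with Python's standard library.
import sqlite3

# b. Connect; a file name would be opened or created, ":memory:" stays in RAM.
conn = sqlite3.connect(":memory:")

# c. The cursor sends SQL commands and walks through the results.
cursor = conn.cursor()

# d. Keep the SQL statement in a variable (table and columns are illustrative).
sql_comm = """
CREATE TABLE coaches (
    coach_id   INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
)
"""

# e. Execute it, add a row with a parameterized INSERT, and commit.
cursor.execute(sql_comm)
cursor.execute("INSERT INTO coaches (first_name, last_name) VALUES (?, ?)",
               ("Ann", "Smith"))
conn.commit()

cursor.execute("SELECT coach_id, first_name, last_name FROM coaches")
rows = cursor.fetchall()
print(rows)  # [(1, 'Ann', 'Smith')]

# Disconnect once all tasks are done.
conn.close()
```

To run a whole list of statements in one call, sqlite3 also provides cursor.executescript(), which accepts several semicolon-separated statements.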
Is Python good for databases?
Python is especially well suited for structured tabular data that can be obtained with SQL but then requires additional manipulation that would be difficult to accomplish with SQL alone.