Ultimate Guide to Work with Excel Spreadsheets Using Python

Introduction to Using Python with Excel

Excel is a well-known tool for processing and visualising data, but large datasets can be slow and awkward to work with in a spreadsheet. Python, on the other hand, can automate many tasks that are tedious to do by hand in Excel. It is a flexible language that handles large datasets and complex calculations with ease. Used together, Python and Excel make it simple to read and modify spreadsheets, run complicated computations, and create visualisations.

Setting Up Your Environment for Python and Excel

Python packages such as OpenPyXL, XlsxWriter, and Pandas are used to interact with Excel files. OpenPyXL lets you read, create, and modify Excel spreadsheets. XlsxWriter creates new Excel files with formatting and charts, but it cannot read or edit existing ones. Pandas, a powerful data analysis library, can read and write Excel files, manipulate and analyse the data, export DataFrames to Excel, and produce plots.

Python has several libraries for interacting with Excel files. One of the most popular is OpenPyXL, which can be installed via pip and allows Python programs to read, write, and manipulate Excel spreadsheets.

Here are the steps for configuring Python and Excel with OpenPyXL:

  1. Install OpenPyXL using pip:
pip install openpyxl

  2. Load the Excel workbook into Python using the load_workbook() function from OpenPyXL:
import openpyxl

workbook = openpyxl.load_workbook('example.xlsx')

  3. Using the active property or indexing the workbook by sheet name, you can navigate to a particular worksheet in the workbook (the older get_sheet_by_name() method is deprecated):
worksheet = workbook.active

# or

worksheet = workbook['Sheet1']

  4. Use the cell() or iter_rows() methods to read data from the worksheet:
# Read a single cell
value = worksheet.cell(row=1, column=1).value

# Read a range of cells
for row in worksheet.iter_rows(min_row=1, max_row=10, min_col=1, max_col=3):
    for cell in row:
        print(cell.value)

  5. Write data to the worksheet using the cell() method:
worksheet.cell(row=1, column=1, value='Hello, world!')

  6. Save the changes to the worksheet using the save() method:
workbook.save('example.xlsx')
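
The steps above can be combined into one short script. The following is a minimal sketch rather than a definitive recipe: it creates its own workbook in a temporary folder so that it runs without an existing example.xlsx, and the helper name write_and_read_back() is made up for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def write_and_read_back(path):
    """Create a small workbook at `path`, reload it, and return cell A1."""
    workbook = Workbook()                  # new workbook held in memory
    worksheet = workbook.active            # the default first sheet
    worksheet.title = "Sheet1"
    worksheet.cell(row=1, column=1, value="Hello, world!")
    workbook.save(path)                    # write the file to disk

    reloaded = load_workbook(path)         # open the saved file again
    return reloaded["Sheet1"].cell(row=1, column=1).value


with tempfile.TemporaryDirectory() as tmp_dir:
    greeting = write_and_read_back(os.path.join(tmp_dir, "example.xlsx"))

print(greeting)
```

Reading the value back from the saved file, rather than from the object still in memory, is a quick way to confirm that save() actually wrote the change.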

Reading and Writing Excel Files with Python

Python can write to Excel using modules such as Pandas, xlwt, XlsxWriter, and openpyxl. Pandas reads and writes Excel files in a single call, which makes it a convenient choice for analysing datasets that are unwieldy in Excel itself. openpyxl is another well-known library; it offers finer control over individual workbooks, sheets, and cells.

Here are two examples using Pandas. The first reads an Excel file into a DataFrame:

import pandas as pd

# Read the Excel file into a DataFrame
data_frame = pd.read_excel('example.xlsx')

# Print the contents of the file
print(data_frame)

The second creates a DataFrame and writes it to a new Excel file:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Emily', 'Michael'],
        'Age': [32, 28, 40],
        'Salary': [55000, 62000, 75000]}

df = pd.DataFrame(data)

# Write the DataFrame to an Excel file
df.to_excel('new_example.xlsx', index=False)
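
Workbooks often hold more than one sheet. As a hedged sketch, the example below uses pd.ExcelWriter to write two DataFrames into one file and the sheet_name argument of read_excel() to load one of them back. The file name staff.xlsx and the sample data are invented for illustration, and an Excel engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

employees = pd.DataFrame({"Name": ["John", "Emily"], "Age": [32, 28]})
salaries = pd.DataFrame({"Name": ["John", "Emily"], "Salary": [55000, 62000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "staff.xlsx")   # hypothetical file name

    # ExcelWriter lets several DataFrames share one workbook
    with pd.ExcelWriter(path) as writer:
        employees.to_excel(writer, sheet_name="Employees", index=False)
        salaries.to_excel(writer, sheet_name="Salaries", index=False)

    # sheet_name picks the sheet to load; by default the first one is read
    salaries_back = pd.read_excel(path, sheet_name="Salaries")

print(salaries_back)
```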

Read, Write Excel Spreadsheets with openpyxl

Here’s an example of how to read and write an Excel file using openpyxl:

from openpyxl import load_workbook

# Read the Excel file
file_name = 'data.xlsx'
workbook = load_workbook(file_name)
worksheet = workbook.active

# Print the contents of the file
for row in worksheet.iter_rows(values_only=True):
    print(row)

# Write to the Excel file
new_data = ['John', 35, 65000]
worksheet.append(new_data)
workbook.save(file_name)

This code imports the load_workbook() function from the openpyxl package and uses it to load the Excel file data.xlsx. The active worksheet is then retrieved using the active attribute. A for loop iterates over the worksheet's rows and prints each one to the console; the values_only=True option makes iter_rows() return only the cell values, without formatting or other cell information. Finally, append() adds a new row after the last row of data, and save() writes the change back to the file.
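
Rows returned by iter_rows(values_only=True) are plain tuples, which are easy to turn into more convenient Python structures. The sketch below assumes the first row of the sheet holds column names and converts the remaining rows into dictionaries; the helper rows_as_dicts() and the sample data are illustrative, not part of openpyxl.

```python
from openpyxl import Workbook


def rows_as_dicts(worksheet):
    """Return the data rows as dicts keyed by the header row."""
    rows = worksheet.iter_rows(values_only=True)
    header = next(rows)                    # first row holds the column names
    return [dict(zip(header, row)) for row in rows]


# Build a small in-memory worksheet so the sketch runs without a file
workbook = Workbook()
worksheet = workbook.active
worksheet.append(["Name", "Age", "Salary"])
worksheet.append(["John", 35, 65000])
worksheet.append(["Emily", 28, 62000])

records = rows_as_dicts(worksheet)
print(records[0])
```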

Manipulating Excel Data with Python

Python provides several libraries to manipulate Excel data, including openpyxl, pandas, and xlrd. Here’s an example of how to manipulate Excel data using openpyxl in Python:

from openpyxl import load_workbook

# Read the Excel file
workbook = load_workbook('my_file.xlsx')
worksheet = workbook.active

# Print the contents of the file
for row in worksheet.iter_rows(values_only=True):
    print(row)

# Write to the Excel file
new_row = ['John', 35, 75000]
worksheet.append(new_row)
workbook.save('my_file.xlsx')
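
Beyond appending rows, existing cells can be changed in place. The following sketch assumes a sheet with a header in row 1 and salaries in the third column; apply_raise() is a hypothetical helper, and the workbook is built in memory so the example runs on its own.

```python
from openpyxl import Workbook


def apply_raise(worksheet, percent, salary_column=3):
    """Raise every numeric value in the salary column; row 1 is the header."""
    for (cell,) in worksheet.iter_rows(min_row=2, min_col=salary_column,
                                       max_col=salary_column):
        if isinstance(cell.value, (int, float)):
            cell.value = round(cell.value * (1 + percent / 100), 2)


workbook = Workbook()
worksheet = workbook.active
for row in (["Name", "Age", "Salary"], ["John", 35, 75000], ["Emily", 28, 62000]):
    worksheet.append(row)

apply_raise(worksheet, percent=10)
worksheet["A1"] = "Employee"               # rename a header cell in place

updated = list(worksheet.iter_rows(values_only=True))
print(updated)
```

Checking the type before multiplying keeps the loop from failing on empty or text cells in the column.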

Working with Formulas and Functions in Python

Python has methods and modules for working with formulas and performing mathematical operations.

In Python, a function is defined with the def keyword, followed by the function name and parentheses containing any parameters the function takes. The body of the function is indented below the definition line.

Here’s a simple function that takes two inputs and returns their sum:

def add_numbers(x, y):
    return x + y

To call a function, use its name followed by parentheses holding the arguments it accepts. Here’s an example of calling the add_numbers() function defined above:

num1 = 5
num2 = 2

result = add_numbers(num1, num2)

print(result)  # Output: 7

Python’s built-in math module also includes mathematical functions, such as sqrt() for calculating square roots and sin() for computing the sine of an angle. Before using the math module, it must be imported using the import keyword.

Here’s an example of using the math module to calculate the square root of a number:

import math

number = 25

result = math.sqrt(number)

print(result)  # Output: 5.0
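
The functions above run inside Python. To place a formula in the spreadsheet itself, openpyxl accepts a string that starts with an equals sign as a cell value. The sketch below is a minimal illustration with an invented file name; note that openpyxl stores the formula but does not calculate it, so the result only appears once a spreadsheet application opens and recalculates the file.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

workbook = Workbook()
worksheet = workbook.active
for value in (10, 20, 30):
    worksheet.append([value])              # fills A1, A2 and A3

worksheet["A4"] = "=SUM(A1:A3)"            # stored as a formula, not as 60

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "formulas.xlsx")   # hypothetical file name
    workbook.save(path)

    # The default load returns the formula text
    formula = load_workbook(path).active["A4"].value

    # data_only=True returns the result cached by a spreadsheet application;
    # this file was never opened in one, so no cached result is expected
    cached = load_workbook(path, data_only=True).active["A4"].value

print(formula, cached)
```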

Automating Excel Tasks with Python

The general steps for automating Excel tasks using Python are as follows:

  • Install the required libraries, such as pandas, xlwings, and openpyxl.
  • Load the Excel workbook with openpyxl, or read it into a DataFrame with pandas.
  • Access the worksheet and read or change the data using pandas or openpyxl.
  • Save the changes you’ve made to the Excel spreadsheet.
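
As one possible illustration of these steps, the sketch below loops over every .xlsx file in a folder and appends a total row to each. The layout (a header row, with amounts in the second column), the file names, and the helper add_totals() are assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def add_totals(folder):
    """Append a 'Total' row to the active sheet of every .xlsx file in folder."""
    processed = []
    for path in sorted(Path(folder).glob("*.xlsx")):
        workbook = load_workbook(str(path))
        worksheet = workbook.active
        # Sum the second column, skipping the header row and non-numeric cells
        total = sum(
            value
            for (value,) in worksheet.iter_rows(min_row=2, min_col=2,
                                                max_col=2, values_only=True)
            if isinstance(value, (int, float))
        )
        worksheet.append(["Total", total])
        workbook.save(str(path))
        processed.append(path.name)
    return processed


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two sample workbooks to process
    for name, sales in (("north.xlsx", [100, 250]), ("south.xlsx", [75, 25])):
        sample = Workbook()
        sheet = sample.active
        sheet.append(["Item", "Sales"])
        for number, amount in enumerate(sales, start=1):
            sheet.append([f"Item {number}", amount])
        sample.save(os.path.join(tmp_dir, name))

    processed = add_totals(tmp_dir)
    north = load_workbook(os.path.join(tmp_dir, "north.xlsx")).active
    last_row = list(north.iter_rows(values_only=True))[-1]

print(processed, last_row)
```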

Visualising Excel Data with Python

Various libraries for Excel data visualisation are available in Python, including Pandas, matplotlib, seaborn, Plotly, and Bokeh. Here is a brief explanation of how to display Excel data using Python’s pandas and Matplotlib:

Import the necessary libraries:

import pandas as pd

import matplotlib.pyplot as plt

Read the Excel file into a pandas DataFrame:

df = pd.read_excel('data.xlsx')

Make a data visualisation using matplotlib and pandas. For instance, to represent the data in a bar chart:

df.plot(kind='bar', x='Category', y='Sales')

plt.show()

You can utilise Seaborn to generate more elaborate visualisations. For example, to produce a scatter plot of the data:

import seaborn as sns

sns.scatterplot(data=df, x='Sales', y='Profit', hue='Category')

plt.show()

Analysing Excel Data with Python

For analysing Excel data, Python offers a variety of libraries, such as xlwings, openpyxl, and pandas.

Here’s how to read an Excel file and do some simple data analysis with pandas:

import pandas as pd

# Read Excel file into a pandas DataFrame
df = pd.read_excel('new_data.xlsx')

# Print the first 5 rows of the DataFrame
print(df.head())

# Print the summary statistics of the DataFrame
print(df.describe())

# Group the data by one column and calculate the mean of another column
grouped = df.groupby('Group')['Revenue'].mean()

print(grouped)

This code reads an Excel file into a pandas DataFrame, prints the first 5 rows and the summary statistics of the DataFrame, groups the data by the Group column, calculates the mean of the Revenue column within each group, and finally prints the resulting mean values.
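
A grouped analysis is not limited to a single statistic. The sketch below uses a small hand-made DataFrame in place of new_data.xlsx (the values are invented) and asks agg() for several summaries at once.

```python
import pandas as pd

# Stand-in for the data that pd.read_excel('new_data.xlsx') would return
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "B"],
    "Revenue": [100, 300, 50, 150, 100],
})

# One row per group, one column per statistic
summary = df.groupby("Group")["Revenue"].agg(["count", "mean", "sum"])
print(summary)
```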

Using Python for Advanced Excel Tasks

Here are some examples of advanced Excel tasks that can be performed using Python:

  • Using macros and scripts to automate repetitive operations in Excel: Python can automate repetitive activities in Excel, such as data input and formatting. You can create scripts or macros that run automatically, or use Python to communicate directly with Excel.
  • Integrating two different spreadsheets: Python tools like Pandas read each Excel file into a DataFrame object. The data can then be combined, modified, and analysed using Python, and the result can be exported back into an Excel file using the same tools.
  • Creating complex Excel workbooks: Python libraries like OpenPyXL and XlsxWriter can construct sophisticated Excel workbooks with custom formatting, charts, and formulas. Existing Excel spreadsheets can also be read and modified with OpenPyXL.
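
To illustrate the second task, here is a hedged sketch of combining two spreadsheets with a Pandas merge. The two DataFrames stand in for sheets that would normally come from pd.read_excel(); the column names and values are invented for the example.

```python
import pandas as pd

# Stand-ins for two sheets loaded with pd.read_excel()
orders = pd.DataFrame({"CustomerID": [1, 2, 2, 3],
                       "Amount": [250, 100, 75, 40]})
customers = pd.DataFrame({"CustomerID": [1, 2],
                          "Name": ["John", "Emily"]})

# A left merge keeps every order, even when no customer record matches
merged = orders.merge(customers, on="CustomerID", how="left")

# Orders whose customer is missing from the second sheet
unmatched = merged[merged["Name"].isna()]

print(merged)
```

The merged DataFrame can then be written back with to_excel(), as in the earlier examples.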

Integrating Python with Excel Online

Python can be integrated with Excel Online in several ways. One option is to call the Excel Online API from a Python client. Another is to use xlwings, an open-source library that automates Excel from Python on Windows and macOS.

With the Excel Online API, a Python client can read and edit Excel files stored in OneDrive or SharePoint Online. The API covers reading, writing, and updating Excel files, as well as creating and deleting files and folders. You can send HTTP requests to the API endpoints and parse the JSON responses using Python’s requests library.
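
The sketch below shows what such a request might look like. It rests on several assumptions: that the API in question is the Microsoft Graph workbook API, that the file lives in the signed-in user's OneDrive, and that an OAuth access token has already been obtained. The file path, sheet name, and token are placeholders. To stay runnable without credentials or third-party packages, it only builds the request with the standard library and does not send it; with the requests library, the same URL and header would be passed to requests.get().

```python
import urllib.request
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"   # assumed API base URL


def build_range_request(file_path, sheet_name, address, access_token):
    """Build (but do not send) a GET request for a worksheet range."""
    url = (
        f"{GRAPH_ROOT}/me/drive/root:/{quote(file_path)}:"
        f"/workbook/worksheets/{quote(sheet_name)}"
        f"/range(address='{address}')"
    )
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values; a real call needs a valid token and an existing file
request = build_range_request("reports/sales.xlsx", "Sheet1", "A1:C10",
                              "ACCESS_TOKEN_PLACEHOLDER")
print(request.full_url)
```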

xlwings also supports user-defined functions (UDFs) and Excel macros written in Python. It works with Google Sheets, Excel on the Web, and Excel on Windows and macOS. It is easy to set up, offers a wide range of reporting options, and includes a fast file reader that does not need an Excel installation.

Using xlwings, you can automate Excel reports using Python or do interactive data analysis using NumPy, Jupyter Notebooks, Pandas, scikit-learn, and other Python-based tools.

Working with Excel Add-ins and Macros in Python

Python offers a variety of packages and tools for working with Excel add-ins and macros. Popular choices include openpyxl, xlwings, and PyXLL.

PyXLL is an Excel add-in that lets Python programmers write fully functional Excel add-ins. It embeds Python in Excel, so users can run custom Python code inside the spreadsheet application to extend its functionality.

Using PyXLL, it is possible to call Python functions directly from Excel, stream real-time data from Python to Excel, swap out VBA code for Python, and add customised menus and ribbon toolbars to Excel.

xlwings is an open-source library that automates Excel from Python on Windows and macOS. It is simple to install and offers strong reporting features and a straightforward API. It works with Excel on the Web, Google Sheets, and Excel on Windows and macOS, and is an alternative to VBA macros, UDFs, and Power Query.

Converting Excel Data to Other Formats with Python

Python provides several libraries to convert Excel data to other formats such as CSV, JSON, HTML, and PDF.

Here are some ways to convert Excel data to other formats using Python:

Using the xlrd and csv libraries: The xlrd library’s primary function is to read Excel files, and Python’s built-in csv module reads and writes CSV files. Together they can convert Excel data to CSV format. Note that xlrd 2.0 and later reads only the legacy .xls format, so this example uses an .xls input file.

import csv

import xlrd

# Open the Excel file and select the first worksheet (index 0)
workbook = xlrd.open_workbook('new_input_file.xls')
worksheet = workbook.sheet_by_index(0)

# Create the CSV file and copy every row into it
with open('new_output_file.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    for row_num in range(worksheet.nrows):
        csv_writer.writerow(worksheet.row_values(row_num))
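
For .xlsx files, which recent xlrd releases do not read, the same conversion can be sketched with openpyxl and the csv module. The helper worksheet_to_csv() is illustrative, and the example writes to an in-memory buffer instead of a file so that it runs on its own.

```python
import csv
import io

from openpyxl import Workbook


def worksheet_to_csv(worksheet, csv_file):
    """Write every row of the worksheet to an open text file as CSV."""
    writer = csv.writer(csv_file)
    for row in worksheet.iter_rows(values_only=True):
        writer.writerow(row)               # empty cells become empty fields


# Small in-memory workbook standing in for load_workbook('input.xlsx')
workbook = Workbook()
worksheet = workbook.active
worksheet.append(["Name", "Age"])
worksheet.append(["John", 35])

buffer = io.StringIO()
worksheet_to_csv(worksheet, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, open it with newline='' as in the xlrd example so the csv module controls the line endings.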

Using the Pandas library: Pandas is an open-source Python library for data manipulation and analysis. It offers data structures and operations for handling numerical tables and time series, and can read, filter, and rearrange both small and large datasets. Data from Excel can be converted to formats like CSV, JSON, and HTML using Pandas.

import pandas as pd

# Read Excel file
df_new = pd.read_excel('input_file_new.xlsx')

# Convert to CSV
df_new.to_csv('output_file_new.csv', index=False)

# Convert to JSON
df_new.to_json('output_file_new.json', orient='records')

# Convert to HTML
df_new.to_html('output_file_new.html', index=False)

Using Aspose.Cells: Excel files can be created, modified, and converted using the Aspose.Cells API, which is available for Python and Java. With only a few lines of Python code, it can convert Excel data to PDF, XPS, HTML, JPEG, and many other popular formats.

import asposecellscloud
from asposecellscloud.apis.cells_api import CellsApi

# Set up Aspose.Cells Cloud API credentials
configuration = asposecellscloud.Configuration()
configuration.app_sid = 'APP_SID'
configuration.app_key = 'APP_KEY'

# Initialize the API client
api_client = asposecellscloud.ApiClient(configuration)
api = CellsApi(api_client)

# Convert Excel file to PDF
input_file = 'input_file.xlsx'
output_file = 'output_file.pdf'

response = api.cells_save_as_pdf(input_file, output_file)

# Write the returned PDF bytes to disk
with open(output_file, 'wb') as f:
    f.write(response)

Best Python Libraries for Working with Excel

Here is a table of some of the best Python libraries for working with Excel:

Library | Description
openpyxl | A widely used package for working with Excel files in Python. It reads and writes Excel 2010 files in the xlsx, xlsm, xltx, and xltm formats.
XlsxWriter | A Python module for writing files in the Excel 2007+ XLSX file format. It supports features such as formatting and charting, but cannot read or modify existing files.
xlwings | A Python library that lets you automate Excel from Python and call Python from Excel. It supports reading and writing Excel files, formatting, and charting.
xlrd | A Python library for reading data from Excel files. Version 2.0 and later reads only the legacy XLS format, and it cannot write or edit files.
pandas | A Python library that offers data structures and functions for working with tabular data, such as Excel sheets. It supports data manipulation and plotting in addition to reading and writing Excel files.

Common Mistakes and How to Avoid Them

Here are some common mistakes to avoid when working with Excel using Python:

  • Not validating input read from Excel files, or modifying a list while iterating over it. A Python list can be converted to an Excel sheet, but the spreadsheet is only meaningful if the data was checked first.
  • Taking an overly simplistic approach to reading a spreadsheet
  • Trying to parse an Excel file without a library
  • Not verifying the value type of a cell
  • Relying on xlrd, xlwt, or xlutils for .xlsx files; these libraries target the legacy .xls format
  • Not converting the data into more suitable Python structures
  • Not using Python’s built-in type() or isinstance() functions to determine the type of a cell’s value
  • Ignoring the fact that certain data types were intended to be handled via a graphical user interface
  • Not using a dedicated Excel library for more complex operations
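
Several of these mistakes come down to trusting cell values without checking them. As a small sketch, the function below scans rows (plain tuples, such as those returned by openpyxl's iter_rows(values_only=True)) and reports cells whose type does not match an expected schema. The schema, the sample rows, and the name find_type_errors() are all hypothetical.

```python
# Hypothetical schema: Name is text, Age is a whole number, Salary is numeric
EXPECTED_TYPES = (str, int, (int, float))


def find_type_errors(rows, expected_types=EXPECTED_TYPES):
    """Return (sheet_row, column, actual_type_name) for each mismatched cell."""
    errors = []
    # Data is assumed to start on sheet row 2, below a header row
    for row_number, row in enumerate(rows, start=2):
        pairs = zip(row, expected_types)
        for column_number, (value, expected) in enumerate(pairs, start=1):
            if not isinstance(value, expected):
                errors.append((row_number, column_number, type(value).__name__))
    return errors


rows = [
    ("John", 35, 75000),
    ("Emily", "28", 62000.0),    # age stored as text
    ("Michael", None, 75000),    # age cell left empty
]

problems = find_type_errors(rows)
print(problems)
```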

Tips and Best Practices

Here are some tips and best practices for working with Excel using Python:

  • Use the openpyxl package to handle spreadsheets in Python
  • Install openpyxl with pip from the command line
  • Use the load_workbook() function to read Excel files
  • Use the active property to select the active sheet, and the cell() method with row and column arguments to select a cell
  • Convert Excel data to more appropriate Python structures
  • Use Pandas and openpyxl to read and import Excel files
  • Create prediction models, produce analytical reports, and do mathematical computations using Python and Excel
  • Check the quality of your spreadsheet application using a checklist
  • Learn more about interacting with Excel in Python by taking an online course

Conclusion

Python provides a powerful and flexible way to work with Excel, allowing data professionals to automate repetitive tasks, analyse large datasets, and create reports and visualisations efficiently.

FAQs

1. How can one sort data in an Excel file using Python?

After reading the Excel file into a Pandas DataFrame, the sort_values() method can be used to order the data by one or more columns.

2. How can one filter data in an Excel file using Python?

After reading the Excel file using Pandas, one can filter the data with boolean expressions or loc before saving the filtered data to a new sheet or file.
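
Sorting and filtering can be sketched as follows. The DataFrame is built by hand so the example runs without a file; with real data it would come from pd.read_excel(), and the results could be saved with to_excel().

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Emily", "Michael"],
    "Age": [32, 28, 40],
    "Salary": [55000, 62000, 75000],
})

# Sort by salary, highest first
by_salary = df.sort_values("Salary", ascending=False)

# Keep rows where Age is above 30, and only two of the columns
over_30 = df.loc[df["Age"] > 30, ["Name", "Salary"]]

print(by_salary)
print(over_30)
```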

3. How can one create charts in an Excel file using Python?

To create charts, use the plot() method from the Pandas library or the add_chart() method from the openpyxl package. plot() draws a chart from the data in a Pandas DataFrame using Matplotlib, and the result can be saved as an image, while add_chart() embeds a native Excel chart of a chosen type in a worksheet, based on a defined data range.
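
A minimal sketch of the openpyxl route is shown below. The categories, sales figures, chart position, and file name are invented for illustration; the chart itself only becomes visible when the saved file is opened in a spreadsheet application.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

workbook = Workbook()
worksheet = workbook.active
for row in (["Category", "Sales"], ["Books", 120], ["Games", 90], ["Music", 60]):
    worksheet.append(row)

chart = BarChart()
chart.title = "Sales by category"

# Column B, including the header row so it can name the series
data = Reference(worksheet, min_col=2, min_row=1, max_row=4)
# Column A without the header, used as the category labels
categories = Reference(worksheet, min_col=1, min_row=2, max_row=4)

chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
worksheet.add_chart(chart, "D2")           # top-left corner of the chart

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "chart.xlsx")   # hypothetical file name
    workbook.save(path)
    saved = os.path.exists(path)

print(saved)
```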
