Data analysis has grown into a field of its own, in large part thanks to Python. If you are a data analyst who works in Python, you almost certainly use the Pandas library, and this article is for you. This Pandas cheat sheet goes through the essential methods that come in handy while analyzing data.
It is often hard to remember the specific syntax for doing something in Pandas. The commands in this cheat sheet will help you recall and look up the most common Pandas operations. If you are a beginner in Python and data science, upGrad's data science courses can help you dive deeper into the world of data and analytics.
Using the Pandas Cheatsheet
Before using this Pandas cheat sheet, you should work through a Pandas tutorial; then use the cheat sheet as a reminder and for quick clarification. It helps you quickly look up methods you have already learned, and it can come in handy when preparing for an exam or interview. We have collected and grouped the commands that data analysts use most often in Pandas so that they are easy to find. This cheat sheet uses the following shorthand for different objects.
- df: For representing any Pandas DataFrame object
- ser: For representing any Pandas Series object
You need to import the following libraries to run the methods mentioned in this article.
- import pandas as pd
- import numpy as np
1. Import data from different files
- To read all data from a CSV file: pd.read_csv(file_name)
- To read all data from a delimited text file (like TSV): pd.read_table(file_name)
- To read from an Excel sheet: pd.read_excel(file_name)
- To read data from a SQL database: pd.read_sql(query, connectionObject)
- To read data from a JSON-formatted string, file path, or URL: pd.read_json(jsonString) (recent Pandas versions expect a literal JSON string to be wrapped in io.StringIO)
- To take the contents of your clipboard: pd.read_clipboard()
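The readers listed above can be tried without any files on disk. The following is a minimal sketch: the sample data, the table name "scores", and the use of in-memory text and an in-memory SQLite database are assumptions made for illustration, not part of the original list.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "name,score\nana,0.9\nbo,0.4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Tab-separated text; read_table uses a tab as its default separator.
tsv_text = "name\tscore\nana\t0.9\nbo\t0.4\n"
df_tsv = pd.read_table(io.StringIO(tsv_text))

# JSON records wrapped in StringIO (a file path or URL also works).
json_text = '[{"name": "ana", "score": 0.9}, {"name": "bo", "score": 0.4}]'
df_json = pd.read_json(io.StringIO(json_text))

# An in-memory SQLite database with a hypothetical table named "scores".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("ana", 0.9), ("bo", 0.4)])
df_sql = pd.read_sql("SELECT * FROM scores", conn)
conn.close()

print(df_csv)
print(df_sql)
```

With real data, replace the StringIO objects with file names and the SQLite connection with a connection to your own database.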
2. Export DataFrames in different file formats
- To write a DataFrame to a CSV file: df.to_csv(file_name)
- To write a DataFrame to an Excel file: df.to_excel(file_name)
- To write a DataFrame to a SQL table: df.to_sql(tableName, connectionObject)
- To write a DataFrame to a file in JSON format: df.to_json(file_name)
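As a rough illustration of the export methods, the sketch below writes a small DataFrame to CSV text, JSON text, and a SQL table, then reads it back. The sample data and the table name "scores" are hypothetical, and an in-memory SQLite database is assumed in place of a real one.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({"name": ["ana", "bo"], "score": [0.9, 0.4]})

# Called without a path, to_csv and to_json return the text as a string.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# Write to a hypothetical table "scores" in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)

# Read the data back to check the round trip.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_sql = pd.read_sql("SELECT * FROM scores", conn)
conn.close()

print(from_csv)
print(from_sql)
```

Passing index=False keeps the row index out of the output, which is usually what you want when the index carries no information.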
3. Inspect a particular section of your DataFrame or Series
- To fetch all the information related to index, datatype, and memory: df.info()
- To extract the starting ‘n’ rows of your DataFrame: df.head(n)
- To extract the ending ‘n’ rows of your DataFrame: df.tail(n)
- To extract the number of rows and columns available in your DataFrame: df.shape
- To summarize the statistics for numerical columns: df.describe()
- To view unique values along with their counts: ser.value_counts(dropna=False)
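The inspection commands above can be tried on a small DataFrame. This is a minimal sketch; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", None],
    "temp": [31.0, 35.5, 30.2, np.nan],
})

df.info()                # index, dtypes, non-null counts, memory usage
print(df.head(2))        # first two rows
print(df.tail(2))        # last two rows
print(df.shape)          # (rows, columns)
print(df.describe())     # summary statistics for numeric columns

# Count each distinct value, including missing ones.
counts = df["city"].value_counts(dropna=False)
print(counts)
```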
4. Selecting a specific subset of your data
- Extract the first row: df.iloc[0,:]
- To extract the first element of your DataFrame’s first column: df.iloc[0,0]
- To return the column labelled col as a Series: df[col]
- To return several columns as a new DataFrame: df[[col1, col2]]
- To select data by position: ser.iloc[0]
- To select data by index label: ser.loc['index_one']
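The selection commands above can be sketched on a small labelled DataFrame. The column names and index labels here are assumptions chosen to mirror the shorthand used in the list.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]},
    index=["index_one", "index_two", "index_three"],
)

first_row = df.iloc[0, :]     # first row, returned as a Series
first_cell = df.iloc[0, 0]    # first element of the first column
col_a = df["a"]               # single column as a Series
sub = df[["a", "b"]]          # several columns as a new DataFrame

ser = df["b"]
by_pos = ser.iloc[0]              # selection by integer position
by_label = ser.loc["index_one"]   # selection by index label

print(first_row)
print(sub)
```

Note that iloc always counts positions from zero, while loc looks up the labels stored in the index.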
5. Data Cleaning Commands
- To rename columns in bulk (this example assumes numeric column labels): df.rename(columns=lambda x: x + 1)
- To rename columns selectively: df.rename(columns={'oldName': 'newName'})
- To rename the index in bulk (assumes a numeric index): df.rename(index=lambda x: x + 1)
- To rename all columns in order: df.columns = ['x', 'y', 'z']
- To check for null values, returning a boolean array of the same shape: pd.isnull(df)
- The reverse of pd.isnull(): pd.notnull(df)
- To drop all rows containing null values: df.dropna()
- To drop all columns containing null values: df.dropna(axis=1)
- To replace each null value with n: df.fillna(n)
- To convert the datatype of a Series to float: ser.astype(float)
- To replace every 1 with 'one' and every 3 with 'three': ser.replace([1, 3], ['one', 'three'])
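The cleaning commands can be combined in a short sketch like the one below. The data is hypothetical, and the bulk rename uses str.upper because the column labels in this example are strings rather than numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with one missing value in column "x".
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

renamed = df.rename(columns={"x": "first"})   # selective rename
upper = df.rename(columns=str.upper)          # bulk rename with a function
shifted = df.rename(index=lambda i: i + 1)    # numeric index shifted by one

mask = df.isnull()              # True wherever a value is missing
rows_kept = df.dropna()         # drop rows containing nulls
cols_kept = df.dropna(axis=1)   # drop columns containing nulls
filled = df.fillna(0)           # replace nulls with 0

ser = pd.Series([1, 2, 3])
as_float = ser.astype(float)
words = ser.replace([1, 3], ["one", "three"])

print(mask)
print(words)
```

None of these calls modifies df or ser in place; each returns a new object, so assign the result if you want to keep it.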
6. Groupby, Sort, and Filter Data
- To return a groupby object for column values: df.groupby(colm)
- To return groupby object for multiple column values: df.groupby([colm1, colm2])
- To sort values in ascending order (by column): df.sort_values(colm1)
- To sort values in descending order (by column): df.sort_values(colm2, ascending=False)
- Extract rows where the column value is greater than 0.6: df[df[colm] > 0.6]
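A groupby object is usually followed by an aggregation such as mean() or sum(). The sketch below shows grouping, sorting, and filtering on made-up data; the column names "team", "year", and "score" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical scores per team and year.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [2020, 2021, 2020, 2021],
    "score": [0.5, 0.7, 0.9, 0.2],
})

# Group by one column, then by two, and aggregate the scores.
by_team = df.groupby("team")["score"].mean()
by_team_year = df.groupby(["team", "year"])["score"].sum()

# Sort by score in ascending and descending order.
ascending = df.sort_values("score")
descending = df.sort_values("score", ascending=False)

# Keep only rows where the score is greater than 0.6.
high = df[df["score"] > 0.6]

print(by_team)
print(high)
```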
7. Others
- To add the rows of the second DataFrame to the end of the first: pd.concat([df1, df2]) (the older df1.append(df2) was removed in Pandas 2.0)
- To add the columns of the second DataFrame alongside those of the first: pd.concat([df1, df2], axis=1)
- To return the mean of each numeric column: df.mean(numeric_only=True)
- To return the number of non-null values in each column: df.count()
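The sketch below combines two small DataFrames and summarizes the result. It uses pd.concat for stacking rows as well as columns, since DataFrame.append is not available in Pandas 2.0 and later. The data and the "_2" suffix are hypothetical.

```python
import numpy as np
import pandas as pd

# Two hypothetical DataFrames with the same columns.
df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})
df2 = pd.DataFrame({"a": [5.0, 6.0], "b": [7.0, 8.0]})

# Stack df2's rows under df1's rows and renumber the index.
stacked = pd.concat([df1, df2], ignore_index=True)

# Place df2's columns beside df1's; the suffix avoids duplicate labels.
side = pd.concat([df1, df2.add_suffix("_2")], axis=1)

means = stacked.mean()     # mean of each column, skipping nulls
counts = stacked.count()   # number of non-null values per column

print(stacked)
print(means)
print(counts)
```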
Conclusion
This Pandas cheat sheet is useful mainly for rapid recall. It is a good idea to practice the commands yourself before relying on the cheat sheet.
If you are curious to learn about Pandas, check out IIIT-B & upGrad’s Executive PG Programme in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.
What are the salient features of Pandas libraries?
The following features make Pandas one of the most popular Python libraries. Pandas provides the DataFrame and Series data structures, which allow efficient data representation and manipulation. It offers alignment and indexing features that give flexible ways of labelling and organizing data. Its concise API helps keep code clean and readable. It can read multiple file formats, including JSON, CSV, HDF5, and Excel. Merging multiple datasets is a common challenge, and Pandas handles it efficiently. Pandas also works closely with other important Python libraries such as NumPy and Matplotlib.
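As one illustration of merging datasets, the sketch below joins two small tables on a shared key with pd.merge. The table contents and the key column "id" are assumptions made for the example.

```python
import pandas as pd

# Hypothetical customer and order tables sharing the key column "id".
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["ana", "bo", "cy"]})
orders = pd.DataFrame({"id": [1, 1, 3], "amount": [20, 35, 50]})

# An inner join keeps only the ids that appear in both tables.
merged = pd.merge(customers, orders, on="id", how="inner")

print(merged)
```

Changing how to "left", "right", or "outer" controls which unmatched rows are kept.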
What are the other libraries and tools that complement Pandas library?
Pandas is not only a central library for creating data frames; it also works alongside other Python libraries and tools. Pandas is built on top of the NumPy package, so many of its data structures rely on NumPy arrays. Data held in Pandas can be passed to SciPy for statistical analysis, to Matplotlib for plotting, and to Scikit-learn for machine learning algorithms. Jupyter Notebook is a web-based interactive environment that works much like an IDE and is well suited to working with Pandas.
State the basic operations of the data frame
Selecting an index or a column is an important first step before operations such as addition or deletion. Once you know how to access values and select columns from a DataFrame, you can learn to add an index, row, or column. If the index of the DataFrame is not what you want, you can reset it with the reset_index() function.
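These basic operations might look like the following minimal sketch. The data and index values are hypothetical, and drop() is used here as one common way to delete rows and columns.

```python
import pandas as pd

# Hypothetical DataFrame with a non-sequential index.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[5, 7, 9])

df["b"] = df["a"] * 2        # add a column derived from an existing one
df = df.drop(index=7)        # delete the row labelled 7
df = df.drop(columns="a")    # delete column "a"

# Replace the leftover index (5, 9) with a fresh 0..n-1 index.
reset = df.reset_index(drop=True)

print(reset)
```

Without drop=True, reset_index() keeps the old index as a new column instead of discarding it.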