
LOC vs ILOC in Pandas: Difference Between LOC and ILOC in Pandas

Loc and iloc in Pandas

A common source of confusion for new Python developers is loc vs. iloc. Both look alike and perform similar selection tasks, so they are easy to mix up.

If you want to find out the difference between iloc and loc, you’ve come to the right place. In this article, we’ll look at the key difference between these two indexers and then see them in action to understand the concept better.

Let’s get started. 

Difference Between loc and iloc

1. iloc in Python

You can use iloc in Python for selection. It is integer-location based and selects by position. Positions start at 0, so df.iloc[5] returns the row at position 5 (the sixth row of the data frame), irrespective of its name or label.

Here’s an example of iloc in Python:

>>> import pandas as pd
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
>>> df = pd.DataFrame(mydict)

>>> df

      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000

We’ll index the rows of the above dataframe with a scalar integer by using iloc:

>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64

2. loc in Pandas

You can use loc in Pandas to access multiple rows and columns by using labels; however, you can use it with a boolean array as well. 

If you pass 5 to loc, you won’t necessarily get the row at position 5. Instead, you get the row whose label is 5, wherever it sits in the data frame.

Here is an example of loc in Pandas:

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])

>>> df

            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

The above was the table from which we’ll extract the row:

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64
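The contrast between the two indexers is easiest to see when the row labels are integers that do not match the row positions. The following is a minimal sketch; the `scores` DataFrame, its column names and its labels are made up for illustration and are not part of the examples above.

```python
import pandas as pd

# Hypothetical data: integer row labels that deliberately differ from positions.
scores = pd.DataFrame(
    {"player": ["Ann", "Bob", "Cid"], "score": [90, 75, 60]},
    index=[2, 0, 1],
)

by_label = scores.loc[0]      # the row whose LABEL is 0
by_position = scores.iloc[0]  # the row at POSITION 0 (the first row)

print(by_label["player"], by_position["player"])
```

Here loc[0] finds Bob, because his row carries the label 0, while iloc[0] finds Ann, because her row comes first.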

Detailed Example for loc vs iloc

Even though we use both of these functions for selection, it would be best if we discussed a detailed example to understand their distinctions. 

In our example, we’ll use the Telco customer churn dataset, which is available on Kaggle. We’ll load it into a data frame:

df = pd.read_csv("Projects/churn_prediction/Telco-Customer-Churn.csv")

df.head()

 

   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService     MultipleLines InternetService OnlineSecurity
0  7590-VHVEG  Female              0     Yes         No       1           No  No phone service             DSL             No
1  5575-GNVDE    Male              0      No         No      34          Yes                No             DSL            Yes
2  3668-QPYBK    Male              0      No         No       2          Yes                No             DSL            Yes

 

This dataset has 21 columns; we’ve only shown a few for demonstration purposes. As we’ve already discussed, loc selects data by label. Here, the column names, such as gender, tenure and OnlineSecurity, are also the column labels.

As we haven’t assigned a specific index, pandas creates an integer index for the rows by default. The row labels are integers that start at 0 and go up. In this example, we’ll see how loc and iloc behave differently.

  • Select row “1” and column “Partner”

df.loc[1, 'Partner']

Output: 'No'

It shows the value present in the ‘Partner’ column of row ‘1’.

  • Select the row labels up to 4 and the columns ‘customerID’ and ‘gender’

df.loc[:4, ['customerID', 'gender']]

 

   customerID  gender
0  7590-VHVEG  Female
1  5575-GNVDE    Male
2  3668-QPYBK    Male
3  7795-CFOCW    Male
4  9237-HQITU  Female
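Note that the result above has five rows: a loc slice includes its end label, whereas an iloc slice excludes its end position. A small sketch of that difference follows, using a made-up `demo` frame with the default integer index rather than the real Telco file.

```python
import pandas as pd

# Hypothetical stand-in for the Telco data, with the default 0..n-1 index.
demo = pd.DataFrame(
    {"gender": ["Female", "Male", "Male", "Male", "Female", "Female"]}
)

label_slice = demo.loc[:4]      # labels 0 through 4, end label included
position_slice = demo.iloc[:4]  # positions 0 through 3, end position excluded

print(len(label_slice), len(position_slice))
```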

  • Select row labels “1”, “2”, “3” and “Dependents” column

df.loc[[1, 2, 3], 'Dependents']

1    No
2    No
3    No
Name: Dependents, dtype: object

This time, we’ll filter the dataframe and apply iloc or loc:

  • Select the row labels up to 10 and the “PhoneService” and “InternetService” columns for customers who have a partner (Partner should be ‘Yes’)

df[df.Partner == 'Yes'].loc[:10, ['PhoneService', 'InternetService']]

In the case above, we applied a filter to the DataFrame but didn’t reset the index, so the filtered result keeps the original row labels and skips the labels of the rows that the filter removed. By using loc[:10] here, we selected only the rows with labels up to and including 10, which is fewer than ten rows.

If, on the other hand, we use iloc here and apply the filter, we will get 10 rows as iloc selects by position irrespective of the labels. Here’s the result we’ll get if we apply iloc[:10]:

df[df.Partner == 'Yes'].iloc[:10, [6, 8]]

 

   PhoneService InternetService
0            No             DSL
8           Yes     Fiber optic
10          Yes             DSL
12          Yes     Fiber optic
15          Yes     Fiber optic
18          Yes             DSL
21          Yes              No
23          Yes             DSL
24          Yes             DSL
26          Yes     Fiber optic

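You must’ve noticed that we had to change how we select columns: iloc accepts only integer positions, so we passed [6, 8] (the positions of PhoneService and InternetService) instead of the column names.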
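The filtering behaviour can be reproduced on a tiny frame. This is only a sketch: the `demo` data is invented, and the calls to `reset_index` and `columns.get_loc` are optional conveniences that the article itself does not use.

```python
import pandas as pd

# Hypothetical miniature of the Telco data (values invented for illustration).
demo = pd.DataFrame({
    "Partner": ["Yes", "No", "No", "Yes", "No", "Yes", "Yes"],
    "PhoneService": ["No", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
})

partnered = demo[demo.Partner == "Yes"]  # keeps the original labels 0, 3, 5, 6

up_to_label_3 = partnered.loc[:3]   # labels up to and including 3
first_three = partnered.iloc[:3]    # first three rows, whatever their labels

# Resetting the index makes labels and positions line up again.
renumbered = partnered.reset_index(drop=True)

# Look up a column's position by name instead of hard-coding it for iloc.
phone_pos = demo.columns.get_loc("PhoneService")
phone_only = partnered.iloc[:3, [phone_pos]]
```

Looking up the position with `get_loc` avoids counting columns by hand, which is easy to get wrong in a wide dataset.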

  • Select the first 4 columns and first 4 rows with iloc

df.iloc[:4, :4]


 

   customerID  gender  SeniorCitizen Partner
0  7590-VHVEG  Female              0     Yes
1  5575-GNVDE    Male              0      No
2  3668-QPYBK    Male              0      No
3  7795-CFOCW    Male              0      No

We can use iloc to select positions from the end. For that, we simply use negative integers (-1, -2, etc.), which count backwards from the last row or column.

  • Select the last 5 columns and last 5 rows

df.iloc[-5:, -5:]

 

     PaperlessBilling     PaymentMethod  MonthlyCharges TotalCharges Churn
7038              Yes      Mailed Check           84.80       1990.5    No
7039              Yes       Credit Card          103.20       7362.9    No
7040              Yes  Electronic check           29.60       346.45    No
7041              Yes      Mailed check           74.40        306.6   Yes
7042              Yes     Bank Transfer          105.65       6844.5    No
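Negative positions work the same way on any DataFrame. The sketch below uses a made-up ten-row `demo` frame so that it runs without the Telco file.

```python
import pandas as pd

# Hypothetical frame: 10 rows, 3 columns, default integer index.
demo = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

last_row = demo.iloc[-1]          # the final row, returned as a Series
tail_block = demo.iloc[-2:, -2:]  # last two rows and last two columns

print(tail_block)
```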

You can pass a lambda function to iloc too. (A lambda function is a small anonymous function in Python that can take any number of arguments but contains a single expression.) iloc calls the function with the DataFrame and uses the result as the indexer.

  • Select every third row up to row 15 and only show the “Partner” and “InternetService” columns

df.iloc[lambda x: (x.index % 3 == 0) & (x.index <= 15)][['Partner', 'InternetService']]

 

   Partner InternetService
0      Yes             DSL
3       No             DSL
6       No     Fiber optic
9       No             DSL
12     Yes     Fiber optic
15     Yes     Fiber optic
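The lambda above builds its mask from x.index, that is, from the row labels; it works with iloc because the default index makes labels and positions identical. As a hedged sketch on a made-up 30-row `demo` frame, the same rows can also be picked with a plain positional slice that uses a step:

```python
import pandas as pd

# Hypothetical frame with 30 rows and the default integer index.
demo = pd.DataFrame({"value": range(30)})

# The callable receives the DataFrame and returns a boolean mask.
via_callable = demo.iloc[lambda x: (x.index % 3 == 0) & (x.index <= 15)]

# Same rows with a positional slice: start 0, stop 16 (excluded), step 3.
via_slice = demo.iloc[0:16:3]

print(list(via_slice.index))
```

The slice form depends only on positions, so it keeps working even if the index is later changed to something other than 0, 1, 2, and so on.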

We can also select labels or positions present in between.

  • Select the column positions 4 and 5 and the row positions 20 to 24 (the end position of an iloc slice is excluded)

df.iloc[20:25, 4:6]

 

   Dependents  tenure
20         No       1
21         No      12
22         No       1
23         No      58
24         No      49

Now, if you try to pass labels to iloc, Pandas will show you the following error message:

ValueError: Location-based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types

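Passing positions to loc fails in a similar way when the index labels are not integers: Pandas raises a KeyError because no row has that label. With the default integer index, loc simply treats the integer as a label, so no error is raised.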
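Both failure modes can be checked on the small snake DataFrame from earlier. This is a sketch only: `error_name` is a hypothetical helper written for this example, and the exact exception type that iloc raises for a label can vary with how the label is passed and with the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

def error_name(action):
    """Run action() and return the name of the exception it raises, if any."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None

label_to_iloc = error_name(lambda: df.iloc[0, "shield"])  # label passed to iloc
position_to_loc = error_name(lambda: df.loc[0])           # position passed to loc

print(label_to_iloc, position_to_loc)
```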

Learn More About Python

A student must ask questions and find their answers. We hope this article has answered your questions about loc and iloc in Pandas. The best way to understand how they work is to try them out yourself on different datasets.

If you want to learn more about Python, Pandas, and relevant topics, you should head to our blog. Our experts add multiple detailed resources there.

If you are curious to learn about data science, check out IIIT-B & upGrad’s Executive PG Programme in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.

How can we add rows to a Pandas DataFrame?

To add a row to a DataFrame, assign to a new label with loc. iloc and ix are related indexers, but neither is a way to add rows in current versions of Pandas.

1. loc works on index labels. Writing df.loc[4] looks for the row whose index label is 4, and assigning to a label that does not exist yet adds a new row.
2. iloc works on positions in the index. Writing df.iloc[4] returns the row at position 4, whatever its label. It cannot add rows, because assigning to a position that does not exist raises an error.
3. ix accepted both labels and positions, which made it ambiguous: with an integer index, df.ix[4] was treated as the label 4. It was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead.
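As a rough sketch of adding rows in practice, the example below appends one row by assigning to a new label with loc and adds another with `pd.concat`. The "mamba" row and its values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5]],
    index=["cobra", "viper"],
    columns=["max_speed", "shield"],
)

# Assigning to a label that does not exist yet appends a new row.
df.loc["sidewinder"] = [7, 8]

# To add several rows at once, concatenate another DataFrame.
extra = pd.DataFrame([[9, 3]], index=["mamba"], columns=df.columns)
combined = pd.concat([df, extra])

# iloc cannot enlarge a DataFrame: position 10 does not exist here.
try:
    df.iloc[10] = [0, 0]
    iloc_grew = True
except IndexError:
    iloc_grew = False

print(combined)
```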

What is reindexing in the context of Pandas in Python?

Reindexing changes the row and column labels of a DataFrame. The term 'reindex' refers to conforming the data to a given set of labels along an axis. In Pandas, reindex() reorders the existing rows or columns to match the new labels and inserts missing values (NaN) for any label that was not present in the original data. The same method is available on both Series and DataFrame objects.
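A short sketch of reindexing, reusing the snake DataFrame from the loc example; the label "mamba" is a made-up one that the original data does not contain.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Reorder the rows and request a label ("mamba") that the original lacks.
reindexed = df.reindex(["viper", "cobra", "mamba"])

print(reindexed)
```

The rows for "viper" and "cobra" keep their values, the "mamba" row is filled with NaN, and the original df is left unchanged because reindex returns a new object.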

What are some data operations in Pandas?

There are several important data operations for DataFrame in Pandas, which are as follows:

1. Selection of rows and columns - By passing row and column labels, we can select any row or column in the DataFrame. A single selected row or column is one-dimensional and is returned as a Series.
2. Data Filtering - By using boolean expressions on a DataFrame, we can keep only the rows that satisfy a condition.
3. Null Values - Items that were given no data hold a null value. Missing values can appear in any column and are generally represented as NaN.
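The three operations can be sketched on a small made-up frame. The column names are borrowed from the Telco example, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value (None becomes NaN).
df = pd.DataFrame({
    "tenure": [1, 34, None, 45],
    "Partner": ["Yes", "No", "No", "Yes"],
})

tenure = df["tenure"]                    # 1. selecting one column gives a Series
partnered = df[df["Partner"] == "Yes"]   # 2. filtering with a boolean expression
missing = df["tenure"].isna().sum()      # 3. counting the missing values...
filled = df["tenure"].fillna(0)          #    ...or replacing them with a default

print(type(tenure).__name__, len(partnered), missing)
```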
