Python is a favourite language among Data Science enthusiasts. Its versatility and easy-to-read syntax let developers focus on understanding trends in the data and deriving meaningful insights rather than spending time fixing a missing semicolon or an unclosed bracket. Because Python is so popular among beginners and is picked up quickly, it is important to have a good grasp of the language.
Data structures are an essential concept in any programming language. They define how data can be stored in and retrieved from memory efficiently, depending on the data type. They also define the relationships between pieces of data, which helps in deciding the operations and functions that should be performed on them. Let’s understand how Python manages data.
Types of Data Structure in Python
1. List
This is the simplest and most commonly used data structure in Python. As the name suggests, it is a collection of items. The items can be of any type (numeric, string, boolean, objects, etc.), which makes the list heterogeneous, and we can iterate over a list with a for or while loop.
Each element is associated with an index that defines its position in the list, and index numbering starts from zero. The list is mutable, meaning elements can be added, removed, or changed after the list is defined. This data structure resembles arrays in other languages, except that arrays are usually homogeneous, meaning they store only one type of data. Some basic operations on lists are as below:
- To declare a list in Python, put the items in square brackets:
sample_list = ['upGrad', '1', 2]
- To initialize an empty list:
sample_list = list()
- Add elements to the list:
sample_list.append('new_element')
- Remove elements from the list:
sample_list.remove(<element value>) removes the first occurrence of that value
del sample_list[<element index num>] removes the element at that index
sample_list.pop(<element index num>) removes the element at that index and returns it
- To change the element at any index:
sample_list[<any index>] = new_item
- Slicing: This is an important feature that extracts a chosen portion of the list. If you need only a specific range of values from the list, you can get it with:
sample_list[start:stop:step], where step defines the gap between the selected elements and defaults to 1
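The list operations above can be tried together in one short script. This is a minimal sketch; the element values and variable names such as `numbers` and `popped` are only illustrative.

```python
# A list can hold mixed types (heterogeneous data).
sample_list = ['upGrad', '1', 2]

# append() adds an item to the end of the list.
sample_list.append('new_element')   # ['upGrad', '1', 2, 'new_element']

# remove() deletes the first element equal to the given value.
sample_list.remove('1')             # ['upGrad', 2, 'new_element']

# pop() deletes the element at an index and returns it.
popped = sample_list.pop(1)         # popped is 2

# del deletes the element at an index without returning it.
del sample_list[0]                  # ['new_element']

# Assigning to an index replaces the element stored there.
sample_list[0] = 'replaced'         # ['replaced']

# Slicing uses start:stop:step; stop is exclusive and step defaults to 1.
numbers = list(range(10))           # [0, 1, 2, ..., 9]
first_three = numbers[:3]           # [0, 1, 2]
every_other = numbers[1:8:2]        # [1, 3, 5, 7]
reversed_numbers = numbers[::-1]    # a negative step walks backwards
```

Note that slicing returns a new list and leaves the original unchanged.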
2. Tuple
This is another data structure that stores data sequentially, meaning the items keep the order in which they were added, just as in a list. Like a list, a tuple can store heterogeneous data, and indexing works the same way.
The major difference between the two is that a tuple is immutable and can’t be changed after definition. This means that you cannot add new elements, change existing items, or delete elements from the tuple. Elements can only be read from it, via indexing or unpacking.
Tuples are therefore typically faster to create than lists. In CPython, a tuple is stored in a single block of memory, whereas a list needs two: a fixed-size block for the object itself and a variable-sized block for the stored data. Prefer a tuple over a list when you are sure the elements will not need further modification. Some things to consider while using a tuple:
- To initialize an empty tuple:
sample_tuple = tuple()
- To declare a tuple, enclose the items in parentheses:
sample_tuple = ('upGrad', 'Python', 'ML', 23432)
- To access the elements of the tuple:
sample_tuple[<index_num>]
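A short sketch of reading from a tuple and of what happens when code tries to modify one. The names `first`, `last`, `single`, and `mutation_error` are illustrative and not part of any API.

```python
sample_tuple = ('upGrad', 'Python', 'ML', 23432)

# Indexing works as it does for lists, including negative indices.
first = sample_tuple[0]     # 'upGrad'
last = sample_tuple[-1]     # 23432

# Unpacking assigns each element to its own variable in one statement.
name, language, field, number = sample_tuple

# A one-element tuple needs a trailing comma; ('upGrad') is just a string.
single = ('upGrad',)

# Tuples are immutable, so item assignment raises TypeError.
try:
    sample_tuple[0] = 'changed'
    mutation_error = None
except TypeError as exc:
    mutation_error = exc
```

Catching the error here is only for demonstration; in real code the point is that such an assignment should not be attempted at all.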
3. Sets
In mathematics, a set is a well-defined collection of unique elements that may or may not be related to each other. A tuple or list can store duplicate elements without complaint, but a set keeps only unique items.
The elements of a set are unordered: items have no definite position, so neither indexing nor slicing is supported. The set itself is mutable, but its elements must be hashable, which in practice means immutable values such as numbers, strings, and tuples, because sets locate their elements by hashing them.
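The uniqueness and hashability rules can be seen in a small experiment. This is a sketch with illustrative values; `unhashable_error` is just a name chosen for this example.

```python
sample_set = set()

# Hashable values such as strings and tuples of numbers are accepted.
sample_set.add('upGrad')
sample_set.add((1, 2))

# Adding a value that is already present leaves the set unchanged.
sample_set.add('upGrad')

# A list is mutable and therefore unhashable, so add() raises TypeError.
try:
    sample_set.add([1, 2])
    unhashable_error = None
except TypeError as exc:
    unhashable_error = exc
```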
Elements can be added or removed from the set, but an existing element cannot be changed in place because there is no indexing to address it. As in mathematics, the usual set operations are available, such as union, intersection, and difference, along with a check for whether two sets are disjoint. Let’s look at how to implement it:
- To initialize an empty set:
sample_set = set()
- Add elements to the set:
sample_set.add(item): adds a single item to the set
sample_set.update(items): adds multiple items from a list, tuple, or another set
- Remove elements from the set:
sample_set.discard(item): removes the element if present and does nothing otherwise
sample_set.remove(item): raises a KeyError if the element to be removed is not present
- Set operations (Assume two sets initialized: A and B):
A | B or A.union(B): Union operation
A & B or A.intersection(B): Intersection operation
A - B or A.difference(B): Difference of two sets
A ^ B or A.symmetric_difference(B): Symmetric difference of sets
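A worked example of the set methods and operators listed above. The contents of `A` and `B` are arbitrary values picked so that each result is easy to check by hand.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B             # {1, 2, 3, 4, 5}
intersection = A & B      # {3, 4}
difference = A - B        # {1, 2}: in A but not in B
symmetric = A ^ B         # {1, 2, 5}: in exactly one of the two sets
are_disjoint = A.isdisjoint(B)   # False, because 3 and 4 are shared

sample_set = set()
sample_set.add('a')               # add one item
sample_set.update(['b', 'c'])     # add several items from any iterable

# discard() is silent when the item is missing.
sample_set.discard('z')

# remove() raises KeyError when the item is missing.
try:
    sample_set.remove('z')
    remove_failed = False
except KeyError:
    remove_failed = True
```

Use discard() when a missing item is acceptable, and remove() when a missing item indicates a bug that should surface.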
4. Dictionary
This is one of the most widely used data structures in Python; it stores data as key-value pairs. The key must be hashable (in practice, an immutable value such as a string, number, or tuple), while the value can be any object, including a mutable one. The concept mirrors an actual dictionary, where words are the keys and their meanings are the values. Since Python 3.7, a dictionary preserves the order in which pairs were inserted, but elements are looked up by key rather than by positional index. Some important things related to this:
- To initialize an empty dictionary:
sample_dict = dict()
- To add elements to the dictionary:
sample_dict[key] = value
Another way is to supply the pairs when creating the dictionary: sample_dict = {key: value}
If you print this dictionary, the output would be: {'key1': value, 'key2': value … }
- To get the keys and values of the dictionary:
sample_dict.keys(): returns a view object of the keys
sample_dict.values(): returns a view object of the values
sample_dict.items(): returns a view object of the key-value pairs, each as a tuple
Wrap any of these in list() if you need an actual list.
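The dictionary operations above, sketched with placeholder keys and values. The `get()` call is an addition not covered in the text; it is a standard dict method that returns a default instead of raising KeyError for a missing key.

```python
sample_dict = {}

# Assigning to a key adds the pair, or overwrites the value if the key exists.
sample_dict['key1'] = 'value1'
sample_dict['key2'] = 'value2'

# keys(), values(), and items() return view objects; list() copies them.
keys = list(sample_dict.keys())       # ['key1', 'key2']
values = list(sample_dict.values())   # ['value1', 'value2']
items = list(sample_dict.items())     # [('key1', 'value1'), ('key2', 'value2')]

# get() returns a fallback value when the key is absent.
missing = sample_dict.get('key3', 'not found')

# Views are live: they reflect changes made to the dictionary afterwards.
keys_view = sample_dict.keys()
sample_dict['key3'] = 'value3'
view_length = len(keys_view)          # 3
```

The lists built with list() are snapshots, so `keys` still has two entries after 'key3' is added, while `keys_view` sees all three.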
Conclusion
It’s important to grasp the basics of data structures in Python. In the data industry, knowing the different data structures helps you better understand the underlying algorithms and makes you more aware of coding practices that produce results efficiently. The right data structure depends heavily on the situation, and choosing well takes practice.
What is the importance of data structures?
Data structures are one of the foundational pillars of any programming language. They define how the data will be stored and manipulated in the memory. The concepts of data structures remain the same no matter which programming language we are talking about.
The most common data structures include arrays, lists, stacks, queues, trees, hashmaps, and graphs. Some of them are built-in while others need to be implemented by the user with the help of the pre-defined data structures.
How can I develop a strong grasp of data structures?
Start by learning the fundamental concepts: how each data structure is implemented and how it works. Once you are familiar with the theory, move on to coding.
Always study the time and space complexities of any algorithm or data structure you are working with. This gives you proper insight into the concept and helps you recognise which problems call for that particular data structure.
When is a Python list preferred for storing data?
A list can store values of different data types, each accessed by its index. When you need to perform mathematical operations over the elements, a list works well because you can loop over it or pass it to built-in functions such as sum(), min(), and max().
Since a list can be resized, it can be used to store the data when you are not certain about the number of elements to be stored.
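As a small illustration of both points, the sketch below grows a list one item at a time and then computes over it. The input strings and the names `readings`, `total`, and `doubled` are made up for this example.

```python
# Start empty and append as values arrive; the final size need not be known.
readings = []
for raw in ['3.5', '4.0', '2.5']:
    readings.append(float(raw))

# Built-in functions operate over the whole list.
total = sum(readings)         # 10.0
largest = max(readings)       # 4.0

# A list comprehension applies an operation to every element.
doubled = [value * 2 for value in readings]   # [7.0, 8.0, 5.0]
```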