
Python Program for Merge Sort

As a multi-paradigm programming language with a structured, object-oriented design approach and simple and uncluttered syntax and grammar, Python is rapidly emerging as the language of choice for programmers working on projects of varying complexity and scale.

Python provides a large standard library of pre-built algorithms that help users perform operations which either accomplish a task on their own or serve as a step toward a larger, more complex goal. Sorting is one of the most common of these operations, and Merge Sort is one of the more popular sorting algorithms to implement in Python.

What is Merge Sort?

It is a general-purpose sorting technique that takes a dataset of any comparable type, from any source, and divides it in repeated stages until it is broken down into its individual components. This recursive technique is commonly referred to as the ‘divide and conquer’ method.

The algorithm then puts the individual components back together, again in repeated stages, and sorts them into a pre-decided, logical sequence at each stage by comparing elements, until the entire data series is reconstituted in the desired sequence.
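The merging half of that description can be sketched in a few lines. This is a minimal illustration that assumes both inputs are already sorted; the helper name merge_sorted is hypothetical and chosen only for this example.

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller of the two front elements.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted(["H", "N"], ["B", "V"]))  # ['B', 'H', 'N', 'V']
```

Each element is looked at once, so a merge of two lists takes time proportional to their combined length.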


Divide and Conquer technique

Take, for instance, a random dataset of letters of the alphabet: N, H, V, B, Q, D, Z, R.

Step 1: The original dataset is first broken down into two groups as follows:

[N, H, V, B]  [Q, D, Z, R]

Step 2: Both resulting arrays are further subdivided as follows:

[N, H]  [V, B]  [Q, D]  [Z, R]

Step 3: Finally, all four arrays are split up further until the entire data series is broken down into its individual components:

[N]  [H]  [V]  [B]  [Q]  [D]  [Z]  [R]

The process then reverses, and the individual data points begin to merge stage by stage. During each merge, the elements of the two sub-arrays are compared and placed so that the result is in a logical sequence (alphabetical order), as follows:

Step 4: Individual elements merge into pairs, changing positions as required to form the correct sequence:

[H, N]  [B, V]  [D, Q]  [R, Z]

Step 5: The recursive process of merging and sorting continues to the next iteration:

[B, H, N, V]  [D, Q, R, Z]

Step 6: The entire data series is finally reconstituted in its logical alphabetical order:

[B, D, H, N, Q, R, V, Z]
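The six steps can be reproduced with a small tracing function. This is a sketch, not a reference implementation; traced_merge_sort and its depth parameter are names invented for the illustration. Note that real recursion works depth-first, so the left half is fully sorted before the right half is touched, whereas the steps above show one level at a time.

```python
def traced_merge_sort(items, depth=0):
    """Return a sorted copy of items, printing each split and merge."""
    indent = "  " * depth
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    print(f"{indent}split {items[:middle]} | {items[middle:]}")
    left = traced_merge_sort(items[:middle], depth + 1)
    right = traced_merge_sort(items[middle:], depth + 1)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Append whatever is left over from either half.
    merged += left[i:] + right[j:]
    print(f"{indent}merge -> {merged}")
    return merged


letters = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
result = traced_merge_sort(letters)
```

The indentation of each printed line corresponds to the recursion depth, so lines at the same indentation belong to the same step in the walkthrough.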


Merge Sort Implementations

There are two approaches to Merge Sort implementation in Python: the top-down approach and the bottom-up approach.

Top-down Approach:

The more commonly used top-down approach is the one described above. It is recursive, so it carries the overhead of function calls and, in the implementation below, of copying each half of the list; that overhead is most noticeable when working with smaller datasets. Its running time nevertheless grows as O(n log n), which makes it dependable when applied to large datasets.


Input code:

def merge_sort(inp_arr):
    size = len(inp_arr)
    if size > 1:
        middle = size // 2
        left_arr = inp_arr[:middle]
        right_arr = inp_arr[middle:]
        merge_sort(left_arr)
        merge_sort(right_arr)
        # i and j are the iterators for traversing the left and right
        # halves of the data series, respectively, and k is the iterator
        # of the overall data series.
        i = 0
        j = 0
        k = 0
        left_size = len(left_arr)
        right_size = len(right_arr)
        while i < left_size and j < right_size:
            # "<=" takes from the left half on ties, keeping the sort stable.
            if left_arr[i] <= right_arr[j]:
                inp_arr[k] = left_arr[i]
                i += 1
            else:
                inp_arr[k] = right_arr[j]
                j += 1
            k += 1
        # Copy any elements remaining in the left half.
        while i < left_size:
            inp_arr[k] = left_arr[i]
            i += 1
            k += 1
        # Copy any elements remaining in the right half.
        while j < right_size:
            inp_arr[k] = right_arr[j]
            j += 1
            k += 1

inp_arr = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
print("Input Array:\n")
print(inp_arr)
merge_sort(inp_arr)
print("Sorted Array:\n")
print(inp_arr)

Output:

Input Array:

['N', 'H', 'V', 'B', 'Q', 'D', 'Z', 'R']
Sorted Array:

['B', 'D', 'H', 'N', 'Q', 'R', 'V', 'Z']

Bottom-up approach:

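The bottom-up approach is iterative. It treats every element as a sorted run of length one and merges neighbouring runs in passes, doubling the run length each time. Because it avoids recursive function calls it can be somewhat quicker and lighter on memory, which shows most on smaller datasets, but the index bookkeeping is harder to follow than the recursive version. It is therefore less frequently used.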

Input code: 

def merge(left, right):
    result = []
    i, j = 0, 0
    for _ in range(len(left) + len(right)):
        if i == len(left):  # if at the end of 1st half,
            result.append(right[j])  # add all values of 2nd half
            j += 1
        elif j == len(right):  # if at the end of 2nd half,
            result.append(left[i])  # add all values of 1st half
            i += 1
        elif right[j] < left[i]:
            result.append(right[j])
            j += 1
        else:
            result.append(left[i])
            i += 1
    return result

def mergesort(ar_list):
    length = len(ar_list)
    size = 1
    while size < length:
        size += size  # doubles on each pass: 2, 4, 8, ...
        for pos in range(0, length, size):
            start = pos
            mid = pos + size // 2
            end = pos + size
            left = ar_list[start:mid]
            right = ar_list[mid:end]
            ar_list[start:end] = merge(left, right)
    return ar_list

ar_list = ["N", "H", "V", "B", "Q", "D", "Z", "R"]
print(mergesort(ar_list))

Output:

['B', 'D', 'H', 'N', 'Q', 'R', 'V', 'Z']
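To see what each bottom-up pass does, the sketch below records the list after every pass. It is an illustration under stated assumptions: bottom_up_passes is a hypothetical helper, and the merging is delegated to heapq.merge from the standard library rather than a hand-written merge.

```python
from heapq import merge  # standard-library merge of sorted iterables


def bottom_up_passes(items):
    """Yield (run length, list snapshot) after each bottom-up pass."""
    data = list(items)
    width = 1
    while width < len(data):
        # Merge neighbouring runs of the current width, pair by pair.
        for start in range(0, len(data), 2 * width):
            mid = start + width
            end = start + 2 * width
            data[start:end] = list(merge(data[start:mid], data[mid:end]))
        width *= 2
        yield width, list(data)


for width, snapshot in bottom_up_passes(["N", "H", "V", "B", "Q", "D", "Z", "R"]):
    print(f"sorted runs of {width}: {snapshot}")
```

For the eight-letter example this produces three passes, which correspond to Steps 4, 5 and 6 of the divide and conquer walkthrough.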

Merge Sort implementation applied to more complex, real-life datasets

Let’s apply the top-down approach to four random off-road vehicles in India:

Brand               Model               Ex-showroom price (Rs crore)
Jeep                Wrangler            0.58
Ford                Endeavour           0.35
Jaguar Land Rover   Range Rover Sport   2.42
Mercedes Benz       G-class             1.76

Input code:

class Car:
    def __init__(self, brand, model, price):
        self.brand = brand
        self.model = model
        self.price = price

    def __str__(self):
        return "Brand: {}, Model: {}, Price: {}".format(
            self.brand, self.model, self.price)

def merge(list1, i, j, k, comp_fun):
    # Merge the sorted runs list1[i..k] and list1[k+1..j] in place.
    left_copy = list1[i:k + 1]
    j_sublist = list1[k + 1:j + 1]
    left_copy_index = 0
    j_sublist_index = 0
    sorted_index = i
    while left_copy_index < len(left_copy) and j_sublist_index < len(j_sublist):
        # Take from the right run only when it is strictly smaller,
        # so that equal elements keep their original order.
        if not comp_fun(j_sublist[j_sublist_index], left_copy[left_copy_index]):
            list1[sorted_index] = left_copy[left_copy_index]
            left_copy_index = left_copy_index + 1
        else:
            list1[sorted_index] = j_sublist[j_sublist_index]
            j_sublist_index = j_sublist_index + 1
        sorted_index = sorted_index + 1
    while left_copy_index < len(left_copy):
        list1[sorted_index] = left_copy[left_copy_index]
        left_copy_index = left_copy_index + 1
        sorted_index = sorted_index + 1
    while j_sublist_index < len(j_sublist):
        list1[sorted_index] = j_sublist[j_sublist_index]
        j_sublist_index = j_sublist_index + 1
        sorted_index = sorted_index + 1

def merge_sort(list1, i, j, comp_fun):
    # Sort list1[i..j] (both ends inclusive) using comp_fun as "less than".
    if i >= j:
        return
    k = (i + j) // 2
    merge_sort(list1, i, k, comp_fun)
    merge_sort(list1, k + 1, j, comp_fun)
    merge(list1, i, j, k, comp_fun)

car1 = Car("Jeep", "Wrangler", 0.58)
car2 = Car("Ford", "Endeavour", 0.35)
car3 = Car("Jaguar Land Rover", "Range Rover Sport", 2.42)
car4 = Car("Mercedes Benz", "G-class", 1.76)
list1 = [car1, car2, car3, car4]
merge_sort(list1, 0, len(list1) - 1, lambda carA, carB: carA.brand < carB.brand)
print("Cars sorted by brand:")
for car in list1:
    print(car)
print()
merge_sort(list1, 0, len(list1) - 1, lambda carA, carB: carA.price < carB.price)
print("Cars sorted by price:")
for car in list1:
    print(car)

Output:

Cars sorted by brand:
Brand: Ford, Model: Endeavour, Price: 0.35
Brand: Jaguar Land Rover, Model: Range Rover Sport, Price: 2.42
Brand: Jeep, Model: Wrangler, Price: 0.58
Brand: Mercedes Benz, Model: G-class, Price: 1.76

Cars sorted by price:
Brand: Ford, Model: Endeavour, Price: 0.35
Brand: Jeep, Model: Wrangler, Price: 0.58
Brand: Mercedes Benz, Model: G-class, Price: 1.76
Brand: Jaguar Land Rover, Model: Range Rover Sport, Price: 2.42
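For comparison, Python's built-in sorted accepts a key function and is documented as stable, so the same two orderings can be produced without a hand-written sort. The sketch below is only an illustration: it uses plain tuples instead of the Car class to stay short, and takes its prices from the table above.

```python
from operator import itemgetter

# (brand, model, ex-showroom price in Rs crore)
cars = [
    ("Jeep", "Wrangler", 0.58),
    ("Ford", "Endeavour", 0.35),
    ("Jaguar Land Rover", "Range Rover Sport", 2.42),
    ("Mercedes Benz", "G-class", 1.76),
]

by_brand = sorted(cars, key=itemgetter(0))  # order by the brand field
by_price = sorted(cars, key=itemgetter(2))  # order by the price field

print("Cars sorted by brand:")
for brand, model, price in by_brand:
    print(f"{brand} {model}: {price}")

print("Cars sorted by price:")
for brand, model, price in by_price:
    print(f"{brand} {model}: {price}")
```

Writing merge sort by hand remains useful for learning how the algorithm works; in everyday code the built-in is usually the simpler choice.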

Understanding the Difference Between Insertion Sort and Merge Sort in Python

Insertion sort and merge sort in Python are sometimes confused with one another, but the two algorithms differ in several important ways.

  • Datasets: Merge sort is useful for large datasets, where its O(n log n) running time pays off. On small datasets the overhead of splitting and merging outweighs that advantage, and insertion sort is more suitable. Insertion sort also does very little work on values that are already in position, so it runs faster on already sorted or nearly sorted data.
  • Stability: Merge sort is considered stable when ties are resolved in favour of the left half: two elements with equal value appear in the sorted output in the same relative order as in the unsorted input array. Insertion sort requires O(N²) time in the worst case on arrays as well as linked lists. On arrays, if the CPU comes with an efficient memory block move function, shifting elements will be a lot faster. Otherwise, you won't find much of a time difference between the two.
  • Sorting Method: Merge sort does not sort the data in place. It requires auxiliary memory to hold the elements while they are being merged. Insertion sort consumes one input element in every iteration and moves it to its correct position, that is, its place in the sorted part of the array.
  • Efficiency: If you compare the time complexity of the two algorithms, merge sort proves to be better: O(n log n) against O(n²) for insertion sort in the worst case. In terms of space, the insertion sort algorithm has the edge, because it sorts in place with O(1) extra memory while merge sort needs O(n).
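The stability point is easiest to see with records that share a key. The following is a minimal sketch, assuming a merge sort that returns a new list and takes a key function; merge_sort_by is a hypothetical name used only here.

```python
def merge_sort_by(items, key):
    """Return a new list sorted by key(item); equal keys keep input order."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort_by(items[:middle], key)
    right = merge_sort_by(items[middle:], key)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which preserves stability.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
print(merge_sort_by(records, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (2, 'a'), (2, 'c')]
```

Within each key the labels keep their input order ("b" before "d", "a" before "c"). Replacing "<=" with "<" in the comparison would break that guarantee.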

An Example of the Insertion Sort Algorithm (in C)

#include<stdio.h>
void insertionSort(int arr[], int n) {
   int i, key, j;
   for (i = 1; i < n; i++) {
       key = arr[i];
       j = i - 1;
       while (j >= 0 && arr[j] > key) {
           arr[j + 1] = arr[j];
           j = j - 1;
       }
       arr[j + 1] = key;
   }
}
void printArray(int arr[], int n) {
   int i;
   for (i = 0; i < n; i++)
       printf("%d ", arr[i]);
   printf("\n");
}
int main() {
   int arr[] = {12, 11, 13, 5, 6};
   int n = sizeof(arr)/sizeof(arr[0]);
   printf("Given array is \n");
   printArray(arr, n);
   insertionSort(arr, n);
   printf("\nSorted array is \n");
   printArray(arr, n);
   return 0;
}

Output:

Given array is 

12 11 13 5 6 

Sorted array is 

5 6 11 12 13 
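Since the rest of this article uses Python, here is a line-for-line port of the C listing above. It is a sketch that follows the same variable names (key, i, j) and sorts the list in place.

```python
def insertion_sort(arr):
    """Sort arr in place, mirroring the C function insertionSort."""
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift every element larger than key one slot to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        # Drop key into the gap that the shifting opened up.
        arr[j + 1] = key


numbers = [12, 11, 13, 5, 6]
print("Given array is")
print(numbers)
insertion_sort(numbers)
print("Sorted array is")
print(numbers)  # [5, 6, 11, 12, 13]
```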

What is merge sort in Python using recursion?

Merge sort in Python using recursion sorts a big array by handing smaller pieces of it back to the same function. Each call to merge sort calls itself twice, once on each half of its array. Each of those calls does the same, so one level further down the merge sort algorithm is called a total of four times, and so on.

We keep passing the problem down, but at some point a call has to deliver a result without recursing any further. That stopping point, known as the base case, is reached when you are required to sort an array with a single number, which is already sorted. Choosing the base case is completely in your control while using the recursion method.
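The way the calls multiply until they hit the base case can be counted directly. The sketch below does not sort anything; count_calls is a hypothetical helper that only mirrors the shape of the recursion for a list of n elements.

```python
def count_calls(n):
    """Number of merge-sort calls made for a list of n elements."""
    if n <= 1:
        return 1  # base case: a single element needs no further calls
    half = n // 2
    # One call for this level plus the calls made for each half.
    return 1 + count_calls(half) + count_calls(n - half)


print(count_calls(8))  # 15, that is 1 + 2 + 4 + 8 calls over four levels
```

For any non-empty list the total works out to 2n - 1 calls, of which n are base cases.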

What is the space and time complexity of the merge sort algorithm in Python?

The merge sort algorithm in Python has a time complexity of O(n log n) in the best, worst, and average cases. The algorithm always divides the array into two halves, which gives about log n levels of division, and the halves at each level are merged in linear time. For the space complexity, an extra array is needed to hold the merged result. Therefore, the space complexity can be defined as O(n).
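The O(n log n) figure can be checked empirically by counting element comparisons. This is a rough sketch under stated assumptions: merge_sort_count is a hypothetical instrumented variant, and the input sizes and random seed are arbitrary choices.

```python
import math
import random


def merge_sort_count(items):
    """Return (sorted copy of items, number of element comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    middle = len(items) // 2
    left, left_count = merge_sort_count(items[:middle])
    right, right_count = merge_sort_count(items[middle:])
    merged = []
    comparisons = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    return merged, comparisons


random.seed(1)
for n in (1_000, 2_000, 4_000):
    data = [random.random() for _ in range(n)]
    _, comparisons = merge_sort_count(data)
    # Print the measured count next to n * log2(n) for reference.
    print(n, comparisons, round(n * math.log2(n)))
```

Doubling n should slightly more than double the comparison count, and the count should stay below n log2 n, which is what the O(n log n) bound predicts.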

