# Introduction to Markov Chains: Prerequisites, Properties & Applications

Have you ever wondered how meteorologists make precise weather forecasts, or how Google ranks web pages? These calculations are complex, involve several dynamic variables, and can be tackled using probability estimates.

When Google introduced its PageRank algorithm, it revolutionized the web industry. And if you’re familiar with that algorithm, you must also know that it uses Markov chains. In this introduction to Markov chains, we’ll briefly examine what they are and how they work. So, let’s get started.

## Pre-requisites

It’s essential to know a few concepts before we start our introduction to Markov chains, most of which come from probability theory. Informally, a random variable’s value is the result of a random event. For example, if the variable were the result of rolling a die, it would be a number, whereas if it were the result of a coin flip, it would be binary (0 or 1). The set of possible results can be discrete or continuous.

A stochastic process is a collection of random variables indexed by a set, where the set represents different instants of time. This index set can consist of real numbers (a continuous-time process) or natural numbers (a discrete-time process).

## Introduction to Markov Chains

Markov chains get their name from Andrey Markov, who first introduced the concept in 1906. A Markov chain is a stochastic process whose random variables transition from one state to another according to probabilistic rules.

### What is the Markov Property?

There are many families of random processes, such as autoregressive models and Gaussian processes. The Markov property makes the study of such processes much easier. The Markov property states that if we know the value of a process at a particular time, we gain no additional information about its future outcomes by learning more about its past.

A more elaborate definition would be: the Markov property says that the future evolution of a stochastic process depends only on its current state and time, independent of the states it occupied before. That’s why it’s called a memoryless property: only the present state of the process matters.

A homogeneous discrete-time Markov chain is a Markov process with a discrete state space and discrete time steps. In other words, a Markov chain is a discrete sequence of states that possesses the Markov property.

Here’s the mathematical representation of a Markov chain:

X = (X_n)_{n ∈ ℕ} = (X_0, X_1, X_2, …)
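To make this concrete, here is a minimal sketch of a discrete-time Markov chain in Python. The two weather states and their transition probabilities are hypothetical, chosen purely for illustration:

```python
import random

# Hypothetical transition probabilities: each row must sum to 1.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    """Sample the next state using the current state's transition row."""
    r = random.random()
    cumulative = 0.0
    for state, p in TRANSITIONS[current].items():
        cumulative += p
        if r < cumulative:
            return state
    return state  # guard against floating-point round-off

def simulate(start, steps):
    """Generate the sequence X_0, X_1, ..., X_steps starting from `start`."""
    chain = [start]
    for _ in range(steps):
        chain.append(next_state(chain[-1]))
    return chain

print(simulate("sunny", 5))
```

Note that `next_state` only ever looks at the current state, which is exactly the memoryless property described above.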

## Properties of Markov Chains

Let’s take a look at the fundamental properties of Markov chains to understand them better. We won’t delve too deep into each topic, as the purpose of this article is to familiarize you with the general concept of Markov chains.

### Reducibility

A Markov chain is irreducible if it can reach any state from any other state. The chain doesn’t need to move between two states in a single time step; it can do so over multiple time steps. If we represent the chain as a graph, that graph is strongly connected.
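Irreducibility is easy to check computationally: treat the transition matrix as a directed graph (an edge wherever a probability is positive) and test reachability from every state. A rough sketch, with made-up example matrices:

```python
from collections import deque

def is_irreducible(matrix):
    """Return True if every state can reach every other state.

    `matrix[i][j]` is the probability of moving from state i to state j;
    only whether it is positive matters here.
    """
    n = len(matrix)
    for start in range(n):
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first search over positive-probability edges
            i = queue.popleft()
            for j in range(n):
                if matrix[i][j] > 0 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(seen) < n:
            return False
    return True

# Irreducible: states 0 and 1 can reach each other.
print(is_irreducible([[0.5, 0.5], [0.3, 0.7]]))   # True
# Reducible: state 1 is absorbing and never returns to state 0.
print(is_irreducible([[0.5, 0.5], [0.0, 1.0]]))   # False
```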

### Aperiodic

Let’s say a state’s period is k: any return to that state must take a multiple of k time steps, with k as large as possible (k is the greatest common divisor of the lengths of all possible return paths). If k = 1, the state is aperiodic. When all states of a Markov chain are aperiodic, we say that the Markov chain is aperiodic.
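One way to compute a state’s period is to track which states are reachable in exactly k steps and take the gcd of the lengths at which the chain can return, up to a practical cutoff. A sketch (the cutoff `max_len` is a simplifying assumption, not part of the formal definition):

```python
from math import gcd

def state_period(matrix, state, max_len=None):
    """Period of `state`: gcd of all return-path lengths with positive
    probability, checked up to `max_len` steps (a practical cutoff)."""
    n = len(matrix)
    max_len = max_len or 2 * n
    # reach[j]: can the chain be in state j after exactly k steps?
    reach = [i == state for i in range(n)]
    period = 0
    for k in range(1, max_len + 1):
        reach = [any(reach[i] and matrix[i][j] > 0 for i in range(n))
                 for j in range(n)]
        if reach[state]:
            period = gcd(period, k)
    return period

# Two states that strictly alternate: returns only at even steps.
print(state_period([[0, 1], [1, 0]], 0))      # 2
# A self-loop allows a length-1 return, so the state is aperiodic.
print(state_period([[0.5, 0.5], [1, 0]], 0))  # 1
```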

### Transient and Recurrent States

When you leave a state and there’s a nonzero probability that you will never return to it, we say that the state is transient. On the other hand, if we are certain to return to a state after leaving it (i.e., we return with probability 1), we say that the state is recurrent.

There are two kinds of recurrent states. The first is the positive recurrent state, with a finite expected return time; the second is the null recurrent state, with an infinite expected return time. The expected return time is the mean recurrence time, i.e., the average number of steps it takes to come back to a state after leaving it.
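The expected return time can be estimated by simulation. The sketch below is a Monte Carlo estimate that assumes the state is recurrent (so every walk eventually returns); the two-state chain used at the end is a made-up example whose exact answer is 1.5:

```python
import random

def mean_return_time(transitions, state, trials=10000):
    """Monte Carlo estimate of the expected return time to `state`.

    `transitions[i]` is a list of (next_state, probability) pairs.
    Assumes the state is recurrent, so every walk returns eventually.
    """
    rng = random.Random(0)
    total = 0
    for _ in range(trials):
        current, steps = state, 0
        while True:
            r = rng.random()
            cumulative = 0.0
            for nxt, p in transitions[current]:
                cumulative += p
                if r < cumulative:
                    current = nxt
                    break
            steps += 1
            if current == state:
                break
        total += steps
    return total / trials

# From state 0: prob 0.5 return in 1 step, prob 0.5 return in 2 steps,
# so the exact expected return time is 0.5 * 1 + 0.5 * 2 = 1.5.
chain = {0: [(0, 0.5), (1, 0.5)], 1: [(0, 1.0), (1, 0.0)]}
print(mean_return_time(chain, 0))
```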

## Higher-order Markov Chains

Higher-order Markov chains extend the standard (first-order) Markov chain: the probability of transitioning from one state to another depends not only on the current state but also on a fixed number of preceding states. In contrast to first-order Markov chains, which consider only the immediately previous state, higher-order Markov chains incorporate a history of states to determine the transition probabilities. This allows for more sophisticated modeling of systems with dependencies that span beyond the immediate past.

## Formal Definition

In a higher-order Markov chain, the state of the system at a time *t* depends on the *n* preceding states, denoted as *X(t-1), X(t-2), …, X(t-n)*, where *X(t)* represents the state at a time *t*. The transition probabilities in a higher-order Markov chain are defined as follows:

P(X(t) = x | X(t-1) = x_{t-1}, X(t-2) = x_{t-2}, …, X(t-n) = x_{t-n})
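These transition probabilities can be estimated from data by counting each (history, next state) pair and normalizing per history, which is the maximum likelihood estimate. A small sketch with a toy sequence (the sequence itself is made up for illustration):

```python
from collections import defaultdict

def estimate_transitions(sequence, order):
    """MLE of an order-n chain's transition probabilities:
    count each (history, next) pair, then normalize per history."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in range(order, len(sequence)):
        history = tuple(sequence[t - order:t])
        counts[history][sequence[t]] += 1
    probs = {}
    for history, nexts in counts.items():
        total = sum(nexts.values())
        probs[history] = {s: c / total for s, c in nexts.items()}
    return probs

seq = ["A", "B", "A", "B", "A", "A", "B", "A", "B"]
model = estimate_transitions(seq, order=2)
# P(X(t) = x | X(t-2) = "A", X(t-1) = "B")
print(model[("A", "B")])  # {'A': 1.0} — "A B" is always followed by "A" here
```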

## Examples of Higher-order Markov Chains

1. Language Modeling: In natural language processing, language models often use higher-order Markov chains to predict the probability of a word based on the context of the preceding *n* words. This enables the generation of more contextually relevant and coherent sentences.
2. Weather Prediction: Weather forecasting models can utilize higher-order Markov chains to predict weather conditions based on the historical weather patterns of the past *n* days. This approach can capture longer-term climate dependencies and improve the accuracy of predictions.
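The language modeling example above can be sketched as a tiny text generator. The training sentence and the order are hypothetical; the point is only that each new word is drawn conditioned on the previous *n* words:

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each n-word history to the list of words observed to follow it."""
    model = defaultdict(list)
    for t in range(order, len(words)):
        model[tuple(words[t - order:t])].append(words[t])
    return model

def generate(model, seed, length=10):
    """Extend `seed` by repeatedly sampling a follower of the last n words."""
    rng = random.Random(0)
    out = list(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-len(seed):]))
        if not followers:      # unseen history: stop early
            break
        out.append(rng.choice(followers))
    return " ".join(out)

text = "the cat sat on the mat and the cat ran".split()
model = build_model(text, order=2)
print(generate(model, seed=("the", "cat")))
```

With such a tiny corpus the output just stitches together fragments of the input, but the same counting-and-sampling scheme underlies classic n-gram language models.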

## Challenges and Considerations

While higher-order Markov chains offer increased modeling capabilities, they also present some challenges:

### 1. Increased Dimensionality

As the order of the Markov chain (*n*) increases, the number of possible combinations of states in history increases exponentially. This can lead to a significant increase in model complexity and computational requirements.

### 2. Data Sparsity

In many applications, the higher-order state combinations may not occur frequently in the training data, resulting in sparse observations. This can lead to unreliable estimates of transition probabilities, affecting the model’s performance.

### 3. Curse of Dimensionality

As the order of the Markov chain increases, the size of the state space grows exponentially. This phenomenon is known as the “curse of dimensionality.” With a larger state space, the amount of data required to estimate transition probabilities accurately becomes impractical, especially when dealing with real-world applications. As the number of possible state combinations grows, the available data may become sparse, making it difficult to build reliable models.

### 4. Memory Requirements

Higher-order Markov chains require storing and manipulating historical state information. As the order (*n*) increases, the model needs to maintain a more extended history of states, which can lead to increased memory requirements. This becomes particularly challenging when dealing with massive datasets or resource-constrained environments, as retaining and processing such large historical sequences might not be feasible.

### 5. Model Overfitting

Higher-order Markov chains are susceptible to overfitting, especially when the order (*n*) is large, and the available data is limited. Overfitting occurs when the model captures noise and random variations in the training data rather than learning the underlying patterns.

## Methods for Estimation

To address the challenges of higher-order Markov chains, various estimation techniques have been developed:

### 1. Maximum Likelihood Estimation (MLE)

MLE is commonly used to estimate transition probabilities based on observed data. However, in higher-order Markov chains, the scarcity of certain state combinations can lead to unreliable estimates.

### 2. Smoothing Techniques

Smoothing methods, such as Laplace smoothing or add-k smoothing, can be applied to alleviate the problem of data sparsity and provide more robust estimates of transition probabilities.
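A minimal sketch of add-k smoothing for transition estimates, assuming raw counts are kept in a nested dictionary (the counts and vocabulary size below are made up for illustration):

```python
def smoothed_probability(counts, history, state, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(state | history).

    `counts[history][state]` holds raw transition counts. Unseen
    (history, state) pairs receive k / (total + k * vocab_size)
    instead of probability zero; k = 1 is Laplace smoothing.
    """
    row = counts.get(history, {})
    total = sum(row.values())
    return (row.get(state, 0) + k) / (total + k * vocab_size)

counts = {("A", "B"): {"A": 3, "C": 1}}
# Seen transition: (3 + 1) / (4 + 3) ≈ 0.571
print(smoothed_probability(counts, ("A", "B"), "A", vocab_size=3))
# Unseen transition no longer gets probability zero: 1 / 7 ≈ 0.143
print(smoothed_probability(counts, ("A", "B"), "B", vocab_size=3))
```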

## Applications of Markov Chains

Markov chains find applications in many areas. Here are some of the most prominent:

• Google’s PageRank algorithm treats the web as a Markov model: every web page is a state, and the links between pages are transitions with specific probabilities. In other words, no matter what you search for on Google, there is a finite probability of ending up on any particular web page.
• If you use Gmail, you must have noticed its autocomplete feature, which predicts your sentences to help you write emails quickly. Markov chains contribute considerably here, as they can provide predictions of this sort effectively.
• Have you heard of Reddit? It’s a major social media platform filled with subreddits (the name for communities on Reddit) devoted to specific topics. Reddit uses Markov chains and models to simulate subreddits in order to understand them better.
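The PageRank idea in the first bullet can be sketched with power iteration on a tiny, made-up link graph. The damping factor models the probability that a surfer follows a link rather than jumping to a random page; the three pages here are purely hypothetical:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration sketch of PageRank.

    `links[page]` lists the pages that `page` links to; each outgoing
    link gets equal transition probability, and with probability
    1 - damping the surfer jumps to a uniformly random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:  # dangling page: spread its rank uniformly
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical three-page web: A and C link to B, B links back to A.
# B collects links from two pages, so it ends up ranked highest.
print(pagerank({"A": ["B"], "B": ["A"], "C": ["B"]}))
```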

## Final Thoughts

It appears we have reached the end of our introduction to Markov chains. We hope you found this article useful. If you have any questions or queries, feel free to share them with us through the comments. We’d love to hear from you.

If you are curious to learn about data science, check out IIIT-B & upGrad’s PG Diploma in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.

## Is there any real life application of Markov Chains?

Markov chains are one of the most essential tools for modeling sequences of dependent trials. In finance and economics, they are used to represent a variety of phenomena, such as market crashes and asset prices. Markov chains are applied across a wide range of academic areas, including biology and economics, as well as real-world scenarios: a parking lot has a fixed number of spots, for example, but the number available at any one moment can be characterized using a Markov model based on a combination of several factors or variables. Markov chains are also frequently used to generate dummy texts, lengthy articles, and speeches.

## What does the term equilibrium mean with respect to Markov Chains?

The distribution πᵀ is said to be an equilibrium distribution if πᵀP = πᵀ. Equilibrium refers to a situation where the distribution of Xt does not change as we progress through the Markov chain. In fact, the distinguishing feature of a Markov chain is that the probabilities of future states are fixed regardless of how the process arrived at its current state. In other words, the likelihood of transitioning to any given state is completely determined by the present state and the amount of time that has passed.
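The condition πᵀP = πᵀ can be checked numerically by iterating π ← πP until it stops changing. This sketch assumes the chain is irreducible and aperiodic, so the iteration converges to the equilibrium distribution; the example matrix is made up for illustration:

```python
def stationary_distribution(matrix, iterations=200):
    """Repeatedly apply pi <- pi P from a uniform start.

    Assumes an irreducible, aperiodic chain so the limit exists
    and is independent of the starting distribution.
    """
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

P = [[0.5, 0.5],
     [1.0, 0.0]]
pi = stationary_distribution(P)
print(pi)  # converges to [2/3, 1/3], and pi P = pi holds at equilibrium
```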

## Are Markov Chains time homogenous?

If the transition probability between two given state values at any two times depends only on the difference between those times, the process is time-homogeneous. A Markov chain can be either homogeneous or non-homogeneous: its transition probabilities are homogeneous if and only if they are independent of time. Non-homogeneous Markov chains (NHMCs) retain the Markov property, although their transition probabilities may vary with time. Criteria that guarantee the existence of a limiting distribution in such chains are used, for example, in the analysis of simulated annealing.