Introduction to Probability and Probability Distribution
To understand probability distributions, let us first understand what probability is. Probability is a measure of how likely an event is to occur in an experiment. Its value ranges from 0 (the event is impossible) to 1 (the event is certain).
A probability distribution is a function that gives the probabilities of the different possible outcomes of an experiment. It shows the values that a random variable can take and how often each of these values occurs.
In a probability distribution, the probabilities of all possible outcomes always sum to 1. In data science, probability distributions are used, among other things, to calculate confidence intervals and the critical regions in hypothesis tests.
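As a small illustration of these properties, the sketch below enumerates two tosses of a fair coin (an example chosen here, not one from the article) and builds the distribution of the number of heads. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every equally likely outcome of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# Random variable: the number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value = favourable outcomes / total outcomes.
distribution = {heads: Fraction(n, len(outcomes)) for heads, n in counts.items()}

for heads in sorted(distribution):
    print(heads, distribution[heads])

# The probabilities of all possible values add up to 1.
total = sum(distribution.values())
print("total:", total)
```

Exact fractions are used so that the sum is exactly 1 rather than a floating-point approximation.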
Continuous and Discrete Distributions
The type of probability distribution to use depends on whether the variable takes discrete or continuous values. A discrete distribution can take only a countable set of separate values, whereas a continuous distribution can take any value within a specified range.
Continuous distributions are described by a probability density because a range contains infinitely many values, so the probability of any single exact value is zero; probabilities are obtained for intervals instead. For a discrete distribution, we can assign a probability to each individual value.
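The difference between a density and a probability can be sketched with Python's standard `statistics.NormalDist`. The standard normal variable and the evaluation point 0.5 are assumptions made for this example.

```python
from statistics import NormalDist

# Assumed example: a standard normal variable (mean 0, standard deviation 1).
dist = NormalDist(mu=0.0, sigma=1.0)

# Probability of an interval = difference of the cumulative distribution function.
p_interval = dist.cdf(1.0) - dist.cdf(0.0)
print(f"P(0 <= X <= 1) = {p_interval:.4f}")

# Shrinking the interval around a single point drives its probability towards zero.
for width in (1.0, 0.1, 0.001):
    p = dist.cdf(0.5 + width / 2) - dist.cdf(0.5 - width / 2)
    print(f"width={width}: probability={p:.6f}")

# The density at a point is the height of the curve there, not a probability.
density_at_half = dist.pdf(0.5)
print(f"density at 0.5 = {density_at_half:.4f}")
```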
Types of Distributions – Discrete Distribution
Binomial Distribution
The binomial distribution describes an experiment with a fixed number of trials in which each trial has only two possible outcomes (success or failure). Each trial is independent of the others; that is, the outcome of one trial has no impact on the outcome of any other trial. The trials conducted in the experiment are identical to each other.
Thus, the probability of success and failure would be the same for each trial. For example, if the probability of success for a trial is 0.8 (which means the probability of failure would be 0.2), then it will be the same for the rest of the trials as well.
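A minimal sketch of the binomial probability mass function follows, using the success probability 0.8 from the example above. The number of trials (n = 5) and the function name `binomial_pmf` are assumptions made for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.8  # p = 0.8 as in the text; n = 5 is an assumed value

# Probability of every possible number of successes, from 0 to n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(k, round(prob, 5))
```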
Multinomial Distribution
This is the generalized version of the binomial distribution, in which each trial can have more than two outcomes. Its other properties are similar to those of the binomial distribution. For example, when a fair die is rolled repeatedly, each trial has six possible outcomes, the trials are independent, and the probability of each outcome stays the same from trial to trial.
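The fair-die example can be sketched with a small multinomial probability function. The function name `multinomial_pmf` and the scenario of six rolls are assumptions for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability that outcome i occurs exactly counts[i] times."""
    n = sum(counts)
    # Multinomial coefficient n! / (c1! * c2! * ...), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# A fair die: six outcomes, each with probability 1/6 on every roll.
fair_die = [1 / 6] * 6

# Probability that six rolls show every face exactly once.
p_each_once = multinomial_pmf([1, 1, 1, 1, 1, 1], fair_die)
print(round(p_each_once, 6))

# With only two outcomes the formula reduces to the binomial case.
p_two_of_three = multinomial_pmf([2, 1], [0.5, 0.5])
print(p_two_of_three)
```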
Bernoulli Distribution
This is a special case of the binomial distribution in which the number of trials conducted in the experiment is 1 (n = 1). As there is only one trial, it can be defined using a single parameter (p), the probability of success.
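As a rough sketch, a single Bernoulli trial can be simulated with Python's `random` module. The success probability 0.8, the seed, and the helper name `bernoulli_trial` are assumptions for this example.

```python
import random

def bernoulli_trial(p: float, rng: random.Random) -> int:
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.8                 # assumed probability of success

# One Bernoulli trial is a binomial experiment with n = 1.
single = bernoulli_trial(p, rng)

# Repeating the trial many times: the share of successes should be close to p.
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_rate = sum(trials) / len(trials)
print(single, round(success_rate, 3))
```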
Negative Binomial Distribution
The negative binomial distribution differs from the binomial distribution in the following ways:
- The number of trials conducted in an experiment is not fixed.
- The random variable indicates the number of trials required to attain a desired number of successes.
In the binomial distribution, the random variable is the number of successes in a fixed number of trials; we count the successes no matter how many trials fail. The negative binomial distribution instead focuses on how many trials are required to achieve a given number of successes, so the number of failures (negatives) is also taken into consideration. This is a convenient way to remember the name, which formally comes from the binomial series with a negative exponent that appears in its probability formula.
The process continues only until the desired number of successes has been attained, so the number of trials is not known in advance. It is also called the Pascal distribution.
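A minimal sketch of this "number of trials until the r-th success" form is given below. The function name and the values r = 3 and p = 0.5 are assumptions for illustration.

```python
from math import comb

def negative_binomial_pmf(n: int, r: int, p: float) -> float:
    """Probability that the r-th success occurs exactly on trial n."""
    if n < r:
        return 0.0  # r successes cannot happen in fewer than r trials
    # The last trial must be a success; the other r - 1 successes can fall
    # anywhere among the first n - 1 trials.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 3, 0.5  # assumed: wait for 3 successes, each trial succeeds half the time
for n in range(3, 8):
    print(n, round(negative_binomial_pmf(n, r, p), 5))

# Summing over a long range of trial counts approaches 1.
total = sum(negative_binomial_pmf(n, r, p) for n in range(r, 200))
```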
Poisson Distribution
The Poisson distribution provides the probability of a given number of events occurring in a specific period of time, provided we know the average number of events that occur in such a period. The events occur independently of one another, and the distribution assumes that the rate of occurrence remains constant over the time period.
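The Poisson probability of observing exactly k events can be sketched as follows. The average rate of 3 events per period and the function name `poisson_pmf` are assumptions for this example.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events occur on average."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # assumed average number of events per period
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 5))

# Probability of at most two events in the period.
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))
```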
Discrete Uniform Distribution
In a uniform distribution, the probabilities of all the outcomes are equal. For example, when a fair die is rolled, each outcome from 1 to 6 is equally likely. The probability mass function of this distribution is 1/n, where n is the total number of discrete values.
Types of Distributions – Continuous Distribution
Continuous Uniform Distribution
Uniformity can be applied to continuous values as well. Here the probability density is constant across the specified range. It is also called a rectangular distribution because of the shape it takes when plotted on a graph.
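A short sketch of the density and cumulative probability of a continuous uniform distribution follows. The range from 2 to 6 and the function names are assumptions for illustration.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Constant density 1 / (b - a) inside [a, b], zero outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """Probability that the variable is less than or equal to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 6.0  # assumed range
print(uniform_pdf(3.0, a, b))  # same height everywhere in the range

# Probability of an interval is its width times the constant height.
p_between = uniform_cdf(5.0, a, b) - uniform_cdf(3.0, a, b)
print(p_between)
```

The flat density is what gives the plot its rectangular shape.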
Normal Distribution
A normal distribution (also known as a bell curve) is a continuous distribution that is symmetric about its mean: half of the probability lies to the left of the mean and the other half lies to the right. For a normal distribution, the mean, the mode, and the median are equal.
Normally distributed data follow the empirical rule, which describes the spread of the data around the mean in terms of standard deviations (the percentages are approximate):
- 68% probability that the random variable falls within 1 standard deviation of the mean.
- 95% probability that the random variable falls within 2 standard deviations of the mean.
- 99.7% probability that the random variable falls within 3 standard deviations of the mean.
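The three percentages of the empirical rule can be checked with `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are assumed values; any normal distribution gives the same coverage.

```python
from statistics import NormalDist

# Assumed parameters; the coverage does not depend on them.
dist = NormalDist(mu=100.0, sigma=15.0)

coverage = {}
for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = dist.cdf(upper) - dist.cdf(lower)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```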
T – Distribution
It is similar to a normal distribution but has heavier tails, that is, a higher probability of extreme values. This makes it more likely to produce values that are far from the mean. When plotted on a graph, the curve looks shorter and fatter than the normal distribution curve.
It is preferred when the sample size is small and the population standard deviation is unknown. As the sample size increases, the t-distribution curve approaches the normal distribution curve. Because the formulae for the normal distribution and the t-distribution are cumbersome to evaluate by hand, we usually compute the Z-score or T-score respectively and look up the corresponding probability.
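The two scores can be sketched as below, using the usual textbook definitions: a Z-score standardizes a value with a known population standard deviation, while a one-sample T-score uses the standard deviation estimated from the sample. The data, the hypothesized mean, and the function names are all made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of population standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def t_score(sample, mu0: float) -> float:
    """One-sample t statistic: distance of the sample mean from mu0,
    measured in estimated standard errors."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - mu0) / standard_error

# Hypothetical value from a population with known mean 100 and sigma 15.
z = z_score(130.0, 100.0, 15.0)

# Hypothetical small sample tested against a mean of 5.0.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
t = t_score(sample, 5.0)
print(z, round(t, 4))
```

With only five observations, the T-score would be compared against a t-distribution with 4 degrees of freedom rather than against the normal distribution.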
Chi – Square Distribution
The chi-square distribution is the distribution of the sum of the squares of independent random variables, each drawn from a standard normal distribution. Its degrees of freedom equal the number of variables summed, and its mean equals the number of degrees of freedom.
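This definition can be checked with a small simulation: draw standard normal values, square and sum them, and compare the average of many such sums with the degrees of freedom. The seed, the 4 degrees of freedom, and the sample count are arbitrary choices for this sketch.

```python
import random

def chi_square_sample(df: int, rng: random.Random) -> float:
    """Sum of the squares of df independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(42)  # fixed seed so the run is repeatable
df = 4                   # assumed degrees of freedom

samples = [chi_square_sample(df, rng) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)

# The simulated mean should land close to the degrees of freedom.
print(f"degrees of freedom: {df}, simulated mean: {sample_mean:.3f}")
```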
This distribution is widely used in calculating confidence intervals and in hypothesis testing. It is a special case of the gamma distribution. It is also used in the chi-square goodness-of-fit test, which helps indicate whether the observed sample data are a good representation of the entire population.
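As a rough sketch of a goodness-of-fit check, the statistic below compares hypothetical counts from 60 die rolls with the counts expected from a fair die. The observed data are invented for illustration, and 11.07 is the commonly tabulated 5% critical value for 5 degrees of freedom.

```python
# Hypothetical counts of faces 1 to 6 observed in 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]

# A fair die would be expected to show each face equally often.
expected = [sum(observed) / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1
critical_value = 11.07  # tabulated value for 5 degrees of freedom at the 5% level

# A statistic above the critical value would suggest the die is not fair.
reject_fair_die = chi_sq > critical_value
print(chi_sq, degrees_of_freedom, reject_fair_die)
```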
This article gave an overview of a few examples of discrete and continuous types of distributions. These different distributions are used to serve different purposes, and each has its own assumptions.
Although the assumptions of these distributions may not be fully met in real-life situations, the distributions still help in making important decisions for an organization.
What distinguishes the binomial distribution from the normal distribution?
A binomial distribution is discrete: there are no data points between any two adjacent values. This is in stark contrast to a normal distribution, which is continuous and can take any value within a range. A binomial distribution has a finite number of possible outcomes, whereas a normal distribution has infinitely many. Even so, if the number of trials is large enough, the shape of the binomial distribution resembles that of the normal distribution.
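The resemblance for a large number of trials can be sketched by comparing an exact binomial probability with its normal approximation. The values n = 100 and p = 0.5, the interval from 45 to 55, and the half-unit continuity correction are choices made for this example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # assumed number of trials and success probability

# Exact binomial probability of between 45 and 55 successes (inclusive).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(45, 56))

# Normal distribution with the same mean and standard deviation.
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Widen the interval by 0.5 on each side to account for the discrete steps.
approx = normal.cdf(55.5) - normal.cdf(44.5)

print(f"exact binomial: {exact:.4f}, normal approximation: {approx:.4f}")
```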
What distinguishes the binomial distribution from the Bernoulli distribution?
The Bernoulli distribution deals with the outcome of a single trial of an event, whereas the binomial distribution deals with the outcomes of several trials of the same event. The Bernoulli distribution applies when the event is observed just once; the binomial distribution is used when it is observed several times.
When there is uncertainty, how can we use probability distribution?
A probability space represents our uncertainty about an experiment: it consists of a sample space of possible outcomes and a probability measure that assigns a likelihood to each event. In uncertainty analysis, the rectangular (uniform) distribution is widely used. All outcomes within its limits are equally likely. To convert an uncertainty contributor with rectangular limits of ±a into a standard deviation equivalent, divide a by the square root of 3.
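The square-root-of-3 rule can be checked against the general standard deviation of a uniform distribution, (b - a) / sqrt(12). The ±0.5 limits and the function name `rectangular_std` are hypothetical.

```python
from math import sqrt

def rectangular_std(half_width: float) -> float:
    """Standard deviation of a rectangular distribution spanning ±half_width."""
    return half_width / sqrt(3)

# Hypothetical contributor: a reading known to lie within ±0.5 units.
half_width = 0.5
std_from_rule = rectangular_std(half_width)

# Cross-check with the general uniform formula on the interval [a, b].
a, b = -half_width, half_width
std_from_formula = (b - a) / sqrt(12)

print(round(std_from_rule, 6), round(std_from_formula, 6))
```

Both routes give the same value because a full width of 2a divided by sqrt(12) simplifies to a divided by sqrt(3).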