Introduction to Measures of Dispersion in Statistics
Dispersion in statistics refers to how scattered a set of data is. In statistical terms, dispersion describes the extent to which a data set is spread out around an average value. Central tendency and dispersion are used together to describe how data are distributed.
The spread of data can be determined using a variety of statistical measures of dispersion, such as standard deviation, variance, interquartile range, etc. These methods find use in several industries. For instance, companies use variance analysis to identify the gaps between the standard and actual costs incurred, which further helps them control the costs.
The measures of dispersion in statistics can be broadly classified into two types: absolute and relative.
This article explains what dispersion is in statistics and describes the main measures of dispersion, with examples.
Read on to find out!
What is Dispersion, and Why is it Important?
Dispersion in statistics helps us understand how reliable an average is. If the dispersion is large, the observations are spread far from the mean, which makes the mean a questionable summary of the data. The different measures of dispersion describe variability from several angles, which can be useful when trying to control that variation.
Types of Measures of Dispersion
There are two types of measures of dispersion:
1. Absolute measures of dispersion: These measures express the variation within a single data set. They are expressed in the unit of the original data (or, for variance, in the square of that unit).
The different absolute measures of dispersion are:
- Range
- Mean Deviation
- Variance
- Standard Deviation
- Quartile and Quartile Deviation
2. Relative measures of dispersion: Relative measures are used to compare the dispersion of different data sets, even when they are measured in different units or have very different averages. They are unit-free and are expressed as ratios or percentages, for example a measure of dispersion divided by a measure of central tendency. Relative measures of dispersion are also known as coefficients of dispersion.
The following are the different relative measures of dispersion:
- Coefficient of Range
- Coefficient of Variation
- Coefficient of Mean Deviation
- Coefficient of Quartile Deviation
Let’s dive in to explore these measures!
Absolute Measures of Dispersion
Range as a Measure of Dispersion
The range is the difference between the maximum and minimum values of a data set. It is the easiest measure of dispersion to calculate and understand, and it gives an idea of the extremes in a data set.
However, since the range depends on only two observations, it is highly sensitive to extreme values: a single unusually large or small observation can change it substantially.
Example: Consider the data set 1, 4, 7, 10, 13. The maximum value is 13 and the minimum value is 1. Therefore,
Range = Maximum value - Minimum value = 13 - 1 = 12
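As a minimal sketch, the range calculation can be written in Python with only built-in functions. The function name `data_range` is hypothetical and chosen for illustration; it is not part of any library.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

print(data_range([1, 4, 7, 10, 13]))  # 12
```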
Mean Deviation as a Measure of Dispersion
Mean deviation, also known as mean absolute deviation, is the average of the absolute deviations of the observations from a measure of central tendency, which can be the mean, median, or mode.
Example: Let us understand the steps to find the mean and mean deviation with the help of this data set: 9, 13, 17, 19, 22.
Step 1: The first step is to find the central tendency, which can be the mean, median, or mode. Let us take the mean in this case: (9 + 13 + 17 + 19 + 22)/5 = 80/5 = 16.
Step 2: Next, subtract the mean from each observation: (9-16), (13-16), (17-16), (19-16), and (22-16), which gives -7, -3, 1, 3, 6. The signs are ignored in the next step, so only the absolute values are used: 7, 3, 1, 3, 6.
Step 3: Next, find the sum of the absolute values: 7 + 3 + 1 + 3 + 6 = 20.
Step 4: In the final step, divide this value by the number of observations. In this case, we will divide 20 by 5, giving us 4. Therefore, 4 is the mean deviation from the mean.
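The four steps above can be sketched in Python as follows. The helper `mean_deviation` and its `center` parameter are hypothetical names used for illustration. By default the sketch measures deviations from the mean, but any other central value, such as the median, can be passed in.

```python
from statistics import mean, median

def mean_deviation(values, center=None):
    """Average absolute deviation of the values from `center` (defaults to the mean)."""
    if center is None:
        center = mean(values)                          # Step 1: central tendency
    abs_devs = [abs(x - center) for x in values]       # Step 2: absolute deviations
    return sum(abs_devs) / len(values)                 # Steps 3 and 4: sum, then divide by n

data = [9, 13, 17, 19, 22]
print(mean_deviation(data))                          # 4.0 (about the mean, 16)
print(mean_deviation(data, center=median(data)))     # 3.8 (about the median, 17)
```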
Variance as a Measure of Dispersion
Variance measures how far the data points are spread out from the mean. It is based on the squared differences between each observation and the mean.
To find the variance, subtract the mean from each observation and square the results. Then add these squared values and divide the sum by the number of observations in the data set (or by one less than that number for a sample, as explained below).
Variance is of two types: population variance and sample variance. Population variance is calculated from the whole population, while sample variance is calculated from a sample (a subset) of it. The denominator for population variance is n, and for sample variance it is n-1, where n is the number of observations.
Example: Let us continue with the data set we had taken in the previous example, i.e., 9, 13, 17, 19, 22.
To begin with, find the mean value and then subtract the value from each of the observations. Next, find the square of these values and add all the values thus obtained.
In this case, the values obtained after subtracting the mean from the observations are -7, -3, 1, 3, and 6. The squares of these values are 49, 9, 1, 9, 36, and their sum is 104.
Next, divide the sum by n (for the population variance) or by n-1 (for the sample variance). Treating these five values as the whole population, the variance is 104/5 = 20.8; treated as a sample, it would be 104/4 = 26.
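A short Python sketch of the same calculation is shown below. It computes both variances by hand and then cross-checks them against `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n-1) from the standard library.

```python
from statistics import pvariance, variance

data = [9, 13, 17, 19, 22]
mu = sum(data) / len(data)                  # mean = 16.0
squared_devs = [(x - mu) ** 2 for x in data]  # 49, 9, 1, 9, 36
ss = sum(squared_devs)                      # sum of squares = 104.0

pop_var = ss / len(data)                    # population variance: divide by n
sample_var = ss / (len(data) - 1)           # sample variance: divide by n - 1

print(pop_var, sample_var)                  # 20.8 26.0
# The standard library gives the same values:
print(pvariance(data), variance(data))
```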
Standard Deviation as a Measure of Dispersion
The square root of the variance gives the standard deviation of a data set. For the data set above, the population variance is 20.8, so the population standard deviation is √20.8 ≈ 4.56. Using the sample variance of 26 instead gives a sample standard deviation of √26 ≈ 5.10.
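The relationship between variance and standard deviation can be checked in Python. This sketch takes the square root of `statistics.pvariance` and compares it with `statistics.pstdev` (population) and `statistics.stdev` (sample) from the standard library.

```python
import math
from statistics import pstdev, pvariance, stdev

data = [9, 13, 17, 19, 22]

# Standard deviation is the square root of the variance.
pop_sd = math.sqrt(pvariance(data))   # population SD (variance divides by n)
sample_sd = stdev(data)               # sample SD (variance divides by n - 1)

print(round(pop_sd, 2), round(pstdev(data), 2))  # 4.56 4.56
print(round(sample_sd, 2))                       # 5.1
```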
Quartile and Quartile Deviation as a Measure of Dispersion
In a data set, quartiles are the values that divide the ordered data into four equal parts: Q1 (the first quartile), Q2 (the median), and Q3 (the third quartile). Quartile deviation, also called the semi-interquartile range, measures the spread of the middle 50% of the data, so unlike the range it is not affected by extreme values. It is calculated as half of the difference between the third and the first quartiles.
The steps to find the quartile deviation are mentioned below:
Step 1: First, arrange your data in ascending order.
Step 2: Identify the first and the third quartiles. Q1 is the value at position (n+1)/4 and Q3 is the value at position 3(n+1)/4 in the ordered data, where n is the total number of observations. When a position is not a whole number, interpolate between the two neighboring values.
Step 3: Find the quartile deviation by calculating the difference between the first and the third quartile values and dividing the value by 2.
Example: Let the given data set be 9, 13, 7, 3, 16, 15.
To begin with, arrange the data set in ascending order: 3, 7, 9, 13, 15, 16. Applying the formula gives (6+1)/4 = 1.75, so Q1 lies three-quarters of the way from the 1st value (3) to the 2nd value (7). Therefore, Q1 = 3 + 0.75 × (7 - 3) = 6.
Similarly, for Q3 the position is 3(6+1)/4 = 5.25, so Q3 lies a quarter of the way from the 5th value (15) to the 6th value (16). Therefore, Q3 = 15 + 0.25 × (16 - 15) = 15.25.
Finally, the quartile deviation is (Q3 - Q1)/2 = (15.25 - 6)/2 = 4.625. (Some textbooks simply average the two neighboring values instead of interpolating, which gives slightly different quartiles.)
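The quartile steps can be sketched in Python as shown below. The helpers `value_at_position` and `quartile_deviation` are hypothetical names. They assume the (n+1)/4 positioning with linear interpolation described above. As a cross-check, the standard library's `statistics.quantiles` uses the same (n+1)-based positions in its default "exclusive" method. Other quartile conventions exist and give different values.

```python
from statistics import quantiles

def value_at_position(sorted_vals, pos):
    """Value at 1-based position `pos`, interpolating linearly between neighbours."""
    n = len(sorted_vals)
    if pos <= 1:
        return sorted_vals[0]
    if pos >= n:
        return sorted_vals[-1]
    lower = int(pos)             # 1-based position of the neighbour just below
    frac = pos - lower           # fraction of the way towards the next value
    below, above = sorted_vals[lower - 1], sorted_vals[lower]
    return below + frac * (above - below)

def quartile_deviation(values):
    """Return (Q1, Q3, quartile deviation) using the (n+1)/4 and 3(n+1)/4 positions."""
    s = sorted(values)                           # Step 1: ascending order
    n = len(s)
    q1 = value_at_position(s, (n + 1) / 4)       # Step 2: first quartile
    q3 = value_at_position(s, 3 * (n + 1) / 4)   #         third quartile
    return q1, q3, (q3 - q1) / 2                 # Step 3: half the difference

data = [9, 13, 7, 3, 16, 15]
print(quartile_deviation(data))   # (6.0, 15.25, 4.625)

q1, _, q3 = quantiles(data, n=4)  # standard-library cross-check
print(q1, q3)                     # 6.0 15.25
```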
Relative Measures of Dispersion
Coefficient of Range as a Measure of Dispersion
The coefficient of range helps to find out the relative measure of range. It can be calculated using the formula (Largest value-Smallest value) / (Largest value+Smallest value).
Example: Let the data set be 45, 34, 21, 54, 19. Here, 19 is the lowest value, and 54 is the largest value. Therefore,
Coefficient of range = (54 - 19)/(54 + 19) = 35/73 ≈ 0.479
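A minimal Python sketch of the formula is shown below. The function name `coefficient_of_range` is hypothetical. The sketch assumes that the largest and smallest values do not sum to zero (for example, all-positive data). Otherwise the ratio is undefined.

```python
def coefficient_of_range(values):
    """(largest - smallest) / (largest + smallest): a unit-free version of the range."""
    largest, smallest = max(values), min(values)
    # Assumes largest + smallest != 0, e.g. strictly positive data.
    return (largest - smallest) / (largest + smallest)

print(round(coefficient_of_range([45, 34, 21, 54, 19]), 3))  # 0.479
```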
Coefficient of Variation as a Measure of Dispersion
The coefficient of variation is the ratio of the standard deviation to the mean. Multiplying this ratio by 100 expresses it as a percentage.
Example: Let us again take the data set 9, 13, 17, 19, 22. To begin with, find the mean, which is 16.
The (population) standard deviation for this data is 4.56. Therefore, the coefficient of variation in this case is 4.56/16 = 0.285, or 28.5%.
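The calculation can be sketched in Python. The function name `coefficient_of_variation` is hypothetical. It uses the population standard deviation (`statistics.pstdev`), which gives the 4.56 used in the text. Switching to `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

cv = coefficient_of_variation([9, 13, 17, 19, 22])
print(round(cv, 3), f"{cv:.1%}")  # 0.285 28.5%
```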
Coefficient of Quartile Deviation as a Measure of Dispersion
The coefficient of quartile deviation is the ratio of the difference between the third and first quartiles to their sum.
Example: Let the given data set be 9, 13, 7, 3, 16, 15.
Sorted, the data are 3, 7, 9, 13, 15, 16. Using the (n+1)/4 and 3(n+1)/4 positions with interpolation, the first quartile (Q1) is 6 and the third quartile (Q3) is 15.25. Therefore, the coefficient of quartile deviation is (Q3 - Q1)/(Q3 + Q1) = 9.25/21.25, which is approximately 0.435.
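As a sketch, the coefficient can be computed with the standard library's `statistics.quantiles`. Its default "exclusive" method matches the (n+1)/4 positioning with interpolation used in this example. The function name `coefficient_of_quartile_deviation` is hypothetical, and a different quartile convention would give a somewhat different result.

```python
from statistics import quantiles

def coefficient_of_quartile_deviation(values):
    """(Q3 - Q1) / (Q3 + Q1), with quartiles from statistics.quantiles (default method)."""
    q1, _, q3 = quantiles(values, n=4)   # returns [Q1, median, Q3]
    return (q3 - q1) / (q3 + q1)

print(round(coefficient_of_quartile_deviation([9, 13, 7, 3, 16, 15]), 3))  # 0.435
```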
Coefficient of Mean Deviation as a Measure of Dispersion
The coefficient of mean deviation can be calculated by finding out the ratio between the mean deviation and the arithmetic mean of the observations.
Example: Let us again take the data set 9, 13, 17, 19, 22. The arithmetic mean of the observations is 16, and the mean deviation is 4. Therefore, the coefficient of mean deviation is 4/16 = 0.25.
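A minimal Python sketch of this ratio follows. The function name `coefficient_of_mean_deviation` is hypothetical. It assumes that the mean deviation is taken about the arithmetic mean, as in the example, and that the mean is non-zero.

```python
from statistics import mean

def coefficient_of_mean_deviation(values):
    """Mean deviation about the mean, divided by the mean (assumes a non-zero mean)."""
    m = mean(values)
    md = sum(abs(x - m) for x in values) / len(values)   # mean deviation
    return md / m

print(coefficient_of_mean_deviation([9, 13, 17, 19, 22]))  # 0.25
```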
Applications of Measures of Dispersion
Listed below are some of the reasons why calculating dispersion is important:
- Measures of dispersion help to assess the uniformity or consistency of a distribution. The smaller the dispersion, the more uniform the data set is likely to be.
- Dispersion helps to evaluate whether a measure of central tendency accurately represents the whole data set. A high dispersion indicates that the observations are far from one another, so the mean may not be a reliable representative of the whole data set.
- Measures of dispersion serve as a foundation for further analysis, such as regression, correlation, and hypothesis testing.
Alongside your knowledge of statistics, you can pursue a course in data science, which will teach you how to make the best data-driven decisions through the amalgamation of statistics and technology.
If you are a beginner willing to establish a career as a data scientist, you can explore programs like upGrad’s Python Programming Bootcamp, which will enlighten you about database programming using Python and technologies like Pandas and NumPy.
On the other hand, experienced professionals can go for upGrad’s Executive Doctor of Business Administration in Data Science, which offers an excellent opportunity to pursue a doctorate while fulfilling professional commitments.
What are coefficients of dispersion?
Coefficients of dispersion are relative measures that help to compare the variability of data sets measured in different units or on different scales. They are unitless and are expressed as ratios or percentages.
Which is the best measure of dispersion?
Standard deviation is generally considered the best measure of dispersion. It uses every observation, is expressed in the same unit as the data, and, because it works with squared deviations rather than absolute values, is mathematically convenient to build on in further analysis. It is, however, sensitive to extreme values.
Why is it important to measure dispersion?
Measuring dispersion helps determine how reliably a measure of central tendency represents the data. It also makes it possible to compare the variability of two data sets.