Standard Deviation

Standard deviation (denoted by the Greek letter σ, sigma) is a measure used in data science and descriptive statistics to describe how close to, or far from, the mean the values in a data set are. If the values are clustered close to the mean, the standard deviation is low; if they are spread far from the mean, the standard deviation is high. It can also be used to compare the spread of two data sets, such as the daily temperatures of two cities over a week (measured on the same temperature scale).
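As a minimal sketch of the comparison idea, the Python snippet below applies the standard library's `statistics.stdev` (sample standard deviation) to two week-long temperature series. The city names and temperatures are hypothetical values made up for illustration. Both series have the same mean, so only their spread differs.

```python
import statistics

# Hypothetical daily temperatures (°C) for one week in two cities.
# Both weeks average 20 °C, but city_b's values are more spread out.
city_a = [19, 20, 21, 20, 19, 21, 20]
city_b = [12, 25, 18, 28, 15, 22, 20]

# statistics.stdev computes the sample standard deviation (divides by N - 1).
sd_a = statistics.stdev(city_a)
sd_b = statistics.stdev(city_b)

print(f"city_a: mean={statistics.mean(city_a)}, stdev={sd_a:.2f}")
print(f"city_b: mean={statistics.mean(city_b)}, stdev={sd_b:.2f}")
```

With these made-up values, city_a's standard deviation is under 1 °C while city_b's is above 5 °C. Both cities have the same average temperature, but city_b's temperatures vary much more around it.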

[Figure: standard deviation chart]

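The formula for calculating the standard deviation is: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$, where $x_i$ is each value, $\bar{x}$ is the mean, and N is the number of values. Dividing by N − 1, as in the steps below, gives the sample standard deviation (often written s). Dividing by N instead gives the population standard deviation.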
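As a worked illustration of the formula, take the hypothetical data set {2, 4, 4, 4, 5, 5, 7, 9}, so N = 8:

```latex
\begin{aligned}
\bar{x} &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{8} (x_i - \bar{x})^2 &= 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32 \\
\sigma &= \sqrt{\frac{32}{8 - 1}} = \sqrt{\frac{32}{7}} \approx 2.14
\end{aligned}
```

Dividing by N instead of N − 1 would give the population value, $\sqrt{32/8} = 2$.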

There are several steps to calculating the standard deviation:

  1. Find the mean (x̄). Formula: $\bar{x} = \frac{\sum x}{N}$
    1. Count the terms (N).
    2. Add them together (Σx).
    3. Divide the sum by the number of terms to get the mean.
  2. Calculate the sum of squares. Formula: Sum of Squares = $\sum (x_i - \bar{x})^2$
    1. Subtract the mean from the value at index i ($x_i - \bar{x}$).
    2. Square the result.
    3. Add it to the running sum.
    4. Repeat steps 1 to 3 for each item in the data set.
  3. Subtract 1 from the number of items in the data set (N − 1); we'll call this n.
  4. Divide the sum of squares by n, which gives you the variance.
  5. Find the standard deviation by calculating the square root of the variance.
Source: Khan Academy: Calculating standard deviation step by step
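The steps above can be sketched in Python as follows. This is a minimal illustration, not Khan Academy's code. The function name `sample_std_dev` is hypothetical, and it assumes a list of at least two numbers, because dividing by N − 1 is undefined for a single value. The result is compared with the standard library's `statistics.stdev`, which also divides by N − 1.

```python
import math
import statistics


def sample_std_dev(data: list[float]) -> float:
    """Hypothetical helper: sample standard deviation following the steps above."""
    count = len(data)  # Step 1.1: count the terms (N)
    if count < 2:
        raise ValueError("need at least two values to divide by N - 1")

    mean = sum(data) / count  # Steps 1.2-1.3: add the terms, divide by N

    # Step 2: sum of squares, the total of (x_i - mean)^2
    sum_of_squares = 0.0
    for x in data:
        deviation = x - mean  # 2.1: subtract the mean from the value
        sum_of_squares += deviation ** 2  # 2.2-2.3: square it, add it to the sum

    n = count - 1  # Step 3: N - 1
    variance = sum_of_squares / n  # Step 4: sample variance
    return math.sqrt(variance)  # Step 5: square root of the variance


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example values
print(sample_std_dev(data))
print(statistics.stdev(data))
```

For this hypothetical data set, both calls print the same value, about 2.14. In practice `statistics.stdev` (or `statistics.pstdev` for the population version) is the simpler choice. The explicit loop is there only to mirror the steps one by one.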