# Normal Distribution and CLT in Data Science

• By Mahesh Kankrale
• July 31, 2024
• Data Science

A normal distribution is a continuous probability distribution whose probability density function forms a symmetrical bell curve. Simply put, most values cluster around a central point, and the remaining values taper off symmetrically towards the two opposite ends. This article covers the key concepts behind the normal distribution and the central limit theorem (CLT), their importance in statistical analysis, and their applications in real-world data scenarios.
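To make the bell shape concrete, here is a minimal sketch that evaluates the normal density at points equally far to the left and right of the mean. The function name `normal_pdf` and the values `mu = 10` and `sigma = 2` are illustrative assumptions, not part of the original article.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 10.0, 2.0  # illustrative values, chosen only for this example

# The density is highest at the mean and identical at mirrored points around it.
for offset in (0.0, 1.0, 2.0, 4.0):
    left = normal_pdf(mu - offset, mu, sigma)
    right = normal_pdf(mu + offset, mu, sigma)
    print(f"offset {offset}: left={left:.5f}  right={right:.5f}")

# Cross-check the hand-written formula against the standard library.
print(math.isclose(normal_pdf(9.3, mu, sigma), NormalDist(mu, sigma).pdf(9.3)))
```

The printed pairs match at every offset and shrink as the offset grows, which is the symmetry and tapering described above.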

## Empirical Rule :

The empirical rule, also known as the 68-95-99.7 rule or the three-sigma rule, is a statistical rule of thumb that describes the approximate percentage of values that fall within a certain number of standard deviations from the mean in a normal distribution:

• Approximately 68% of the data falls within one standard deviation of the mean.
• Approximately 95% of the data falls within two standard deviations of the mean.
• Approximately 99.7% of the data falls within three standard deviations of the mean.

This rule is based on properties of the normal distribution and provides a quick way to estimate the spread of data and identify outliers.
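The three percentages can be checked directly from the normal cumulative distribution function. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is a hypothetical name chosen for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```

The results are roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7. Because these shares depend only on the number of standard deviations, the same figures hold for any mean and standard deviation.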

## Central Limit Theorem :

The central limit theorem (CLT) is a fundamental concept in statistics. It tells us about the distribution of averages (means) from samples drawn from a population. Here’s the gist of it:

• Large samples: The CLT applies when you draw sufficiently large random samples from a population, whatever the shape of the population’s distribution (normal, skewed, etc.), provided its variance is finite.
• Sample means become normal: The distribution of the sample means tends towards a normal distribution (a bell-shaped curve) as the sample size increases.
• Mean and standard deviation: The average of the sample means equals the population mean, and the standard deviation of the sample means (the standard error) equals the population’s standard deviation divided by the square root of the sample size, σ/√n.

Whatever the form of the population distribution, the sampling distribution of the mean tends to a Gaussian, and its dispersion (the standard error, σ/√n) is given by the central limit theorem.
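A small simulation can illustrate these points. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), a sample size of 50, and 5000 repeated samples. All of these values, and the helper name `sample_mean`, are illustrative choices rather than anything prescribed by the theorem.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

POPULATION_MEAN = 1.0  # an exponential distribution with rate 1 has mean 1 ...
POPULATION_SD = 1.0    # ... and standard deviation 1
SAMPLE_SIZE = 50
NUM_SAMPLES = 5000


def sample_mean(n):
    """Mean of n independent draws from the skewed (exponential) population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
expected_sd = POPULATION_SD / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Share of sample means within one standard deviation of their own average;
# for a roughly normal distribution this should be near 68%.
within_one_sd = sum(
    abs(m - observed_mean) <= observed_sd for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {observed_mean:.3f} (population mean {POPULATION_MEAN})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within one sd:  {within_one_sd:.1%}")
```

Although the population itself is far from bell-shaped, the sample means centre on the population mean, their spread is close to σ/√n, and roughly 68% of them fall within one standard deviation of the centre, as the empirical rule suggests for a normal distribution.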

## Log Normal Distribution :

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Equivalently, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable that is log-normally distributed takes only positive real values.
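The relationship between X and Y can be sketched in a few lines: draw Y from a normal distribution, exponentiate it to get X, then take logarithms to recover the normal values. The parameters `MU = 0.5` and `SIGMA = 0.75` are arbitrary values assumed for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 0.5, 0.75  # illustrative parameters of the underlying normal Y

# Y is normal; X = exp(Y) is then log-normal.
y_values = [random.gauss(MU, SIGMA) for _ in range(10_000)]
x_values = [math.exp(y) for y in y_values]

# Taking logs of X recovers the normal variable, so its mean and
# standard deviation should be close to MU and SIGMA.
log_x = [math.log(x) for x in x_values]

print(f"all X positive:  {all(x > 0 for x in x_values)}")
print(f"mean of ln(X):   {statistics.fmean(log_x):.3f} (MU = {MU})")
print(f"sd of ln(X):     {statistics.stdev(log_x):.3f} (SIGMA = {SIGMA})")
print(f"mean of X:       {statistics.fmean(x_values):.3f}")
print(f"median of X:     {statistics.median(x_values):.3f}")
```

Every X is positive, and the mean of X sits above its median, reflecting the long right tail of a log-normal distribution. The standard library also offers `random.lognormvariate(mu, sigma)` for drawing such values directly.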