In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to run a thousand simulations.
The exponential distribution is a continuous probability distribution used to model the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. Its mean and standard deviation are equal: both are 1/lambda, where lambda is the rate.
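For reference, the equality of the mean and standard deviation follows from the standard density of the exponential distribution, f(x) = lambda * exp(-lambda * x) for x >= 0. The derivation below is the textbook one, written out in LaTeX:

```latex
\begin{aligned}
f(x) &= \lambda e^{-\lambda x}, \qquad x \ge 0 \\
E[X] &= \int_0^\infty x \,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda} \\
E[X^2] &= \int_0^\infty x^2 \,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2} \\
\operatorname{Var}(X) &= E[X^2] - \left(E[X]\right)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2},
\qquad \operatorname{SD}(X) = \frac{1}{\lambda}
\end{aligned}
```

With lambda = 0.2 this gives a mean of 5, a standard deviation of 5, and a variance of 25.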
In the first simulation we create a single vector of 40 exponentials and inspect it. With lambda = 0.2, the mean of the exponential distribution is 1/0.2 = 5, and the standard deviation is also 1/0.2 = 5.
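As a quick sanity check on these theoretical values, the sketch below computes them from lambda and compares them with the mean and standard deviation of one very large simulated sample. The variable names (theo_mean, theo_sd, theo_var, big) and the seed value are illustrative choices, not part of the original project.

```r
# Theoretical moments of the exponential distribution for rate lambda
lambda    <- 0.2
theo_mean <- 1 / lambda      # 1 / 0.2 = 5
theo_sd   <- 1 / lambda      # 1 / 0.2 = 5
theo_var  <- 1 / lambda^2    # 1 / 0.2^2 = 25

# Large-sample check; the seed is arbitrary and only makes the run reproducible
set.seed(1)
big <- rexp(1e5, rate = lambda)
round(c(sample_mean = mean(big), sample_sd = sd(big)), 2)
```

With 100,000 draws, both sample statistics should land within a few hundredths of 5.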
In the histogram below, the blue line marks the theoretical mean of 5 and the red line marks the sample mean. The sample mean lies close to the theoretical mean, as expected, but does not match it exactly because of natural sampling variability: a mean computed from only 40 draws is itself a random quantity. Larger samples shrink this difference.
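set.seed(123)  # fix the random number generator state so results are reproducible
exp40 <- rexp(40, rate = 0.2)  # 40 draws from an exponential with lambda = 0.2
hist(exp40, main = "Histogram of 40 random exponentials: Mean")
abline(v = mean(exp40), col = "red", lwd = 2)  # sample mean
abline(v = 5, col = "blue", lwd = 3)           # theoretical mean 1/lambda = 5
legend("topright", legend = c("Sample Mean", "Theoretical Mean"), col = c("red", "blue"), lwd = c(2, 3))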
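The claim that larger samples shrink the gap between the sample mean and the theoretical mean can be checked directly. This sketch averages the absolute error over 200 repetitions for each sample size, so the trend is not hidden by the luck of a single draw. The sample sizes, the repetition count, the seed, and the names sizes and abs_error are illustrative assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility
lambda <- 0.2
sizes  <- c(40, 400, 4000, 40000)

# Mean absolute distance between the sample mean and 1/lambda, over 200 repetitions per n
abs_error <- sapply(sizes, function(n) {
  mean(replicate(200, abs(mean(rexp(n, rate = lambda)) - 1 / lambda)))
})

data.frame(n = sizes, mean_abs_error = round(abs_error, 3))
```

Each tenfold increase in n should reduce the typical error by roughly a factor of sqrt(10), about 3.2, consistent with a standard error of (1/lambda)/sqrt(n).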
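The theoretical variance is (1/lambda)^2 = 5^2 = 25. As with the mean, the sample variance of these 40 draws differs from it because of sampling variability, and in relative terms the sample variance is considerably noisier than the sample mean, since squared deviations amplify the long right tail of the exponential. Because variance is measured in squared units, the vertical lines below serve only as a visual reference on the histogram's axis.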
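# widen the x-axis as needed so the data and both vertical lines are always visible
hist(exp40, main = "Histogram of 40 random exponentials: Variance", xlim = range(0, exp40, var(exp40), 30))
abline(v = var(exp40), col = "red", lwd = 2)  # sample variance
abline(v = 5^2, col = "blue", lwd = 3)        # theoretical variance (1/lambda)^2 = 25
legend("topright", legend = c("Sample Variance", "Theoretical Variance"), col = c("red", "blue"), lwd = c(2, 3))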
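To see how far a single sample variance can stray, the sketch below repeats the 40-draw experiment 10,000 times and looks at the spread of the resulting sample variances. R's var() uses the n - 1 denominator, so its average across repetitions should sit near 25, while individual values scatter widely. The seed, the repetition count, and the name vars40 are illustrative assumptions.

```r
set.seed(7)  # arbitrary seed for reproducibility
lambda <- 0.2

# 10,000 sample variances, each computed from 40 exponential draws
vars40 <- replicate(10000, var(rexp(40, rate = lambda)))

c(mean_of_variances = mean(vars40), sd_of_variances = sd(vars40))

hist(vars40, main = "Sample variances of 40 exponentials", xlab = "Sample variance")
abline(v = 1 / lambda^2, col = "blue", lwd = 2)  # theoretical variance 25
```

The standard deviation of the sample variances is large compared with 25, which explains why a single sample variance can land far from the theoretical value.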
Now let's repeat the simulation 1000 times, each time taking the mean of 40 exponentials. According to the Central Limit Theorem, the distribution of sample means is approximately normal when the sample size is large enough, regardless of the shape of the population distribution (provided it has finite variance). This distribution is centered on the population mean, 5, and its standard deviation is the standard error 5/sqrt(40), about 0.79. In the figure below, the average of the 1000 sample means (red line) lies close to the theoretical mean of the exponential distribution (blue line) at 5.
# 1000 simulations, each returning the mean of 40 exponentials (replaces growing a vector in a loop)
exp40_1000 <- replicate(1000, mean(rexp(40, rate = 0.2)))
hist(exp40_1000, main = "Histogram of 1000 means of 40 random exponentials")
abline(v = mean(exp40_1000), col = "red", lwd = 2)  # average of the sample means
abline(v = 5, col = "blue", lwd = 3)                # theoretical mean 1/lambda = 5
legend("topright", legend = c("Sample Mean", "Theoretical Mean"), col = c("red", "blue"), lwd = c(2, 3))
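The Central Limit Theorem also predicts the spread and shape of the sample means, not just their center. The sketch below compares the standard deviation of 1000 simulated means with the theoretical standard error (1/lambda)/sqrt(40), overlays the corresponding normal density on a density-scaled histogram, and adds a normal Q-Q plot. The seed, the bin count, and the names means and theo_se are illustrative assumptions.

```r
set.seed(123)  # arbitrary seed for reproducibility
lambda <- 0.2
n      <- 40

# 1000 sample means of n exponential draws
means   <- replicate(1000, mean(rexp(n, rate = lambda)))
theo_se <- (1 / lambda) / sqrt(n)  # 5 / sqrt(40), about 0.79

c(simulated_mean = mean(means), simulated_sd = sd(means), theoretical_se = theo_se)

# Density-scaled histogram with the normal curve predicted by the CLT
hist(means, freq = FALSE, breaks = 30, main = "Means of 40 exponentials vs. normal curve")
curve(dnorm(x, mean = 1 / lambda, sd = theo_se), add = TRUE, col = "blue", lwd = 2)

# Points close to the reference line indicate approximate normality
qqnorm(means)
qqline(means, col = "red")
```

Some mild right skew may remain at n = 40, inherited from the skewed exponential; it fades as n grows.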