Sunday, 30 July 2017

Relation between median and mean using R


We are going to work mainly with R. R is a free software environment for statistical computing and graphics.

When the data have a symmetrical distribution without outliers, the mean and the sample median are very close. 
consume <- c(6.9, 6.3, 6.2, 6.5, 6.4, 6.8, 6.6)
data.frame(mean=mean(consume),median=median(consume))

> data.frame(mean=mean(consume),median=median(consume))
      mean median
1 6.528571    6.5
However, when the distribution is asymmetrical, the mean and the median will not coincide:

  • Right asymmetry: the mean is greater than the median
  • Left asymmetry: the mean is less than the median
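
As a quick check of the two cases above, here is a minimal sketch in R. The vectors `right_skewed` and `left_skewed` are hypothetical toy data, invented only for illustration; they are not part of the salary example.

```r
# Hypothetical toy vectors, chosen only to illustrate the two cases
right_skewed <- c(1, 2, 2, 3, 3, 4, 20)       # long tail to the right
left_skewed  <- c(1, 17, 18, 18, 19, 19, 20)  # long tail to the left

# Right asymmetry: the mean (5) is greater than the median (3)
mean(right_skewed) > median(right_skewed)

# Left asymmetry: the mean (16) is less than the median (18)
mean(left_skewed) < median(left_skewed)
```

Both comparisons return TRUE for these vectors, because a single extreme value pulls the mean toward the tail while the median stays near the bulk of the data.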
Here is another example that demonstrates this. We are going to use the following data in R:
salaries <- c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411, 3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)
This vector holds the monthly salaries of 20 workers at one company. We can calculate the mean and the median with these commands in R:

mean(salaries)    # 1572.65
median(salaries)  # 880
hist(salaries)
# blue line = mean, red line = median
abline(v=c(mean(salaries),median(salaries)), col=c("blue","red"))
We get the results that you see in Fig. 1. 

Figure 1. Histogram of salaries
It is important to note that the mean salary can be misleading here, because 70% of the workers earn less than the average salary:
mean(salaries<mean(salaries))
[1] 0.7
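
One way to see why the mean sits so far above most salaries is to drop the single largest value and recompute both measures. This is only a sketch: the name `trimmed` is introduced here for illustration, and removing an observation is done purely to show the effect, not as a recommended way to treat the data.

```r
salaries <- c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411,
              3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)

# Drop the largest salary (7537) to see how each measure reacts
trimmed <- salaries[salaries != max(salaries)]

mean(trimmed)    # about 1258.74, down from 1572.65
median(trimmed)  # 857, down from 880
```

The mean falls by more than 300 while the median moves by only 23, which is why the median describes a typical salary better in a right-skewed sample like this one.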

We can also use a dot chart, as shown in Fig. 2.
Figure 2. Dotchart


 The code is:
dotchart(salaries,pch=16,xlab="monthly salary")
abline(v=mean(salaries),col='red',lwd=2)
abline(v=median(salaries),col='blue',lty=2,lwd=2)
legend("bottomright",c("mean","median"),
       col=c("red","blue"),lty=c(1,2),lwd=c(2,2),box.lty=0,cex=1.5)
