Sunday, 30 July 2017

Binomial distribution in R

We have an experiment with two possible outcomes (success and failure), where the probability of success is Π. If we repeat n independent trials under the same conditions, the variable
X = number of successes in n trials
follows a binomial distribution with parameters n and Π, and we write X ∈ Bin(n, Π). Its mean is nΠ and its standard deviation is √(nΠ(1 − Π)). In R, the function dbinom calculates the probability P(X = x) that a binomial variable takes the value x; pbinom gives the cumulative probability P(X ≤ x), qbinom the quantiles and rbinom random draws. Their signatures are:

dbinom(x, size, prob, log = FALSE)
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
rbinom(n, size, prob)
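As a quick sketch of how the four functions relate, the snippet below uses the same values as the example that follows (n = 8, Π = 0.15). The cut-off of 2 successes, the 0.9 quantile level and the seed are arbitrary choices for illustration only.

```r
n <- 8        # number of trials
prob <- 0.15  # probability of success in each trial

# dbinom: P(X = 2), the probability of exactly 2 successes
p_exact <- dbinom(2, size = n, prob = prob)

# pbinom: P(X <= 2), equal to the sum of dbinom over x = 0, 1, 2
p_cum <- pbinom(2, size = n, prob = prob)

# pbinom with lower.tail = FALSE: P(X > 2) = 1 - P(X <= 2)
p_upper <- pbinom(2, size = n, prob = prob, lower.tail = FALSE)

# qbinom: smallest x such that P(X <= x) >= 0.9
q90 <- qbinom(0.9, size = n, prob = prob)

# rbinom: 10 simulated numbers of successes, each out of n trials
set.seed(1)  # arbitrary seed so the draws are reproducible
sims <- rbinom(10, size = n, prob = prob)

print(c(exact = p_exact, cumulative = p_cum, upper = p_upper, q90 = q90))
print(sims)
```

Here qbinom returns 3, because P(X ≤ 2) ≈ 0.895 is still below 0.9 while P(X ≤ 3) ≈ 0.979 is not.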
We illustrate dbinom with an example in R that plots Bin(8, 0.15):
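n=8          # number of trials
prob=0.15    # probability of success in each trial
x=0:n        # possible numbers of successes
p=dbinom(x, size=n, prob=prob)  # P(X = x) for each x
names(p)=x   # label each bar with its value of x
r=barplot(p,col='grey85',ylim=c(0,0.45),
          main=paste("Bin(n=",n,",p=",prob,")",sep=""))
text(r,p,round(p,2),pos=3,cex=0.7)  # print each probability above its bar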
Running the code with prob = 0.15 gives Fig. 1; setting prob to 0.25, 0.50 and 0.75 gives Fig. 2, Fig. 3 and Fig. 4 respectively.
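To produce all four plots in one run, a loop over the four probabilities can reuse the plotting code above. This is a minimal sketch; keeping ylim fixed at c(0, 0.45) is a choice made here so that the four plots share the same vertical scale.

```r
n <- 8
probs <- c(0.15, 0.25, 0.50, 0.75)  # the four probabilities used in the figures
x <- 0:n

# One bar plot per value of prob, in the same style as the single-plot example
for (prob in probs) {
  p <- dbinom(x, size = n, prob = prob)
  names(p) <- x
  r <- barplot(p, col = 'grey85', ylim = c(0, 0.45),
               main = paste("Bin(n=", n, ",p=", prob, ")", sep = ""))
  text(r, p, round(p, 2), pos = 3, cex = 0.7)  # probability above each bar
}
```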


Figure 1. Binomial distribution, prob 0.15

Figure 2. Binomial distribution, prob 0.25

Figure 3. Binomial distribution, prob 0.50

Figure 4. Binomial distribution, prob 0.75
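The mean and standard deviation of a binomial variable follow from the standard formulas, mean = nΠ and sd = √(nΠ(1 − Π)). As a sketch, the snippet below checks them against values computed directly from the probabilities returned by dbinom, for Bin(8, 0.15).

```r
n <- 8
prob <- 0.15
x <- 0:n
p <- dbinom(x, size = n, prob = prob)

# Mean and standard deviation from the formulas
mean_formula <- n * prob                   # n * Pi
sd_formula <- sqrt(n * prob * (1 - prob))  # sqrt(n * Pi * (1 - Pi))

# The same quantities computed from the probability mass function
mean_pmf <- sum(x * p)
sd_pmf <- sqrt(sum((x - mean_pmf)^2 * p))

c(mean_formula = mean_formula, mean_pmf = mean_pmf,
  sd_formula = sd_formula, sd_pmf = sd_pmf)
```

For Bin(8, 0.15) both routes give a mean of 1.2 and a standard deviation of √1.02 ≈ 1.01.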

Relation between median and mean using R


We are going to work mainly with R. R is a free software environment for statistical computing and graphics.

When the data have a symmetrical distribution without outliers, the sample mean and the sample median are very close. For example:
consume<-c(6.9, 6.3, 6.2, 6.5, 6.4, 6.8, 6.6)
data.frame(mean=mean(consume),median=median(consume))

      mean median
1 6.528571    6.5
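To see why the condition "without outliers" matters, the sketch below appends one hypothetical extreme value (15, chosen only for illustration, not part of the original data) to consume and recomputes both measures.

```r
consume <- c(6.9, 6.3, 6.2, 6.5, 6.4, 6.8, 6.6)

# Hypothetical extreme value, added only to illustrate the effect of an outlier
consume_out <- c(consume, 15)

data.frame(data = c("original", "with outlier"),
           mean = c(mean(consume), mean(consume_out)),
           median = c(median(consume), median(consume_out)))
```

The mean jumps from about 6.53 to about 7.59, while the median only moves from 6.5 to 6.55.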
However, when the distribution is asymmetrical, the mean and the median do not coincide:

  • Right asymmetry: the mean is greater than the median
  • Left asymmetry: the mean is less than the median
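A quick way to check both rules is with simulated data. This sketch assumes exponential draws as an example of a right-skewed distribution and their negatives as a left-skewed one; the sample size and seed are arbitrary.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated right-skewed data: exponential draws have a long right tail
right <- rexp(1000, rate = 1)

# Simulated left-skewed data: mirror the exponential draws around zero
left <- -rexp(1000, rate = 1)

data.frame(skew = c("right", "left"),
           mean = c(mean(right), mean(left)),
           median = c(median(right), median(left)))
```

For an exponential distribution with rate 1 the population mean is 1 and the population median is log(2) ≈ 0.69, so the simulated right-skewed values should land near those numbers.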
Here is another example that shows this, using the following data in R:
salaries=c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411, 3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)
This vector holds the monthly salaries of 20 workers at one company. We can calculate the mean and the median with these commands:

mean(salaries)    # 1572.65
median(salaries)  # 880
hist(salaries)
# mark the mean (blue) and the median (red) on the histogram
abline(v=c(mean(salaries),median(salaries)), col=c("blue","red"))
The resulting histogram is shown in Fig. 1.

Figure 1. Histogram of salaries
It is important to note that the mean salary can be misleading here: 70% of the workers earn less than the mean salary.
mean(salaries<mean(salaries))  # proportion of workers earning less than the mean
[1] 0.7
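The one-liner above works because R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values. This sketch spells out the same calculation step by step; the salaries vector is repeated so the block runs on its own, and the variable names are only illustrative.

```r
salaries <- c(903, 2684, 550, 1571, 1190, 857, 547, 2401, 1257, 411,
              3500, 284, 7537, 1666, 604, 692, 450, 770, 3013, 566)

# TRUE for each worker whose salary is below the mean salary
below <- salaries < mean(salaries)

n_below <- sum(below)                   # TRUE counts as 1, FALSE as 0
share_below <- n_below / length(below)  # proportion of workers below the mean

# mean() of the logical vector gives the same proportion directly
stopifnot(isTRUE(all.equal(share_below, mean(below))))
c(n_below = n_below, n_total = length(below), share_below = share_below)
```

Here 14 of the 20 workers earn less than 1572.65, which is the 0.7 reported above.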

We can also use a dot chart, shown in Fig. 2.
Figure 2. Dot chart of salaries


The code is:
dotchart(salaries,pch=16,xlab="salary")
# mean as a solid red line, median as a dashed blue line
abline(v=mean(salaries),col='red',lwd=2)
abline(v=median(salaries),col='blue',lty=2,lwd=2)
legend("bottomright",c("mean","median"),
       col=c("red","blue"),lty=c(1,2),lwd=c(2,2),box.lty=0,cex=1.5)

Hello World!

This is a blog about numbers and how numbers explain the world. Welcome. See you soon!