Chapter 8 Joint Distributions

So far, we learned about joint probabilities in Bayesian context such as $$P(A|B) = P(A,B)/P(B)$$. Now, we are going to expand this concept into discrete and continuous distributions. Define $$P(X = x, Y=y) = f(x,y)$$ as the probability mass function (discrete) or probability density function (continuous).

Same probability laws apply to joint distributions as well.

• $$f(x,y) \ge 0$$ for all $$(x,y)$$.
• $$\sum_x \sum_y f(x,y) = 1$$ or $$\int_x \int_y f(x,y) dx dy = 1$$

Example (Discrete): Suppose there are 10 balls in a box; 3 white, 4 black and 3 red. Two balls are randomly selected. Let’s say random variable X is the number of white balls picked and r.v. Y is the number of black balls picked. (a) Find the joint probability function and (b) find the probabilities.

1. Let’s first enumerate the alternatives. $$(x,y)$$ pair can be either of $$(0,0),(0,1),(0,2),(1,1),(2,0),(1,0)$$. Total number of alternatives are $$\binom{10}{2}$$. To calculate, number of ways of getting 1 white and 1 black ball is $$\binom{3}{1}\binom{4}{1}\binom{3}{0}$$. So, the probability will be $$\dfrac{\binom{3}{1}\binom{4}{1}\binom{3}{0}}{\binom{10}{2}}$$. We can generalize it to a function.

$f(x,y) = \dfrac{\binom{3}{x}\binom{4}{y}\binom{3}{2-x-y}}{\binom{10}{2}}$

Let’s also make it into an R function

f_xy_ballpick <- function(x,y,picked=2,n_balls=10,n_x=3,n_y=4){
#picked is the number of balls picked
#n_balls is the total number of balls
#n_x is the number of balls belonging to rv X (white)
#n_y is the number of balls belonging to rv Y (black)
#x and y are values to our random variables their total cannot exceed picked

#If the sum of x and y is greater than picked, then its probability is zero.
if(x+y > picked){
return(0)
}

#Remember choose is the R function of binomial coefficient (or combination)
(choose(n_x,x)*choose(n_y,y)*choose(n_balls - n_x - n_y,picked-x-y))/choose(n_balls,picked)

}

f_xy_ballpick(x=1,y=1)
##  0.2666667
1. Using the above formula we can calculate all the probabilities within the specified region $$x+y \le 2$$.
#First create an empty probability matrix.
#Let's say that columns are x = 0,1,2 and rows are y = 0, 1, 2
prob_matrix<-matrix(0,ncol=3,nrow=3)

#Indices in R start from 1 so 1,1 is actually x=0,y=0
prob_matrix[1,1]<-f_xy_ballpick(x=0,y=0)
prob_matrix[1,2]<-f_xy_ballpick(x=1,y=0)
prob_matrix[1,3]<-f_xy_ballpick(x=2,y=0)
prob_matrix[2,1]<-f_xy_ballpick(x=0,y=1)
prob_matrix[2,2]<-f_xy_ballpick(x=1,y=1)
prob_matrix[2,3]<-f_xy_ballpick(x=2,y=1)
prob_matrix[3,1]<-f_xy_ballpick(x=0,y=2)
prob_matrix[3,2]<-f_xy_ballpick(x=1,y=2)
prob_matrix[3,3]<-f_xy_ballpick(x=2,y=2)

#Let's also define the colnames and rownames of the matrix.
#paste0 is an R command which just appends statements
colnames(prob_matrix) <- paste0("x_",0:2)
rownames(prob_matrix) <- paste0("y_",0:2)

round(prob_matrix,2)
##      x_0  x_1  x_2
## y_0 0.07 0.20 0.07
## y_1 0.27 0.27 0.00
## y_2 0.13 0.00 0.00

Example (continuous): (This is from the textbook, Example 3.15) A privately owned business operates both a drive-in facility and a walk-in facility. On a randomly selected day, let X and Y, respectively, be the proportions of time that the drive-in and the walk-in facilities are in use and suppose that the joint density function of these random variables is

$f(x,y) = \dfrac{2}{5}(2x + 3y), 0 \le x \le 1, 0 \le y \le 1$

and 0 for other values of x and y.

1. Verify $$\int_x \int_y f(x,y) dx dy = 1$$
2. Find $$P[(X,Y) \in A]$$, where $$A = \{(x,y)|0 < x < 1/2, 1/4 < y < 1/2\}$$

3. (see the book for the full calculations)

$\int_x \int_y f(x,y) dx dy = \int_0^1 \int_0^1 \dfrac{2}{5}(2x+3y) dx dy = 1$

1. (see the book for the full calculations)

$\int_x \int_y f(x,y) dx dy = \int_{1/4}^{1/2} \int_0^{1/2} \dfrac{2}{5}(2x+3y) dx dy = 13/160$

Example (with special distributions)

1. Patients arrive at the doctor’s office according to Poisson distribution with $$\lambda = 2$$/hour.

1. What is the probability of getting less than or equal to 2 patients within 2 hours?
2. Suppose each arriving patient has 50% chance to bring a person to accompany. There are 10 seats in the waiting room. At least many hours should pass that there is at least 50% probability that the waiting room is filled with patients and their relatives?

Solution

1. $$P(X\le 2|\lambda t = 2)= \sum_{i=0}^2 \dfrac{e^{-\lambda t}(\lambda t)^i}{i!}$$
    #cdf of poisson
ppois(2,lambda=2*2)
##  0.2381033
1. First let's define the problem. Define $$n_p$$ as the number of patients and $$n_c$$ is the number of company. We want $$n_p + n_c \ge 10$$ with probability 50% or higher for a given $$t^*$$. Or to paraphrase, we want $$n_p + n_c \le 9$$ w.p. 50% or lower.

What is $$n_c$$ affected by? $$n_p$$. It is actually a binomial distribution problem. $$P(n_c = i|n_p) = \binom{n_p}{i} (0.5)^i*(0.5)^{n_p-i}$$. It is even better if we use cdf $$P(n_c \le k|n_p) = \sum_{i=0}^{k} \binom{n_p}{i} (0.5)^i*(0.5)^{n_p-i}$$.

We know the arrival of the patients is distributed with poisson. So, $$P(n_p = j|\lambda t^*) = \dfrac{e^{-\lambda t}(\lambda t)^j}{j!}$$. So $$P(j + k \le N) = \sum_{a=0}^j P(n_p = a|\lambda t^*)*P(n_c \le N-a | n_p = a)$$. Remember it is always $$n_c \le n_p$$.

    #Let's define a function
calculate_probability<-function(N=9,t_star=1,lambda=2){
#N is the max desired number of patients
the_prob<-0
for(n_p in 0:N){
the_prob <- the_prob + dpois(n_p,lambda=lambda*t_star)*pbinom(q=min(N-n_p,n_p),size=n_p,prob=0.5)
}

return(the_prob)

}

#Try different t_stars so probability is below 0.5
calculate_probability(t_star=2)
##  0.8631867
    calculate_probability(t_star=3)
##  0.5810261
    calculate_probability(t_star=3.3)
##  0.4905249

8.0.1 Marginal Distributions

You can get the marginal distributions by just summing up or integrating the other random variable such as $$P(Y=y) = \sum_x f(x,y)$$ or $$f(y) = \int_x f(x,y) dx$$. Let’s calculate the marginal distribution of black balls (rv Y) in the above example.

#Let's recall the prob_matrix
round(prob_matrix,2)
##      x_0  x_1  x_2
## y_0 0.07 0.20 0.07
## y_1 0.27 0.27 0.00
## y_2 0.13 0.00 0.00
#rowSums is an R function that calculates the sum of each row.
#It is equivalent to y_0 = prob_matrix[1,1] + prob_matrix[1,2] + prob_matrix[1,3]
rowSums(prob_matrix)
##       y_0       y_1       y_2
## 0.3333333 0.5333333 0.1333333

Marginal distribution of y in the second example is calculated as follows.

$\int_x \dfrac{2}{5}(2x+3y) dx = \dfrac{2(1+3y)}{5}$

8.0.2 Conditional Distribution

Similar to Bayes’ Rule, it is possible to calculate conditional probabilities of joint distributions. Let’s denote g(x) as the marginal distribution of x and h(y) as the marginal distribution of y. The formula of conditional distribution of x given y is as follows.

$f(x|y) = f(x,y)/h(y)$

Note that conditional distribution function is useless if x and y are independent. ($$f(x|y)=f(x)$$)