Statistical Learning: Maximum Likelihood and Maximum A Posteriori Examples


1. The Maximum Likelihood (ML) Approach

Here the parameter to be estimated is considered fixed but unknown, and we choose the value that makes the observed data most probable.

Example: A hardware defect started appearing in the assembly line of a computer manufacturer. In the past week, the following defect counts were observed: Monday (2), Tuesday (2), Wednesday (3), Thursday (1), Friday (0). The assembly line manager knows that the defects follow the Poisson distribution, a discrete distribution often used to model the number of random events that occur in an interval of time. The probability that K events occur in a given interval is:

P(K) = e^{-\lambda}\,\frac{\lambda^{K}}{K!}

\lambda is the parameter of the distribution and is equal to both its mean and its variance. The manager will use maximum likelihood to estimate \lambda.
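As a quick sanity check in R (a minimal sketch; dpois is R's built-in Poisson probability mass function, and the trial value of \lambda below simply anticipates the estimate derived next):

lambda <- 1.6                                # trial value of the Poisson parameter
k <- 0:5
data.frame(k = k, prob = dpois(k, lambda))   # P(K = k) = exp(-lambda) * lambda^k / k!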

The likelihood function to maximize is proportional to:

P(k_{1}=2, k_{2}=2, k_{3}=3, k_{4}=1, k_{5}=0 \mid \lambda) \propto e^{-5\lambda}\,\lambda^{\sum_{n=1}^{5} k_{n}}

In order to maximize this function, it is easier to work with its logarithm, \ell(\lambda) = -5\lambda + \left(\sum_{n=1}^{5} k_{n}\right)\ln\lambda + \text{const}. Taking the derivative and setting it to 0 gives -5 + \frac{\sum_{n} k_{n}}{\lambda} = 0, which yields:

\hat{\lambda} = \frac{\sum_{n=1}^{5} k_{n}}{5} = \frac{8}{5} = 1.6
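In R, this is just the sample mean of the five counts; a numerical maximization of the log-likelihood (a sketch, with variable names of my own choosing) agrees:

k <- c(2, 2, 3, 1, 0)              # Monday through Friday defect counts
mean(k)                            # 1.6, the closed-form ML estimate
loglik <- function(lambda) sum(dpois(k, lambda, log = TRUE))
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum   # also ~1.6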

Statistical learning review question: What is the relationship between ML and least-squares error (LSE) estimation? When the observations are independent and normally distributed with constant variance, ML and LSE produce the same parameter estimates [1].
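A small illustration of this in R (a sketch on simulated data; the variable names are my own, and the noise standard deviation is assumed known for simplicity): fitting a line by least squares with lm and by directly maximizing the Gaussian likelihood with optim recovers essentially the same intercept and slope.

set.seed(1)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 5)           # linear model with Gaussian noise
coef(lm(y ~ x))                              # least-squares estimates
negloglik <- function(b) -sum(dnorm(y, mean = b[1] + b[2] * x, sd = 5, log = TRUE))
optim(c(0, 0), negloglik)$par                # ML estimates: essentially identical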

2. The Maximum A Posteriori (MAP) Approach

Here the parameter to be estimated is considered to be a random variable with a known prior distribution, and we choose the value that maximizes the posterior.

Example: A technical institute in Greece accepts students from two pools each year: (a) most of the accepted students apply for the first time; (b) a smaller number of the accepted students apply for a second time. All students are required to take an entrance examination (maximum grade: 100). A second-try student who scored 70 wants to estimate the probability that he will be accepted.

The following information is available from previous years: (a) the grade distribution of accepted second-try students is Gaussian with mean 80 and standard deviation 10; (b) the grade distribution of not-accepted second-try students is Gaussian with mean 40 and standard deviation 20; (c) the prior probability of a second-try student being accepted is 1/5.

P(\text{accepted 2nd try} \mid 70) = \frac{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try})}{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try}) + P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}

Doing the computations in R, we get:

p70_accepted <- dnorm(70, mean = 80, sd = 10)   # P(70 | accepted 2nd try) = 0.024
p70_rejected <- dnorm(70, mean = 40, sd = 20)   # P(70 | not accepted 2nd try) = 0.0064

So,

P(\text{accepted 2nd try} \mid 70) = \frac{0.024 \times 1/5}{0.024 \times 1/5 + 0.0064 \times 4/5} = \frac{0.0048}{0.0048 + 0.00512} = \frac{0.0048}{0.00992} \approx 0.483

Similarly,

P(\text{not accepted 2nd try} \mid 70) = \frac{P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}{P(70 \mid \text{accepted 2nd try})\, P(\text{accepted 2nd try}) + P(70 \mid \text{not accepted 2nd try})\, P(\text{not accepted 2nd try})}

So, P(\text{not accepted 2nd try} \mid 70) = \frac{0.0064 \times 4/5}{0.0048 + 0.00512} = \frac{0.00512}{0.00992} \approx 0.517, which is simply 1 - 0.483.
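Putting the whole MAP computation together in R (a minimal runnable sketch; the variable names are my own):

prior_accepted <- 1/5
p70_accepted <- dnorm(70, mean = 80, sd = 10)
p70_rejected <- dnorm(70, mean = 40, sd = 20)
evidence <- p70_accepted * prior_accepted + p70_rejected * (1 - prior_accepted)
p70_accepted * prior_accepted / evidence        # P(accepted 2nd try | 70) ~ 0.483
p70_rejected * (1 - prior_accepted) / evidence  # P(not accepted 2nd try | 70) ~ 0.517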

Unfortunately, the probability of not being accepted is greater than that of being accepted, so the student is not too happy about this.

A bibliographical note: in reference [2], ML and MAP were compared for the estimation of physiological parameters, and it was concluded that MAP gave a more precise estimate than ML. In general, the authors conclude that when one has few or noise-impaired measurements, a priori information can be very useful when incorporated into a MAP-type estimator.

[1] Myung, I. J., "Tutorial on Maximum Likelihood Estimation," Journal of Mathematical Psychology, vol. 47, no. 1, pp. 90-100, 2003.

[2] Sparacino, G. et al., "Maximum-Likelihood Versus Maximum a Posteriori Parameter Estimation of Physiological System Models: The C-peptide Impulse Response Case Study," IEEE Transactions on Biomedical Engineering, vol. 47, no. 6, pp. 801-811, June 2000.