Unbiasedness of EML fit for a mixture model with fixed component pdfs

Published

April 25, 2022

The log-likelihood for one observed Poisson-distributed count is (without constants) \[ \ell_i(\lambda) := \ln \mathcal{L}_i(\lambda) = -\lambda + k_i \ln \lambda. \]

The ML estimate is obtained by solving the score function \(\sum_i \partial \ell_i / \partial \lambda \overset!= 0\). The solution \(\hat\lambda = \frac1N \sum_i k_i\) is unbiased, \[ E[\hat \lambda] = \frac1N \sum_i E[k_i] = \lambda. \]

This can be generalised: if the estimator is a linear function of the observations, the estimator is unbiased.

We now consider a two component model, where the number density is \(\rho(x) = S f_s(x) + B f_b(x)\).

For one bin \(i\), the expectation is \(\lambda_i = S s_i + B b_i\) with constants \(s_i = \int_{a_i}^{b_i} f_s(x) dx\) and \(b_i = \int_{a_i}^{b_i} f_b(x) dx\).

The score functions are \[ \ell_i(S, B) = -(S s_i + B b_i)/N + k_i \ln(S s_i + B b_i) \] \[ \frac{\partial \sum_i \ell_i}{\partial S} = -1 + \sum_i k_i \frac{s_i}{S s_i + B b_i} \overset!= 0 \quad\text{and}\quad \frac{\partial \sum_i \ell_i}{\partial B} = -1 + \sum_i k_i \frac{b_i}{S s_i + B b_i} \overset!= 0. \]

We calculate the solution explicitly for two bins. The calculation is fairly involved, although the solution is simple.

from sympy import *

S, B, s1, s2, b1, b2, k1, k2 = symbols("S B s_1 s_2 b_1 b_2 k_1 k_2", real=True, nonnegative=True)
eq1 = Eq(k1 * s1 / (S * s1 + B * b1) + k2 * s2 / (S * s2 + B * b2), 1)
eq1

\(\displaystyle \frac{k_{1} s_{1}}{B b_{1} + S s_{1}} + \frac{k_{2} s_{2}}{B b_{2} + S s_{2}} = 1\)

eq2 = Eq(k1 * b1 / (S * s1 + B * b1) + k2 * b2 / (S * s2 + B * b2), 1)
eq2

\(\displaystyle \frac{b_{1} k_{1}}{B b_{1} + S s_{1}} + \frac{b_{2} k_{2}}{B b_{2} + S s_{2}} = 1\)

sols = solve((eq1, eq2), (S, B))
sols
[((b_1*b_2*k_1 + b_1*b_2*k_2 - b_1*k_2*s_2 - b_2*k_1*s_1)/((b_1 - s_1)*(b_2 - s_2)),
  (-b_1*k_1*s_2 - b_2*k_2*s_1 + k_1*s_1*s_2 + k_2*s_1*s_2)/((b_1 - s_1)*(b_2 - s_2)))]
for sol in sols:
    for k, s in zip((S, B), sol):
        print(f"Estimate for {k}:")
        display(s)
Estimate for S:

\(\displaystyle \frac{b_{1} b_{2} k_{1} + b_{1} b_{2} k_{2} - b_{1} k_{2} s_{2} - b_{2} k_{1} s_{1}}{\left(b_{1} - s_{1}\right) \left(b_{2} - s_{2}\right)}\)

Estimate for B:

\(\displaystyle \frac{- b_{1} k_{1} s_{2} - b_{2} k_{2} s_{1} + k_{1} s_{1} s_{2} + k_{2} s_{1} s_{2}}{\left(b_{1} - s_{1}\right) \left(b_{2} - s_{2}\right)}\)

These estimates are linear functions of the counts \(k_1\) and \(k_2\) and therefore unbiased.

What is left to show: that we also get a linear (and thus unbiased) solution for more than two bins.

Unbiasedness for an unbinned EML fit follows, since the unbinned EML fit is a limiting case of the binned EML fit as the bin width goes to zero.