So… for max value distributions, the CDF of seeing the max, \(M_n = max_i \{x_1,\cdots,x_n\}\), is
\[P(M_n \leq x) = \prod_i^n F(x),\]
where \(F(x)\) is the CDF of the distribution from which the max was chosen.
Therefore to get the PDF, just differentiate with respect to \(x\) to get,
\[p(M_n) = (n-1)F(x)^nf(x),\]
where \(f(x)\) is the PDF of the sample distribution.
Moments with Normal distrubtions
There’s really no expression for moments of \(p(M_n)\) when \(n \gt\) a few… however, you can use some approximations or bounds. Bounds aren’t that tight… so we’ll use approximations
First Moment
For large n, there’s an approximate formula from:
Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate) Author(s): J. P. Royston Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 31, No. 2 (1982), pp. 161-165 Stable URL: http://www.jstor.org/stable/2347982
which is
\[\mathbb{E} [M_n] = \mu + \Phi^{-1}\left(\frac{n-\alpha}{n-2\alpha+1}\right)\sqrt{\sigma^2},\]
where \(\Phi^{-1}\) is the inverse of the normal CDF, and \(\alpha\) is a constant that is \(\alpha \sim 0.375\). You can numerically optimize \(\alpha\) for the values of \(n\) that you are interested in
\(\sqrt{n}\) | \(n\) | alpha |
---|---|---|
3 | 9 | 0.36205 |
5 | 25 | 0.37662 |
7 | 49 | 0.38432 |
9 | 81 | 0.38897 |
11 | 121 | 0.39264 |
Second Moment
Note that this is going to be the same scaling as the variance, just shifted by \(\mu^2\).
I couldn’t find anything in the literature that worked well, so I guessed a reasonable form that linearized the dependence of \(\mathbb{E}[M_n^2]\) of Monte Carlo samples upon \(n\), \(\sigma^2\), and \(\mu\). I got the following formula, which is NOT perfectly linear, but seems pretty reasonable:
\[\text{Var} = \mathbb{E}[M_n^2] - \mathbb{E}[M_n]^2 \sim \sigma^2\cdot \frac{a+b/n}{\text{ln}(n)},\]
where \(a\), and \(b\) are undetermined constants.
Now all you have to do is optimize the values of \(a\) and \(b\), as was done above for \(\alpha\). Note, this isn’t going to be perfect, because the form above isn’t perfect, therefore the values of \(a\) and \(b\) you get will depend up on the \(\mu\), \(\sigma^2\), and \(n\) used to draw the Monte Carlo samples. However, the dependence is pretty minimal. Anyway, I found that the following values work pretty well.
\(a\) | \(b\) |
---|---|
0.85317 | -0.573889 |
Estimating the underlying Normal distribution of the samples
With the equations above for the approximations to the first and second moments, you can get a pretty decent estimate, by solving for \(\sigma^2\), and then \(\mu\). Here’s some Python
code
import numpy as np
from scipy import special
def _estimate_mu(n,var):
= 0.375
alpha return special.ndtri((n-alpha)/(n-2.*alpha+1.))*np.sqrt(var)
def _estimate_var(n):
= .85317
a = -.573889
b = np.log(n)/(a+b/n)
y return y
def estimate_from_max(d,n):
= np.var(d) * _estimate_var(n)
v = np.mean(d) - _estimate_mu(n,v)
m return m,v