Estimating Max-value Distributions

A decent approximation for order statistics
Author

Colin Kinz-Thompson

Published

June 21, 2017


So… for max-value distributions, the CDF of the max of \(n\) i.i.d. samples, \(M_n = \max\{x_1,\cdots,x_n\}\), is

\[P(M_n \leq x) = \prod_{i=1}^n F(x) = F(x)^n,\]

where \(F(x)\) is the CDF of the distribution from which the max was chosen.

Therefore, to get the PDF, just differentiate with respect to \(x\) to get

\[p(M_n) = \frac{d}{dx} F(x)^n = n F(x)^{n-1} f(x),\]

where \(f(x)\) is the PDF of the sample distribution.
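
As a quick sanity check, the analytic density can be compared against a histogram of simulated maxima. Here's a minimal Monte Carlo sketch; the standard Normal, \(n = 10\), and the sample counts are arbitrary choices of mine:

```python
import numpy as np
from scipy import stats

n = 10
rng = np.random.default_rng(0)

# 100000 maxima, each the largest of n standard-Normal draws
maxima = rng.standard_normal((100000, n)).max(axis=1)

# analytic density of the max: n * F(x)^(n-1) * f(x)
x = np.linspace(-1.0, 5.0, 400)
p = n * stats.norm.cdf(x)**(n - 1) * stats.norm.pdf(x)

# the histogram of simulated maxima should track p closely
hist, edges = np.histogram(maxima, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.abs(np.interp(centers, x, p) - hist).max())  # small (MC noise)
```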

Moments with Normal distributions

There’s really no closed-form expression for the moments of \(p(M_n)\) when \(n\) is more than a few… however, you can use some approximations or bounds. The bounds aren’t that tight… so we’ll use approximations.

First Moment

For large \(n\), there’s an approximate formula from:

Royston, J. P. (1982). Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate). Journal of the Royal Statistical Society, Series C (Applied Statistics), 31(2), 161-165. http://www.jstor.org/stable/2347982

which is

\[\mathbb{E} [M_n] = \mu + \Phi^{-1}\left(\frac{n-\alpha}{n-2\alpha+1}\right)\sqrt{\sigma^2},\]

where \(\Phi^{-1}\) is the inverse of the normal CDF, and \(\alpha\) is a constant with \(\alpha \approx 0.375\). You can numerically optimize \(\alpha\) for the values of \(n\) that you are interested in (a sketch of one way to do this follows the table):

| \(\sqrt{n}\) | \(n\) | \(\alpha\) |
|---:|---:|---:|
| 3 | 9 | 0.36205 |
| 5 | 25 | 0.37662 |
| 7 | 49 | 0.38432 |
| 9 | 81 | 0.38897 |
| 11 | 121 | 0.39264 |
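
Here's a minimal sketch of how you might fit \(\alpha\) by matching the formula above against Monte Carlo means of standard-Normal maxima. The helper name fit_alpha, the trial count, and the seed are my own choices, not from the table above:

```python
import numpy as np
from scipy import special, optimize

def fit_alpha(n, n_trials=100000, seed=0):
    """Fit alpha so Phi^{-1}((n-alpha)/(n-2*alpha+1)) matches the
    Monte Carlo mean of maxima of n standard-Normal draws."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((n_trials, n)).max(axis=1).mean()
    # residual between the approximate formula (mu=0, sigma=1) and MC
    def resid(a):
        return special.ndtri((n - a) / (n - 2.0 * a + 1.0)) - target
    # the residual changes sign on [0, 0.5] for these n, so brentq works
    return optimize.brentq(resid, 0.0, 0.5)

for n in [9, 25, 49, 81, 121]:
    print(n, fit_alpha(n))
```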

Second Moment

Note that this is going to have the same scaling as the variance, since the two differ only by \(\mathbb{E}[M_n]^2\).

I couldn’t find anything in the literature that worked well, so I guessed a reasonable form that linearizes the dependence of Monte Carlo estimates of \(\mathbb{E}[M_n^2]\) upon \(n\), \(\sigma^2\), and \(\mu\). I got the following formula, which is NOT perfectly linear, but seems pretty reasonable:

\[\text{Var} = \mathbb{E}[M_n^2] - \mathbb{E}[M_n]^2 \sim \sigma^2\cdot \frac{a+b/n}{\ln(n)},\]

where \(a\) and \(b\) are undetermined constants.

Now all you have to do is optimize the values of \(a\) and \(b\), as was done above for \(\alpha\). Note, this isn’t going to be perfect, because the form above isn’t perfect; the values of \(a\) and \(b\) you get will therefore depend on the \(\mu\), \(\sigma^2\), and \(n\) used to draw the Monte Carlo samples. However, the dependence is pretty minimal. Anyway, I found that the following values work pretty well (a fitting sketch follows the table).

| \(a\) | \(b\) |
|---:|---:|
| 0.85317 | -0.573889 |
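
Since the form above becomes linear in \(a\) and \(b\) after multiplying through by \(\ln(n)\), a least-squares fit against Monte Carlo variances is one reasonable way to get them. This sketch assumes a standard Normal (\(\sigma^2 = 1\)); the helper name fit_ab, the trial count, and the seed are my own choices:

```python
import numpy as np

def fit_ab(ns, n_trials=100000, seed=0):
    """Least-squares fit of Var(M_n) ~ (a + b/n) / ln(n) using
    Monte Carlo variances of standard-Normal maxima."""
    rng = np.random.default_rng(seed)
    y = []
    for n in ns:
        maxima = rng.standard_normal((n_trials, n)).max(axis=1)
        # linearized target: Var(M_n) * ln(n) = a + b/n
        y.append(maxima.var() * np.log(n))
    A = np.column_stack([np.ones(len(ns)), 1.0 / np.asarray(ns, float)])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(y), rcond=None)
    return a, b

print(fit_ab([9, 25, 49, 81, 121]))
```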

Estimating the underlying Normal distribution of the samples

With the equations above for the approximations to the first and second moments, you can get a pretty decent estimate of the underlying Normal by solving for \(\sigma^2\) first, and then for \(\mu\). Here’s some Python code:

```python
import numpy as np
from scipy import special

def _estimate_mu(n, var):
    # offset of E[M_n] from mu: Phi^{-1}((n-alpha)/(n-2*alpha+1)) * sigma
    alpha = 0.375
    return special.ndtri((n - alpha) / (n - 2. * alpha + 1.)) * np.sqrt(var)

def _estimate_var(n):
    # inverse of the variance scaling: Var(M_n) ~ sigma^2 * (a + b/n)/ln(n)
    a = 0.85317
    b = -0.573889
    return np.log(n) / (a + b / n)

def estimate_from_max(d, n):
    # d: observed maxima, each taken over n draws of the underlying Normal
    v = np.var(d) * _estimate_var(n)     # estimate of sigma^2
    m = np.mean(d) - _estimate_mu(n, v)  # estimate of mu
    return m, v
```
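
For example, here's a quick check of estimate_from_max (defined above) on synthetic data; the true parameters, the number of maxima, and the seed are arbitrary choices of mine:

```python
import numpy as np

# true underlying Normal (arbitrary) and the per-max sample count
mu_true, var_true, n = 2.0, 4.0, 25
rng = np.random.default_rng(0)

# 10000 observed maxima, each over n draws from N(mu_true, var_true)
d = (mu_true + np.sqrt(var_true) * rng.standard_normal((10000, n))).max(axis=1)

m, v = estimate_from_max(d, n)
print(m, v)  # should land reasonably close to 2.0 and 4.0
```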