Chance of seeing a one-sigma deviation in random splits

Published

December 5, 2021

We randomly split a sample into two halves and check whether their arithmetic means differ by more than one standard deviation of that difference. The chance of this happening appears to be independent of the distribution from which the original sample is drawn.
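Written out, the quantity computed for each split looks roughly like this. The symbols are notation introduced here for illustration, and the variances are assumed to be population variances, as returned by NumPy's default `ddof=0`:

$$
\delta = \frac{\bar a - \bar b}{\sqrt{\sigma_a^2 / n_a + \sigma_b^2 / n_b}}
$$

Here $\bar a$ and $\bar b$ are the means of the two halves, $\sigma_a^2$ and $\sigma_b^2$ their variances, and $n_a$ and $n_b$ their sizes. A split counts as a one-sigma deviation when $\delta$ exceeds 1.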

import numpy as np
rng = np.random.default_rng(0)

for distribution in (rng.normal, rng.uniform, rng.exponential):
    # Draw one sample of 20 values from the distribution under test.
    x = distribution(size=20)
    
    deltas = []
    ntry = 10000
    for itry in range(ntry):
        rng.shuffle(x)
        # Split the shuffled sample into two halves.
        a = x[:len(x) // 2]
        b = x[len(x) // 2:]
        ma = np.mean(a)
        mb = np.mean(b)
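        # Variance of each mean (np.var defaults to ddof=0).
        va = np.var(a) / len(a)
        vb = np.var(b) / len(b)
        v = va + vb
        # Signed difference of the means in units of its standard deviation.
        deltas.append((ma - mb) / v ** 0.5)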
    deltas = np.array(deltas)
    
    # The comparison is signed, so large negative differences also count as "< 1".
    print(distribution.__name__, "fraction with < 1 sigma deviation", np.sum(deltas < 1) / len(deltas))
normal fraction with < 1 sigma deviation 0.8255
uniform fraction with < 1 sigma deviation 0.8238
exponential fraction with < 1 sigma deviation 0.8281
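
The comparison `deltas < 1` is signed, so a split in which the second half has the much larger mean still counts as lying within one sigma. The following is a minimal sketch, not the original analysis, that reports the signed fraction next to the fraction based on the absolute difference. The helper name `split_fractions`, its parameters, and the seeds are hypothetical choices made for illustration.

```python
import numpy as np


def split_fractions(x, ntry=10000, seed=0):
    """Return (signed, absolute) fractions of random half-splits below one sigma."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    half = len(x) // 2
    deltas = np.empty(ntry)
    for i in range(ntry):
        perm = rng.permutation(x)  # shuffled copy; x itself is left untouched
        a, b = perm[:half], perm[half:]
        # Standard deviation of the difference of means (np.var uses ddof=0).
        err = np.sqrt(np.var(a) / len(a) + np.var(b) / len(b))
        deltas[i] = (np.mean(a) - np.mean(b)) / err
    signed = np.mean(deltas < 1)            # same criterion as the loop above
    absolute = np.mean(np.abs(deltas) < 1)  # counts deviations in both directions
    return signed, absolute


# Assumed example input: one exponential sample of 20 values.
sample = np.random.default_rng(1).exponential(size=20)
signed, absolute = split_fractions(sample)
print("signed:", signed, "absolute:", absolute)
```

Every split that passes the absolute criterion also passes the signed one, so the absolute fraction can never exceed the signed fraction.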