抱歉,您的浏览器无法访问本站

本页面需要浏览器支持(启用)JavaScript


了解详情 >

welford算法求方差

cc https://zhuanlan.zhihu.com/p/408474710

基础方法

  • $D(x) = \frac{\sum(x_i - mean)^2}{n}$
    • 需要两次遍历1. 第一次遍历获取mean, 2. 第二次遍历计算方差

数学变换优化

  • 数学等价转换的一次遍历
    • 可以推导出$D(x) = E(x^2) - E(x)^2$, 只需要用x和x方做一次遍历即可
    • 但是先累加容易出现精度问题

$$
D(x) = \frac{\sum(x_i - u)^2}{n} \
= \frac{\sum(x_i^2 - 2ux + u^2)}{n} \
= \frac{\sum(x_i^2) - \sum(2ux) + nu^2}{n} \
= E(x^2) - 2uE(x) + u^2 \
= E(x^2) - E(x) \
$$

welford

数学变化, 转换成迭代的表达式, e.g. x_{n+1} = …x_{n}

结论:

$$
E_{n+1}(x) = E_{n}(x) + \frac{x_{n+1} - E_n(x)}{n+1}
$$

$$
D_{n+1}(x) = D_{n}(x) + \frac{(x_{n+1} - E_{n}(x))(x_{n+1}-E_{n+1}(x)) - D_{n}(x)}{n + 1}
$$

其中$E_n(x)$和$D_n(x)$分别表示对前n个x求的均值和方差

welford误差优化

额外记录一个M2值, M2更新公式:

$$
M2_{n} = M2_{n-1} + (x_n - E_{n-1}(x))(x_n - E_n(x))
$$

最后用M2来计算方差

$$
D(x) = \frac{M2_n}{N}
$$

impl

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import numpy as np

def welford_update(count, mean, M2, currValue):
count += 1
delta = currValue - mean
mean += delta / count
delta2 = currValue - mean
M2 += delta * delta2
return (count, mean, M2)


def naive_update(sum, sum_square, currValue):
sum = sum + currValue
sum_square = sum_square + currValue * currValue
return (sum, sum_square)


x_arr = np.random.randn(100000).astype(np.float32)

welford_mean = 0
welford_m2 = 0
welford_count = 0
for i in range(len(x_arr)):
new_val = x_arr[i]
welford_count, welford_mean, welford_m2 = welford_update(welford_count, welford_mean, welford_m2, new_val)
print("Welford mean: ", welford_mean)
print("Welford var: ", welford_m2 / welford_count)

naive_sum = 0
naive_sum_square = 0
for i in range(len(x_arr)):
new_val = x_arr[i]
naive_sum, naive_sum_square = naive_update(naive_sum, naive_sum_square, new_val)
naive_mean = naive_sum / len(x_arr)
naive_var = naive_sum_square/ len(x_arr) - naive_mean*naive_mean
print("Naive mean: ", naive_mean)
print("Naive var: ", naive_var)

评论