sách gpt4 ai đã đi

Biến đổi CDF ngược lấy mẫu phân phối hơi sai python inverse cdf

In lại 作者:行者123 更新时间:2023-12-01 02:05:35 31 4
mua khóa gpt4 Nike

lý lịch

我需要使用已知的累积分布函数 (CDF) 从相当复杂的概率密度函数 (PDF) 中随机采样,并且我正在尝试使用 inverse transform sampling 。这应该很容易做到,因为我有 CDF,只需要在插入均匀随机数的同时对其进行数值反转(不可能用代数方式进行)。但是,生成的分布的方差低于预期,并且我在 CDF 中找不到任何错误。

因此,我通过从正态分布中采样来简化并测试了我的算法。结果是一样的:位置没问题,但比例不对。我知道有更好的内置高斯采样方法,但这只是对采样算法的测试。

这个问题最初是在 Fortran 中出现的,但后来我在 Python 中复制了这个问题,所以我必须做一些根本上错误的事情或者遇到数值问题。

Python

nhập numpy dưới dạng np
from scipy.special import erf
from scipy.optimize import brentq
nhập matplotlib.pyplot dưới dạng plt
from scipy.stats import norm


def testfunc(x):
## Test case, result should be 6.04880103
# out = 0.5 * (1. + erf((x - 5.) / (2. * np.sqrt(2.)))) - 0.7
r = np.random.uniform()
# hand-built cdf:
# out = 0.5 * (1. + erf((x - 5.) / (2. * np.sqrt(2.)))) - r
# scipy cdf:
out = norm.cdf(x, 5, 2) - r
return out


nếu __name__ == '__main__':
n = 10000
sol_array = np.zeros(n)
for i in range(0, n):
sol_array[i] = brentq(testfunc, -100.,100.)

print('mean = ' + str(np.mean(sol_array)))
print('std = ' + str(np.std(sol_array)))
plt.hist(sol_array, normed=True, bins='fd')
x = np.linspace(-1, 11, 1000)
plt.plot(x, norm.pdf(x, 5, 2))
plt.hiển thị()

正如预期的那样,采样值的平均值约为 5,但标准差约为 1.28,对于我手工构建的 CDF 和 scipy 的 CDF 来说,它应该是 2。这在直方图中也可见: normal distribution with mean = 5 and standard deviation = 2

Fortran

Fortran 中存在同样的问题,尽管结果标准差的值不同。由于包含了求解器,因此代码较长。该求解器是 Alan Miller 翻译的 Fortran 90 版本旧的 FORTRAN 77 netlib 例程 ( zeroin.f ).

implicit none
integer, parameter :: dp = selected_real_kind(15, 307)
integer, parameter :: n = 1000000
real, dimension(n) :: v
real :: mean, std
integer, dimension(:), allocatable :: seed
integer :: i, seedsize, clock

! seed the PRNG
call random_seed(size=seedsize)
allocate(seed(seedsize))
call system_clock(count=clock)
seed=clock + 37 * (/ (i - 1, i=1, seedsize) /)
call random_seed(put=seed)
deallocate(seed)

do i = 1, n
v(i) = real(zeroin(testfunc, -100._dp, 100._dp, 1e-20_dp, 1e-10_dp))
end do

mean = sum(v) / n
std = sum((v - mean)**2) / n
print*, mean, std

contains

function testfunc(v)
implicit none
real(dp), intent(in) :: v
real(dp) :: testfunc, r

call random_number(r)

! testfunc = 0.5 * (1. + erf((v-5.)/(2.*sqrt(2.)))) - 0.7 ! should be 6.04880
testfunc = 0.5 * (1. + erf((v-5.)/(2.*sqrt(2.)))) - r ! Gaussian test with mu=5 and sigma=2
end function testfunc

function zeroin(f, ax, bx, aerr, rerr) result(fn_val)
! original zeroin.f from netlib.org
! code converted using to_f90 by alan miller
! date: 2003-07-14 time: 12:32:54
!-----------------------------------------------------------------------
! finding a zero of the function f(x) in the interval (ax,bx)
! ------------------------
! INPUT:
! f function subprogram which evaluates f(x) for any x in the
! closed interval (ax,bx). it is assumed that f is continuous,
! and that f(ax) and f(bx) have different signs.
! ax left endpoint of the interval
! bx right endpoint of the interval
! aerr the absolute error tolerance to be satisfied
! rerr the relative error tolerance to be satisfied

! OUTPUT:
! abcissa approximating a zero of f in the interval (ax,bx)
!-----------------------------------------------------------------------
! zeroin is a slightly modified translation of the algol procedure
! zero given by Richard Brent in "Algorithms for Minimization without
! Derivatives", Prentice-Hall, Inc. (1973).
implicit none
real(dp), intent(in) :: ax
real(dp), intent(in) :: bx
real(dp), intent(in) :: aerr
real(dp), intent(in) :: rerr
real(dp) :: fn_val
real(dp) :: a, b, c, d, e, eps, fa, fb, fc, tol, xm, p, q, r, s, atol, rtol

interface
real(selected_real_kind(15, 307)) function f(x)
real(selected_real_kind(15, 307)), intent(in) :: x
end function f
end interface

! compute eps, the relative machine precision
eps = epsilon(0.0_dp)

! initialization
a = ax
b = bx
fa = f(a)
fb = f(b)
if (fb*fa > 0.) then
print*, 'a, b, fa, fb', a, b, fa, fb
dừng lại
end if
atol = 0.5 * aerr
rtol = max(0.5_dp*rerr, 2.0_dp*eps)

! begin step
10 c = a
fc = fa
d = b - a
e = d
20 if (abs(fc) < abs(fb)) then
a = b
b = c
c = a
fa = fb
fb = fc
fc = fa
end if

! convergence test
tol = rtol * max(abs(b),abs(c)) + atol
xm = 0.5 * (c-b)
if (abs(xm) > tol) then
if (fb /= 0.0) then
! is bisection necessary
if (abs(e) >= tol) then
if (abs(fa) > abs(fb)) then
! is quadratic interpolation possible
if (a == c) then
! linear interpolation
s = fb / fc
p = (c-b) * s
q = 1.0 - s
khác
! inverse quadratic interpolation
q = fa / fc
r = fb / fc
s = fb / fa
p = s * ((c-b)*q*(q-r)-(b-a)*(r-1.0))
q = (q-1.0) * (r-1.0) * (s-1.0)
end if
! adjust signs
if (p > 0.0) q = -q
p = abs(p)
! is interpolation acceptable
if (2.0*p < (3.0*xm*q-abs(tol*q))) then
if (p < abs(0.5*e*q)) then
e = d
d = p / q
go to 30
end if
end if
end if
end if

! bisection
d = xm
e = d

! complete step
30 a = b
fa = fb
if (abs(d) > tol) b = b + d
if (abs(d) <= tol) b = b + sign(tol,xm)
fb = f(b)
if (fb*(fc/abs(fc)) > 0.0) go to 10
go to 20
end if
end if

! done
fn_val = b
end function zeroin

kết thúc

所得样本的平均值约为 5,而标准差约为 1.64。

câu hỏi

有人知道我的算法在哪里可能会出现数值问题吗?事实上,Python 版本和 Fortran 版本都存在相同的问题,但程度不同,这让我相信这是 float 的舍入问题,但我无法想象在哪里。即使求解器返回舍入值,该差异也不应该显示在简单的直方图中。

有人发现我的算法有错误吗?我理解错了吗?

1 Câu trả lời

我只检查了Python版本,确实实现有错误。

即,您的testfunc ,即求根的目标函数brentq常规,行为具有不确定性。在寻根运行期间(即一次调用 brentq() 直至其返回),brentq需要多次调用相同的回调,直到达到收敛。然而,每次brentq调用此回调,目标方程更改r得到一个新的伪随机值。因此,寻根例程无法收敛到您想要的解决方案。

从概念上讲,您需要做的是首先生成均匀随机变量的样本,并对它们应用như nhau的确定性变换(即逆分布函数)。当然你不需要解根,因为你可以使用ppf (百分位数函数,即cdf的反函数)scipy.stats的方法随机变量类。

作为概念证明,您可以使用(不必要的昂贵且不太准确的)转换方法对标准统一样本数组运行以下代码:

nhập numpy
import numpy.random
from scipy.optimize import brentq
from scipy.stats import norm

# Setup
n = 10000
numpy.random.seed(0x5eed)
ran_array = numpy.random.uniform(size=n)

sol_array = numpy.empty_like(ran_array)


# Target function for root-finding: F(x) - p = 0 where p is the probability level
# for which the quantile is to be found
def targetfunc(x, p):
return norm.cdf(x, 5, 2) - p


for i in range(n):
sol_array[i] = brentq(targetfunc, -100.0, 100.0, args=(ran_array[i],))
print("mean = %10f" % sol_array.mean())
print("std = %10f" % sol_array.std())

Đầu ra:

mean = 5.011041
std = 2.009365

关于python - 逆 CDF 变换采样的分布略有错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49109481/

31 4 0
行者123
Hồ sơ cá nhân

Tôi là một lập trình viên xuất sắc, rất giỏi!

Nhận phiếu giảm giá Didi Taxi miễn phí
Mã giảm giá Didi Taxi
Giấy chứng nhận ICP Bắc Kinh số 000000
Hợp tác quảng cáo: 1813099741@qq.com 6ren.com