performance - Vectorized code slower than loops? MATLAB

Reprinted. Author: 太空宇宙, updated 2023-11-03 20:01:28

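I have the piece of code below in the problem I'm working on. The definitions part is only there to show you the array sizes. Below it I pasted the vectorized version, which is more than 2x slower (1.28 s vs 0.52 s). Why is that? I know this can happen when vectorization needs large temporary variables, but that does not seem to be the case here.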

In general, what else can I do (besides parfor, which I'm already using) to speed up this code?

maxN = 100;
levels = maxN + 1;
xElements = 101;
umn = complex(zeros(levels, levels));
umn2 = umn;
bessels = ones(xElements, xElements, levels); % 1.09 GB
posMcontainer = ones(xElements, xElements, maxN);

% loop version: one scalar multiplication per (i, j, n, m)
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            for m = 1 : 2 : n
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                mm = mm + 1;
            end
        end
    end
end
toc % 0.520594 seconds


% "vectorized" version: the innermost m loop is replaced by a slice
tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            m = 1:2:n;
            numOfEl = ceil(n/2);
            umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, m);
        end
    end
end
toc % 1.275926 seconds

max(abs(umn(:) - umn2(:))) % verifying that both versions agree (a signed sum could hide cancelling errors)
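One further option for the question's kernel, offered as a sketch rather than a benchmark: for a fixed `(i, j)`, the two inner loops fill rows `(1:2:maxN) + 1` of `umn` (that is, `2:2:maxN` for the even `maxN` used here) with the lower triangle of the outer product of `bessels(i, j, (1:2:maxN) + 1)` and `posMcontainer(i, j, 1:2:maxN)`. Row `r` corresponds to `n = 2r - 1`, column `c` to `m = 2c - 1`, and the loop condition `m <= n` becomes `c <= r`. The block builds that triangle with one matrix product and `tril` per `(i, j)` and checks it against the original scalar loop. The small sizes, the random data, and the name `umn3` are assumptions for illustration; whether it is faster on a given MATLAB release should be measured with tic/toc at the real sizes.

```matlab
% Sketch: replace the n and m loops with an outer product plus tril.
% Small sizes and random data are illustrative assumptions.
maxN = 10;
levels = maxN + 1;
xElements = 4;
rng(0);                                    % reproducible random data
bessels = rand(xElements, xElements, levels);
posMcontainer = rand(xElements, xElements, maxN);

nIdx = 1:2:maxN;                           % odd n values visited by the loops
rowIdx = nIdx + 1;                         % nn = n + 1
mIdx = 1:2:maxN;                           % every odd m that can occur
k = numel(nIdx);                           % size of the filled triangular block

umn = complex(zeros(levels, levels));      % reference result (question's loop)
umn3 = complex(zeros(levels, levels));     % hypothetical name for the new result
maxDiff = 0;
for j = 1:xElements
    for i = 1:xElements
        % reference: the scalar loop from the question
        for n = nIdx
            nn = n + 1;
            mm = 1;
            for m = 1:2:n
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                mm = mm + 1;
            end
        end
        % outer product b(r) * p(c), keeping only c <= r (that is, m <= n)
        b = reshape(bessels(i, j, rowIdx), [], 1);       % k-by-1
        p = reshape(posMcontainer(i, j, mIdx), 1, []);   % 1-by-k
        umn3(rowIdx, 1:k) = tril(b * p);
        maxDiff = max(maxDiff, max(abs(umn(:) - umn3(:))));
    end
end
fprintf('max |umn - umn3| = %g\n', maxDiff);
```

Each entry of `b * p` is a single product, the same operation the scalar loop performs, so the two results should agree to rounding level.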

Best regards,
Alex

From the profiler:

[Image: profiler output]

Edit:

In reply to @Jason's answer: this alternative takes the same time:

% precompute the row index and element count for each odd n
nn = zeros(1, maxN);
numOfEl = zeros(1, maxN);
for n = 1:2:maxN
    nn(n) = n + 1;
    numOfEl(n) = ceil(n/2);
end

for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            umn2(nn(n), 1:numOfEl(n)) = bessels(i, j, nn(n)) * posMcontainer(i, j, 1:2:n);
        end
    end
end

Edit 2:
In reply to @EBH:
The point is to compute the following:

parfor i = 1 : xElements
    for j = 1 : xElements
        umn = complex(zeros(levels, levels)); % reset for every (i, j)
        for n = 0:maxN
            mm = 1;
            for m = -n:2:n
                nn = n + 1; % for indexing

                if m < 0
                    umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
                end

                if m > 0
                    umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                end

                if m == 0
                    umn(nn, mm) = bessels(i, j, nn);
                end

                mm = mm + 1; % for indexing
            end % m
        end % n
        beta1 = sum(sum(Aj1.*umn));
        betaSumSq1(i, j) = abs(beta1).^2;

        beta2 = sum(sum(Aj2.*umn));
        betaSumSq2(i, j) = abs(beta2).^2;
    end % j
end % i

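I've sped it up as much as I could. What you wrote only uses the last bessels and posMcontainer values, so it does not produce the same result. In the real code, these two containers are not filled with ones but with precomputed values.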
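A different way to remove the branching in the Edit 2 kernel, sketched here as an alternative and not taken from the thread: `umn(nn, mm)` is always `bessels(i, j, nn)` times a factor selected by `m = -n + 2*(mm - 1)`, so the three `if` blocks can be replaced by one lookup vector `[negMcontainer values in reverse order, 1, posMcontainer values]` indexed by `m + maxN + 1`. Which cells of `umn` are filled, and which lookup entry each uses, depend only on `maxN`, so that bookkeeping is done once outside the `(i, j)` loops. The question does not show `negMcontainer`, `Aj1`, or `Aj2`; the block assumes `negMcontainer` has the same size as `posMcontainer` and that `Aj1`, `Aj2` are `levels`-by-`levels`. Sizes, random data, and helper names (`lut`, `rowOf`, `lutOf`) are hypothetical, and the block also runs the original loop to compare results.

```matlab
% Sketch: build umn's nonzero entries from a lookup vector instead of if branches.
% Sizes, random data, and the sizes of negMcontainer, Aj1, Aj2 are assumptions.
maxN = 8;
levels = maxN + 1;
xElements = 3;
rng(1);
bessels = rand(xElements, xElements, levels);
posMcontainer = rand(xElements, xElements, maxN);
negMcontainer = rand(xElements, xElements, maxN);   % assumed same size as posMcontainer
Aj1 = rand(levels) + 1i*rand(levels);                % assumed levels-by-levels
Aj2 = rand(levels) + 1i*rand(levels);

% Done once: which (nn, mm) cells are filled and which m each cell holds
[MM, NN] = meshgrid(1:levels, 1:levels);   % MM = column index mm, NN = row index nn
mask = MM <= NN;                           % row nn holds mm = 1..nn
mVal = -(NN - 1) + 2*(MM - 1);             % m = -n + 2*(mm - 1), with n = nn - 1
rowOf = NN(mask);                          % bessels index used by each filled cell
lutOf = mVal(mask) + maxN + 1;             % position of m in the lookup vector
A1 = Aj1(mask);
A2 = Aj2(mask);

betaSumSq1 = zeros(xElements);  betaSumSq2 = zeros(xElements);
refSq1 = zeros(xElements);      refSq2 = zeros(xElements);
for i = 1:xElements
    for j = 1:xElements
        b = reshape(bessels(i, j, :), [], 1);
        % lookup vector indexed by m + maxN + 1: negative m, then m = 0, then positive m
        lut = [fliplr(reshape(negMcontainer(i, j, :), 1, [])), 1, ...
               reshape(posMcontainer(i, j, :), 1, [])].';
        vals = b(rowOf) .* lut(lutOf);      % the filled entries of umn, column-major
        betaSumSq1(i, j) = abs(sum(A1 .* vals))^2;
        betaSumSq2(i, j) = abs(sum(A2 .* vals))^2;

        % reference: the loop from Edit 2
        umn = complex(zeros(levels, levels));
        for n = 0:maxN
            nn = n + 1;
            mm = 1;
            for m = -n:2:n
                if m < 0
                    umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
                elseif m > 0
                    umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                else
                    umn(nn, mm) = bessels(i, j, nn);
                end
                mm = mm + 1;
            end
        end
        refSq1(i, j) = abs(sum(sum(Aj1 .* umn)))^2;
        refSq2(i, j) = abs(sum(sum(Aj2 .* umn)))^2;
    end
end
relErr = max(abs([betaSumSq1(:) - refSq1(:); betaSumSq2(:) - refSq2(:)])) ...
         / max(abs([refSq1(:); refSq2(:)]));
fprintf('max relative difference: %g\n', relErr);
```

Because only the filled cells are summed, and in a different order than sum(sum(...)), the two results can differ at rounding level rather than being bit-identical.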

1 Answer

After your edit, I can see that umn is just a temporary variable for another calculation. It can still be mostly vectorized:

betaSumSq1 = zeros(xElements); % preallocating
betaSumSq2 = zeros(xElements); % preallocating
% an index matrix to fetch the right values from negMcontainer and
% posMcontainer:
indmat = tril(repmat([0 1;1 0],ceil((maxN+1)/2),floor(levels/2)));
indmat(end,:) = [];
% an index matrix to fetch the values in correct order for umn:
b_ind = repmat([1;0],ceil((maxN+1)/2),1);
b_ind(end) = [];
tempind = logical([fliplr(indmat) b_ind indmat+triu(ones(size(indmat)))]);

% permute the arrays to prevent squeeze:
PM = permute(posMcontainer,[3 1 2]);
NM = permute(negMcontainer,[3 1 2]);
B = permute(bessels,[3 1 2]);

for k = 1 : xElements % third dim of B (the original j index)
    for jj = 1 : xElements % columns of B (the original i index)
        b = B(:,jj,k); % get one vector of B

        % perform b*NM for every row of NM*indmat, then flip the result:
        neg = fliplr(bsxfun(@times,bsxfun(@times,indmat,NM(:,jj,k).'),b));

        % perform b*PM for every row of PM*indmat:
        pos = bsxfun(@times,bsxfun(@times,indmat,PM(:,jj,k).'),b);

        temp = [neg mod(1:levels,2).'.*b pos].'; % concat neg and pos
        % assign them to the right place in umn:
        umn = reshape(temp(tempind.'),[levels levels]).';

        beta1 = Aj1.*umn;
        betaSumSq1(jj,k) = abs(sum(beta1(:))).^2;
        beta2 = Aj2.*umn;
        betaSumSq2(jj,k) = abs(sum(beta2(:))).^2;
    end
end

This reduced the running time from about 95 seconds to under 3 seconds (both without parfor), an improvement of almost 97%.
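The remaining (i, j) loop can in principle also be removed with a different technique than the one in this answer, sketched here under stated assumptions. For fixed `(i, j)`, `beta1 = sum(sum(Aj1.*umn))` is a bilinear form `b.' * W1 * c`, where `b = bessels(i, j, :)`, `c` stacks the reversed `negMcontainer` values, a 1 for `m = 0`, and the `posMcontainer` values, and `W1(nn, m + maxN + 1) = Aj1(nn, mm)` for every filled cell of `umn`. Stacking all `(i, j)` pairs as rows turns the double loop into one matrix product and a row-wise sum. As before, the sizes of `negMcontainer`, `Aj1`, `Aj2` and the random data are assumptions, and `W1`, `W2`, `Bmat`, `Cmat` are hypothetical names; a loop reference is included for comparison.

```matlab
% Sketch: betaSumSq1/2 for all (i, j) at once via a bilinear form.
% Sizes, random data, and the sizes of negMcontainer, Aj1, Aj2 are assumptions.
maxN = 8;
levels = maxN + 1;
xElements = 3;
rng(2);
bessels = rand(xElements, xElements, levels);
posMcontainer = rand(xElements, xElements, maxN);
negMcontainer = rand(xElements, xElements, maxN);   % assumed same size as posMcontainer
Aj1 = rand(levels) + 1i*rand(levels);                % assumed levels-by-levels
Aj2 = rand(levels) + 1i*rand(levels);

% W(nn, m + maxN + 1) = Aj(nn, mm) for every filled cell (nn, mm) of umn
[MM, NN] = meshgrid(1:levels, 1:levels);   % MM = column mm, NN = row nn
mask = MM <= NN;
mVal = -(NN - 1) + 2*(MM - 1);             % m = -n + 2*(mm - 1)
lin = sub2ind([levels, 2*maxN + 1], NN(mask), mVal(mask) + maxN + 1);
W1 = zeros(levels, 2*maxN + 1);  W1(lin) = Aj1(mask);
W2 = zeros(levels, 2*maxN + 1);  W2(lin) = Aj2(mask);

% one row per (i, j) pair, column-major: row = i + (j - 1)*xElements
Bmat = reshape(bessels, [], levels);
Cmat = [fliplr(reshape(negMcontainer, [], maxN)), ...   % m = -maxN..-1
        ones(xElements^2, 1), ...                       % m = 0
        reshape(posMcontainer, [], maxN)];              % m = 1..maxN
betaSumSq1 = reshape(abs(sum((Bmat * W1) .* Cmat, 2)).^2, xElements, xElements);
betaSumSq2 = reshape(abs(sum((Bmat * W2) .* Cmat, 2)).^2, xElements, xElements);

% reference: straightforward loop equivalent to the question's Edit 2
ref1 = zeros(xElements);  ref2 = zeros(xElements);
for i = 1:xElements
    for j = 1:xElements
        umn = zeros(levels);
        for n = 0:maxN
            for mm = 1:n+1
                m = -n + 2*(mm - 1);
                if m < 0
                    v = negMcontainer(i, j, -m);
                elseif m > 0
                    v = posMcontainer(i, j, m);
                else
                    v = 1;
                end
                umn(n+1, mm) = bessels(i, j, n+1) * v;
            end
        end
        ref1(i, j) = abs(sum(sum(Aj1 .* umn)))^2;
        ref2(i, j) = abs(sum(sum(Aj2 .* umn)))^2;
    end
end
relErr = max(abs([betaSumSq1(:) - ref1(:); betaSumSq2(:) - ref2(:)])) ...
         / max(abs([ref1(:); ref2(:)]));
fprintf('max relative difference: %g\n', relErr);
```

The trade-off is memory: at the question's sizes (xElements = 101, maxN = 100) the complex product Bmat * W1 has 10201-by-201 entries, about 33 MB, which is modest compared with the arrays already in use; any speed benefit would still need to be measured.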

A similar question, "performance - Vectorized code slower than loops? MATLAB", is on Stack Overflow: https://stackoverflow.com/questions/39183125/
