cuda - Polymorphism and derived classes in CUDA / CUDA Thrust


Reposted. Author: 行者123. Updated: 2023-12-05 00:39:11

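This is my first question on Stack Overflow, and it is a long one. The tl;dr version is: how do I work with a thrust::device_vector<BaseClass> if I want it to store objects of different types DerivedClass1, DerivedClass2, etc., at the same time?

I want to take advantage of polymorphism with CUDA Thrust. I am compiling with -arch=sm_30 for my GPU (GeForce GTX 670).

Consider the following problem. Suppose there are 80 families in town: 60 are married couples and 20 are single-parent households, so families have different numbers of members. It is census time, and each household must state the ages of the parents and how many children they have. The government therefore builds an array of Family objects, namely thrust::device_vector<Family> familiesInTown(80), such that familiesInTown[0] to familiesInTown[59] hold the married couples and the rest (familiesInTown[60] to familiesInTown[79]) hold the single-parent households.

Family is the base class. The number of parents in the household (1 for a single parent, 2 for a couple) and the number of children they have are stored here as members.

SingleParent, derived from Family, adds one new member: the single parent's age, unsigned int ageOfParent.

MarriedCouple, also derived from Family, adds two new members: the ages of both parents, unsigned int ageOfParent1 and unsigned int ageOfParent2.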

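#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>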

class Family
{
protected:
  unsigned int numParents;
  unsigned int numChildren;
public:
  __host__ __device__ Family() {};
  __host__ __device__ Family(const unsigned int& nPars, const unsigned int& nChil) : numParents(nPars), numChildren(nChil) {};
  __host__ __device__ virtual ~Family() {};

  __host__ __device__ unsigned int showNumOfParents() {return numParents;}
  __host__ __device__ unsigned int showNumOfChildren() {return numChildren;}
};

class SingleParent : public Family
{
protected:
  unsigned int ageOfParent;
public:
  __host__ __device__ SingleParent() {};
  __host__ __device__ SingleParent(const unsigned int& nChil, const unsigned int& age) : Family(1, nChil), ageOfParent(age) {};

  __host__ __device__ unsigned int showAgeOfParent() {return ageOfParent;}
};

class MarriedCouple : public Family
{
protected:
  unsigned int ageOfParent1;
  unsigned int ageOfParent2;
public:
  __host__ __device__ MarriedCouple() {};
  __host__ __device__ MarriedCouple(const unsigned int& nChil, const unsigned int& age1, const unsigned int& age2) : Family(2, nChil), ageOfParent1(age1), ageOfParent2(age2) {};

  __host__ __device__ unsigned int showAgeOfParent1() {return ageOfParent1;}
  __host__ __device__ unsigned int showAgeOfParent2() {return ageOfParent2;}
};

If I naively initialize the objects in my thrust::device_vector<Family> using the following functors:

struct initSlicedCouples : public thrust::unary_function<unsigned int, MarriedCouple>
{
  __device__ MarriedCouple operator()(const unsigned int& idx) const
  // I use a thrust::counting_iterator to get idx
  {
    return MarriedCouple(idx % 3, 20 + idx, 19 + idx); 
    // Couple 0: Ages 20 and 19, no children
    // Couple 1: Ages 21 and 20, 1 child
    // Couple 2: Ages 22 and 21, 2 children
    // Couple 3: Ages 23 and 22, no children
    // etc
  }
};

struct initSlicedSingles : public thrust::unary_function<unsigned int, SingleParent>
{
  __device__ SingleParent operator()(const unsigned int& idx) const
  {
    return SingleParent(idx % 3, 25 + idx);
  }
};

int main()
{
  unsigned int Num_couples = 60;
  unsigned int Num_single_parents = 20;

  thrust::device_vector<Family> familiesInTown(Num_couples + Num_single_parents);
  // Families [0] to [59] are couples. Families [60] to [79] are single-parent households.
  thrust::transform(thrust::counting_iterator<unsigned int>(0),
                    thrust::counting_iterator<unsigned int>(Num_couples),
                    familiesInTown.begin(),
                    initSlicedCouples());
  thrust::transform(thrust::counting_iterator<unsigned int>(Num_couples),
                    thrust::counting_iterator<unsigned int>(Num_couples + Num_single_parents),
                    familiesInTown.begin() + Num_couples,
                    initSlicedSingles());
  return 0;
}

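I would definitely be guilty of some classic object slicing: each MarriedCouple or SingleParent returned by a functor is copied into a Family-sized element, so the age members are lost.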
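To make the slicing concrete, here is a minimal host-only sketch in plain C++. It assumes simplified stand-ins for the classes above: HostFamily, HostCouple, storedParentAges, and storeByValue are hypothetical names introduced only for this illustration, and std::vector stands in for thrust::device_vector. It shows the general C++ behavior rather than anything specific to Thrust.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, host-only stand-ins for Family and MarriedCouple.
struct HostFamily {
  unsigned int numParents = 0;
  virtual ~HostFamily() = default;
  // Number of parent ages this object actually stores.
  virtual unsigned int storedParentAges() const { return 0; }
};

struct HostCouple : HostFamily {
  unsigned int ageOfParent1;
  unsigned int ageOfParent2;
  HostCouple(unsigned int age1, unsigned int age2)
      : ageOfParent1(age1), ageOfParent2(age2) { numParents = 2; }
  unsigned int storedParentAges() const override { return 2; }
};

// A base-typed element has no room for the derived members.
static_assert(sizeof(HostCouple) > sizeof(HostFamily),
              "the derived object is larger than a base-typed slot");

// Stores a HostCouple by value in a vector of the base type and
// returns a copy of the stored element.
inline HostFamily storeByValue(const HostCouple& couple) {
  std::vector<HostFamily> town(1);
  town[0] = couple;  // copies only the HostFamily subobject
  assert(town.size() == 1);
  return town[0];    // dynamic type is HostFamily, not HostCouple
}
```

The base members survive the copy (numParents is still 2), but the stored element is a plain HostFamily: the ages are gone and virtual calls resolve to the base implementation.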

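So I asked myself: what about a vector of pointers, which might give me some sweet polymorphism? Smart pointers are a thing in C++, and thrust iterators can do some really impressive things, so let's give it a shot, I figured. The following code compiles.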

struct initCouples : public thrust::unary_function<unsigned int, MarriedCouple*>
{
  __device__ MarriedCouple* operator()(const unsigned int& idx) const
  {
    return new MarriedCouple(idx % 3, 20 + idx, 19 + idx); // Memory issues?
  }
};
struct initSingles : public thrust::unary_function<unsigned int, SingleParent*>
{
  __device__ SingleParent* operator()(const unsigned int& idx) const
  {
    return new SingleParent(idx % 3, 25 + idx);
  }
};

int main()
{
  unsigned int Num_couples = 60;
  unsigned int Num_single_parents = 20;

  thrust::device_vector<Family*> familiesInTown(Num_couples + Num_single_parents);
  // Families [0] to [59] are couples. Families [60] to [79] are single-parent households.
  thrust::transform(thrust::counting_iterator<unsigned int>(0),
                    thrust::counting_iterator<unsigned int>(Num_couples),
                    familiesInTown.begin(),
                    initCouples());
  thrust::transform(thrust::counting_iterator<unsigned int>(Num_couples),
                    thrust::counting_iterator<unsigned int>(Num_couples + Num_single_parents),
                    familiesInTown.begin() + Num_couples,
                    initSingles());

  Family A = *(familiesInTown[2]); // Compiles, but object slicing takes place (in theory)
  std::cout << A.showNumOfParents() << "\n"; // Segmentation fault
  return 0;
}

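It seems I have hit a wall here. Am I understanding memory management correctly (vtables, etc.)? Are my objects being instantiated and populated on the device? Am I leaking memory like there is no tomorrow?

For what it's worth, in order to avoid object slicing, I tried using a dynamic_cast<DerivedPointer*>(basePointer). That is why I made my Family destructor virtual.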

Family *pA = familiesInTown[2];
MarriedCouple *pB = dynamic_cast<MarriedCouple*>(pA);

The following lines compile but, unfortunately, a segfault is thrown again. CUDA-Memcheck won't tell me why.

  std::cout << "Ages " << (pB -> showAgeOfParent1()) << ", " << (pB -> showAgeOfParent2()) << "\n";

and

  MarriedCouple B = *pB;
  std::cout << "Ages " << B.showAgeOfParent1() << ", " << B.showAgeOfParent2() << "\n";

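In short, what I need is a class interface for objects that have different properties and different numbers of members, but that I can store in one common vector (that is why I want a base class) and manipulate on the GPU. My intention is to work with them both in thrust transformations and in CUDA kernels via thrust::raw_pointer_cast, which worked flawlessly for me until I needed to branch my classes out into a base class and several derived ones. What is the standard procedure for that?

Thanks in advance!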

1 Answer

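I am not going to attempt to answer everything in this question; it is just too large. Having said that, here are some observations about the code you posted which might help: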

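The GPU-side new operator allocates memory from a private runtime heap. As of CUDA 6, that memory cannot be accessed by the host-side CUDA APIs. You can access it from within kernels and device functions, but the host cannot access it. So using new inside a thrust device functor is a broken design that can never work. That is why your "vector of pointers" model fails.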

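Thrust is fundamentally intended to allow data-parallel versions of typical STL algorithms to be applied to POD types. Building a codebase around complex polymorphic objects and trying to cram those through Thrust containers and algorithms might be made to work, but it isn't what Thrust was designed for, and I wouldn't recommend it. Don't be surprised if you break thrust in unexpected ways.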
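One way to read the POD advice is to replace the class hierarchy with a single flat record that carries a type tag. The following is a minimal sketch under that assumption, not a design the answer prescribes: FamilyRecord, FamilyKind, FAMILY_HD, makeCouple, makeSingle, numParents, and buildTown are all hypothetical names. It is written as host-only C++ with std::vector standing in for the device container; the FAMILY_HD macro expands to __host__ __device__ only when compiled with nvcc, so the same record could presumably be stored in a thrust::device_vector<FamilyRecord>.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Expands to CUDA qualifiers under nvcc, to nothing in plain C++.
#if defined(__CUDACC__)
#define FAMILY_HD __host__ __device__
#else
#define FAMILY_HD
#endif

// Hypothetical type tag that replaces the class hierarchy.
enum FamilyKind : unsigned int { kSingleParent, kMarriedCouple };

// One fixed-size record for every household; no vtable, no pointers.
struct FamilyRecord {
  FamilyKind kind;
  unsigned int numChildren;
  unsigned int ageOfParent1;
  unsigned int ageOfParent2;  // unused (0) for single parents
};

static_assert(std::is_trivially_copyable<FamilyRecord>::value,
              "record can be copied bytewise between host and device");
static_assert(std::is_standard_layout<FamilyRecord>::value,
              "record has a plain C layout");

// Same values as the initSlicedCouples / initSlicedSingles functors.
FAMILY_HD inline FamilyRecord makeCouple(unsigned int idx) {
  return FamilyRecord{kMarriedCouple, idx % 3, 20 + idx, 19 + idx};
}
FAMILY_HD inline FamilyRecord makeSingle(unsigned int idx) {
  return FamilyRecord{kSingleParent, idx % 3, 25 + idx, 0u};
}

// Branch on the tag instead of calling a virtual function.
FAMILY_HD inline unsigned int numParents(const FamilyRecord& f) {
  return f.kind == kMarriedCouple ? 2u : 1u;
}

// Host-side stand-in for the two thrust::transform calls.
inline std::vector<FamilyRecord> buildTown(unsigned int couples,
                                           unsigned int singles) {
  std::vector<FamilyRecord> town;
  town.reserve(couples + singles);
  for (unsigned int i = 0; i < couples; ++i) town.push_back(makeCouple(i));
  for (unsigned int i = couples; i < couples + singles; ++i)
    town.push_back(makeSingle(i));
  assert(town.size() == couples + singles);
  return town;
}
```

The trade-off is that every record pays for the largest variant (single parents carry an unused ageOfParent2), in exchange for elements that need no virtual dispatch and no per-object allocation.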

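CUDA supports a lot of C++ features, but the compilation and object models are much simpler than even the C++98 standard upon which they are based. CUDA lacks several key features (RTTI, for example) which make complex polymorphic object designs workable in C++. My suggestion is to use C++ features sparingly. Just because you can do something in CUDA doesn't mean you should. The GPU is a simple architecture, and simple data structures and code are almost always more performant than functionally similar complex objects.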
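As one possible example of a "simple data structure", the census could be kept as separate columns of plain integers instead of an array of objects. This is only a sketch of that idea, which the answer does not spell out: TownColumns, buildTownColumns, and countChildren are hypothetical names, and std::vector again stands in for what would presumably be one thrust::device_vector<unsigned int> per column.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical column layout: one plain array per attribute.
struct TownColumns {
  std::vector<unsigned int> numParents;
  std::vector<unsigned int> numChildren;
  std::vector<unsigned int> ageOfParent1;
  std::vector<unsigned int> ageOfParent2;  // 0 where numParents == 1
};

// Fills the columns with the same values the question's functors use.
inline TownColumns buildTownColumns(unsigned int couples,
                                    unsigned int singles) {
  TownColumns t;
  for (unsigned int i = 0; i < couples + singles; ++i) {
    const bool isCouple = i < couples;
    t.numParents.push_back(isCouple ? 2u : 1u);
    t.numChildren.push_back(i % 3);
    t.ageOfParent1.push_back(isCouple ? 20 + i : 25 + i);
    t.ageOfParent2.push_back(isCouple ? 19 + i : 0u);
  }
  // All columns must describe the same number of households.
  assert(t.numParents.size() == t.ageOfParent2.size());
  return t;
}

// A reduction over a single column touches only the data it needs.
inline unsigned int countChildren(const TownColumns& t) {
  return std::accumulate(t.numChildren.begin(), t.numChildren.end(), 0u);
}
```

With this layout a whole-town query such as the total number of children is a plain reduction over one integer array, which is the kind of operation Thrust algorithms handle directly.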

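Having skimmed the code you posted, my overall recommendation is to go back to the drawing board. If you want to look at some very elegant CUDA/C++ designs, spend some time reading the code bases of CUB and CUSP. They are very different from each other, but there is a lot to learn from both (and CUSP is built on top of Thrust, which I suspect makes it even more relevant to your use case).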

Regarding cuda - Polymorphism and derived classes in CUDA / CUDA Thrust, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22988244/
