A Study of Data-Parallel and Task-Parallel Computing (白红宇's personal blog)

Published: 2021-06-29 14:42:49. Category: Technical articles


Reference: https://software.intel.com/zh-cn/blogs/2011/12/02/400009299

Reference: http://blog.csdn.net/u012337841/article/details/16358547

(Owned by: 春夜喜雨 http://blog.csdn.net/chunyexiyu)

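Data-level parallelism (DLP): SIMD (Single Instruction, Multiple Data) uses a single control unit to drive multiple processing elements, applying the same operation to every element of a group of data (also called a "data vector") at once, which achieves parallelism in space.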

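The SIMD technologies supported by Intel processors include MMX, SSE, and AVX.

MMX provides eight 64-bit registers for SIMD operations (integer data only);

The SSE family provides eight 128-bit registers for SIMD instructions (sixteen in 64-bit mode);

The newer AVX instructions support 256-bit SIMD operations.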
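The register widths listed above determine how many elements a single instruction can process: the lane count is the register width divided by the element width. A minimal sketch of that arithmetic follows; the helper name `simdLanes` is made up for illustration and is not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many elements of type T fit into one SIMD
// register that is registerBits wide.
template <typename T>
constexpr std::size_t simdLanes(std::size_t registerBits) {
    // The register width must be a whole multiple of the element width.
    assert(registerBits % (sizeof(T) * 8) == 0);
    return registerBits / (sizeof(T) * 8);
}
```

For example, `simdLanes<std::int32_t>(64)` gives 2 for MMX, `simdLanes<double>(128)` gives 2 for SSE2, and `simdLanes<double>(256)` gives 4 for AVX, which matches the loop strides used in the sample code later in this post.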

SIMD instructions can currently be used in four ways: assembly language, C++ classes, compiler intrinsics, and auto-vectorization.
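To make two of these approaches concrete, here is a rough sketch that contrasts auto-vectorization with compiler intrinsics for the same element-wise addition. The function names `addAuto` and `addIntrinsics` are illustrative only, and whether the compiler actually vectorizes the plain loop depends on the compiler and the optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // also pulls in the SSE float intrinsics
#define SKETCH_HAS_SSE 1
#endif

// Auto-vectorization: a plain loop that the compiler may turn into SIMD
// code on its own at higher optimization levels.
void addAuto(const float* a, const float* b, float* r, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) r[i] = a[i] + b[i];
}

// Intrinsics: four floats are added per instruction explicitly.
// Falls back to the scalar loop when SSE is not available.
void addIntrinsics(const float* a, const float* b, float* r, std::size_t n) {
    assert(a != nullptr && b != nullptr && r != nullptr);
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE
    for (; i + 4 <= n; i += 4) {
        // Unaligned loads/stores, so the arrays need no special alignment.
        _mm_storeu_ps(r + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif
    for (; i < n; ++i) r[i] = a[i] + b[i];  // leftover elements
}
```

Both functions produce the same result. The intrinsics version trades portability for explicit control over which instructions are used.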

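Task-level parallelism (TLP): focuses on distributing tasks, run as processes or threads, across different processors so that they execute at the same time. A typical application is the pipeline, which splits a job into independent stages that run separately.

In computing, pipelines include instruction pipelines, graphics pipelines, software pipelines, and HTTP pipelining.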
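As an illustration of a software pipeline, the sketch below connects two stages with a small blocking queue and runs each stage on its own thread. The class `StageQueue`, the function `runPipeline`, and the two toy stages (square, then add one) are all assumptions made for this example, not code from the referenced articles.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal blocking queue that hands values from one stage to the next.
class StageQueue {
public:
    void push(int v) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(v); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Returns false once the queue has been closed and drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
    bool closed_ = false;
};

// Stage 1 squares each input, stage 2 adds one. The two stages run on
// separate threads, so stage 2 can start before stage 1 has finished.
std::vector<int> runPipeline(const std::vector<int>& input) {
    StageQueue q;
    std::vector<int> output;
    std::thread stage1([&] {
        for (int v : input) q.push(v * v);
        q.close();
    });
    std::thread stage2([&] {
        int v;
        while (q.pop(v)) output.push_back(v + 1);
    });
    stage1.join();
    stage2.join();
    assert(output.size() == input.size());
    return output;
}
```

With one producer and one consumer the queue preserves order, so `runPipeline({1, 2, 3})` returns `{2, 5, 10}`. On some Linux toolchains the program has to be linked with `-pthread`.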

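Instruction-level parallelism (ILP): the processor's ability to work on several instructions at the same time. It can be achieved in two ways, at the hardware level and at the software level.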
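One way to picture the software side is code written so that consecutive instructions do not depend on each other. The sketch below is only an illustration under that assumption: `sumSerial` forms one long dependency chain, while `sumUnrolled` keeps four independent accumulators that a superscalar processor may execute in parallel. Both function names are made up here, and any actual speedup depends on the CPU and the compiler.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: every addition has to wait for the previous one.
long long sumSerial(const int* a, std::size_t n) {
    long long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// Four accumulators: the four additions inside one iteration are
// independent of each other, which exposes instruction-level parallelism.
long long sumUnrolled(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}
```

Because the data is integer, both versions return exactly the same sum. With floating-point data the changed order of additions could alter rounding.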

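OpenMP can implement both task parallelism and data parallelism. MPI can also implement task parallelism, by launching multiple processes that run tasks in parallel and communicate with one another.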
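The difference between the two OpenMP styles can be sketched as follows: `parallel for` splits one loop over the data (data parallelism), while `parallel sections` runs unrelated jobs side by side (task parallelism). The functions `scaleAll` and `sumAndMax` are hypothetical examples. Without an OpenMP compiler option the pragmas are ignored and the code simply runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Data parallelism: the same operation applied to different elements.
void scaleAll(std::vector<double>& v, double k) {
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        v[i] *= k;
    }
}

// Task parallelism: two unrelated jobs, each in its own section.
void sumAndMax(const std::vector<double>& v, double& sum, double& maxVal) {
    assert(!v.empty());
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            double s = 0.0;
            for (double x : v) s += x;
            sum = s;  // written by this section only
        }
        #pragma omp section
        {
            double m = v[0];
            for (double x : v) if (x > m) m = x;
            maxVal = m;  // written by this section only
        }
    }
}
```

Each section writes to a different output variable, so no locking is needed between them.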

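OpenCL can implement data-level parallelism. It puts the GPU to work for larger-scale parallel computation. (OpenCL treats both the CPU and the GPU as compute devices, so more processing units are available, but the cost is also somewhat higher, for example from loading data into video memory.)

Below is some sample code for these forms of parallelism: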

1. MMX:

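#include <mmintrin.h>  // MMX intrinsics

// MMX is integer-only, so this adds 32-bit ints, two per 64-bit register.
// Assumes nSize is even.
void addMMX(int* a, int* b, int* r, int nSize) {
    __m64* pA = (__m64*)a;
    __m64* pB = (__m64*)b;
    __m64* pR = (__m64*)r;
    for (int i = 0; i < nSize / 2; i++) {
        pR[i] = _mm_add_pi32(pA[i], pB[i]);
    }
    _mm_empty();  // clear the MMX state before any x87 floating-point code
}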

2. SSE:

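#include <emmintrin.h>  // SSE2 intrinsics (packed double)

// Multiplies two doubles per 128-bit register. Unaligned loads and stores
// are used, so the arrays do not have to be 16-byte aligned.
void mutiSSE(const double* a, const double* b, double* r, int nSize) {
    int i = 0;
    for (; i + 2 <= nSize; i += 2) {
        _mm_storeu_pd(r + i, _mm_mul_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
    }
    for (; i < nSize; i++) {  // leftover element when nSize is odd
        r[i] = a[i] * b[i];
    }
}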

3. AVX:

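#include <immintrin.h>  // AVX intrinsics; build with -mavx (GCC/Clang) or /arch:AVX (MSVC)

// Multiplies four doubles per 256-bit register. Unaligned loads and stores
// are used, so the arrays do not have to be 32-byte aligned.
void mutiAVX(const double* a, const double* b, double* r, int nSize) {
    int i = 0;
    for (; i + 4 <= nSize; i += 4) {
        _mm256_storeu_pd(r + i, _mm256_mul_pd(_mm256_loadu_pd(a + i), _mm256_loadu_pd(b + i)));
    }
    for (; i < nSize; i++) {  // leftover elements when nSize is not a multiple of 4
        r[i] = a[i] * b[i];
    }
}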

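4. OpenMP: requires the /openmp compiler option (MSVC; use -fopenmp with GCC/Clang)

void mutiOMP(double* a, double* b, double* r, int nSize) {
    // The loop iterations are divided among the threads of the team.
    #pragma omp parallel for
    for (int i = 0; i < nSize; i++) {
        r[i] = a[i] * b[i];
    }
}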

5. MPI: on Windows you need to install an MPI SDK and runtime (for example, Microsoft MPI)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char buf[256];
    int nRank, nProcNum;

    /* Initialize the infrastructure necessary for communication */
    MPI_Init(&argc, &argv);

    /* Identify this process */
    MPI_Comm_rank(MPI_COMM_WORLD, &nRank);

    /* Find out how many total processes are active */
    MPI_Comm_size(MPI_COMM_WORLD, &nProcNum);

    /* Until this point, all processes have been doing exactly the same.
       Here, we check the rank to distinguish the roles of the processes */
    if (nRank == 0) {
        int nOtherProc;
        printf("We have %i processes.\n", nProcNum);

        /* Receive messages from all other processes */
        for (nOtherProc = 1; nOtherProc < nProcNum; nOtherProc++)
        {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, nOtherProc,
                     0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", buf);
        }
    }
    else {
        /* Send message to process #0 */
        snprintf(buf, sizeof(buf), "Process %i reporting for duty.", nRank);
        MPI_Send(buf, sizeof(buf), MPI_CHAR, 0,
                 0, MPI_COMM_WORLD);
    }

    /* Tear down the communication infrastructure */
    MPI_Finalize();
    return 0;
}

When run:

mpiexec.exe -n 5 demo.exe
We have 5 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Process 4 reporting for duty.


Reposted from: https://chunyexiyu.blog.csdn.net/article/details/78935636
