An Introduction to Parallel Programming (并行程序设计导论)

Publisher: China Machine Press
Publication date: 2011-9
ISBN: 9787111358282
Author: Peter S. Pacheco (USA)
Pages: 388

Book Description

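Parallel programming is no longer a discipline reserved for specialists. To make full use of the computing power of clusters and multicore processors, you need to learn both distributed-memory and shared-memory parallel programming techniques. This book shows, step by step, how to develop efficient parallel programs with MPI, Pthreads, and OpenMP. It teaches readers how to develop and debug distributed-memory and shared-memory programs and how to evaluate their performance.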

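Features

·  Tutorial approach: starts with short programming examples and builds up, step by step, to more challenging programs.

·  Focuses on the design, debugging, and performance evaluation of distributed-memory and shared-memory programs.

·  Uses the MPI, Pthreads, and OpenMP programming models and emphasizes hands-on development of parallel programs.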

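[Editorial Recommendations]

Without question, the widespread adoption of multicore processors and cloud computing systems means that parallel computing is no longer a niche area set apart from the rest of computing. Parallelism has become the primary factor in using resources effectively. Peter Pacheco's new textbook will help beginners learn the art and practice of parallel computing.

-- Duncan Buell, Department of Computer Science and Engineering, University of South Carolina

This book covers two increasingly important areas: shared-memory programming with Pthreads and OpenMP, and distributed-memory programming with MPI. More important, it stresses good programming practice by pointing out potential performance pitfalls. The book presents these topics in the context of several disciplines, including computer science, physics, and mathematics. Each chapter includes programming exercises of varying difficulty. It is an ideal reference for students and professionals who want to learn parallel programming skills and broaden their knowledge.

-- Leigh Little, Department of Computer Science, The College at Brockport, State University of New York

This is a well-written, comprehensive introduction to parallel computing. Students and practitioners in related fields will benefit greatly from its up-to-date material. The author writes accessibly and uses many interesting examples, which makes the book engaging. Parallel computing is a rapidly changing and constantly evolving field, and this book explains it clearly while covering all aspects of parallel software and hardware.

-- Kathy J. Liszka, Department of Computer Science, University of Akron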

Table of Contents

chapter 1 why parallel computing? 1

1.1 why we need ever-increasing performance 2

1.2 why we’re building parallel systems 3

1.3 why we need to write parallel programs 3

1.4 how do we write parallel programs? 6

1.5 what we’ll be doing 8

1.6 concurrent, parallel, distributed 9

1.7 the rest of the book 10

1.8 a word of warning 10

1.9 typographical conventions 11

1.10 summary 12

1.11 exercises 12

chapter 2 parallel hardware and parallel software 15

2.1 some background 15

2.1.1 the von neumann architecture 15

2.1.2 processes, multitasking, and threads 17

2.2 modifications to the von neumann model 18

2.2.1 the basics of caching 19

2.2.2 cache mappings 20

2.2.3 caches and programs: an example 22

2.2.4 virtual memory 23

2.2.5 instruction-level parallelism 25

2.2.6 hardware multithreading 28

2.3 parallel hardware 29

2.3.1 simd systems 29

2.3.2 mimd systems 32

2.3.3 interconnection networks 35

2.3.4 cache coherence 43

2.3.5 shared-memory versus distributed-memory 46

2.4 parallel software 47

2.4.1 caveats 47

2.4.2 coordinating the processes/threads 48

2.4.3 shared-memory 49

2.4.4 distributed-memory 53

2.4.5 programming hybrid systems 56

2.5 input and output 56

2.6 performance 58

2.6.1 speedup and efficiency 58

2.6.2 amdahl’s law 61

2.6.3 scalability 62

2.6.4 taking timings 63

2.7 parallel program design 65

2.7.1 an example 66

2.8 writing and running parallel programs 70

2.9 assumptions 70

2.10 summary 71

2.10.1 serial systems 71

2.10.2 parallel hardware 73

2.10.3 parallel software 74

2.10.4 input and output 75

2.10.5 performance 75

2.10.6 parallel program design 76

2.10.7 assumptions 76

2.11 exercises 77

chapter 3 distributed-memory programming with mpi 83

3.1 getting started 84

3.1.1 compilation and execution 84

3.1.2 mpi programs 86

3.1.3 MPI_Init and MPI_Finalize 86

3.1.4 communicators, MPI_Comm_size and MPI_Comm_rank 87

3.1.5 spmd programs 88

3.1.6 communication 88

3.1.7 MPI_Send 88

3.1.8 MPI_Recv 90

3.1.9 message matching 91

3.1.10 the status_p argument 92

3.1.11 semantics of MPI_Send and MPI_Recv 93

3.1.12 some potential pitfalls 94

3.2 the trapezoidal rule in mpi 94

3.2.1 the trapezoidal rule 94

3.2.2 parallelizing the trapezoidal rule 96

3.3 dealing with i/o 97

3.3.1 output 97

3.3.2 input 100

3.4 collective communication 101

3.4.1 tree-structured communication 102

3.4.2 MPI_Reduce 103

3.4.3 collective vs. point-to-point communications 105

3.4.4 MPI_Allreduce 106

3.4.5 broadcast 106

3.4.6 data distributions 109

3.4.7 scatter 110

3.4.8 gather 112

3.4.9 allgather 113

3.5 mpi derived datatypes 116

3.6 performance evaluation of mpi programs 119

3.6.1 taking timings 119

3.6.2 results 122

3.6.3 speedup and efficiency 125

3.6.4 scalability 126

3.7 a parallel sorting algorithm 127

3.7.1 some simple serial sorting algorithms 127

3.7.2 parallel odd-even transposition sort 129

3.7.3 safety in mpi programs 132

3.7.4 final details of parallel odd-even sort 134

3.8 summary 136

3.9 exercises 140

3.10 programming assignments 147

chapter 4 shared-memory programming with pthreads 151

4.1 processes, threads, and pthreads 151

4.2 hello, world 153

4.2.1 execution 153

4.2.2 preliminaries 155

4.2.3 starting the threads 156

4.2.4 running the threads 157

4.2.5 stopping the threads 158

4.2.6 error checking 158

4.2.7 other approaches to thread startup 159

4.3 matrix-vector multiplication 159

4.4 critical sections 162

4.5 busy-waiting 165

4.6 mutexes 168

4.7 producer-consumer synchronization and semaphores 171

4.8 barriers and condition variables 176

4.8.1 busy-waiting and a mutex 177

4.8.2 semaphores 177

4.8.3 condition variables 179

4.8.4 pthreads barriers 181

4.9 read-write locks 181

4.9.1 linked list functions 181

4.9.2 a multi-threaded linked list 183

4.9.3 pthreads read-write locks 187

4.9.4 performance of the various implementations 188

4.9.5 implementing read-write locks 190

4.10 caches, cache coherence, and false sharing 190

4.11 thread-safety 195

4.11.1 incorrect programs can produce correct output 198

4.12 summary 198

4.13 exercises 200

4.14 programming assignments 206

chapter 5 shared-memory programming with openmp 209

5.1 getting started 210

5.1.1 compiling and running openmp programs 211

5.1.2 the program 212

5.1.3 error checking 215

5.2 the trapezoidal rule 216

5.2.1 a first openmp version 216

5.3 scope of variables 220

5.4 the reduction clause 221

5.5 the parallel for directive 224

5.5.1 caveats 225

5.5.2 data dependences 227

5.5.3 finding loop-carried dependences 228

5.5.4 estimating π 229

5.5.5 more on scope 231

5.6 more about loops in openmp: sorting 232

5.6.1 bubble sort 232

5.6.2 odd-even transposition sort 233

5.7 scheduling loops 236

5.7.1 the schedule clause 237

5.7.3 the dynamic and guided schedule types 239

5.7.4 the runtime schedule type 239

5.7.5 which schedule? 241

5.8 producers and consumers 241

5.8.1 queues 241

5.8.2 message-passing 242

5.8.3 sending messages 243

5.8.4 receiving messages 243

5.8.5 termination detection 244

5.8.6 startup 244

5.8.7 the atomic directive 245

5.8.8 critical sections and locks 246

5.8.9 using locks in the message-passing program 248

5.8.10 critical directives, atomic directives, or locks? 249

5.8.11 some caveats 249

5.9 caches, cache coherence, and false sharing 251

5.10 thread-safety 256

5.10.1 incorrect programs can produce correct output 258

5.11 summary 259

5.12 exercises 263

5.13 programming assignments 267

chapter 6 parallel program development 271

6.1 two n-body solvers 271

6.1.1 the problem 271

6.1.2 two serial programs 273

6.1.3 parallelizing the n-body solvers 277

6.1.4 a word about i/o 280

6.1.5 parallelizing the basic solver using openmp 281

6.1.6 parallelizing the reduced solver using openmp 284

6.1.7 evaluating the openmp codes 288

6.1.8 parallelizing the solvers using pthreads 289

6.1.9 parallelizing the basic solver using mpi 290

6.1.10 parallelizing the reduced solver using mpi 292

6.1.11 performance of the mpi solvers 297

6.2 tree search 299

6.2.1 recursive depth-first search 302

6.2.2 nonrecursive depth-first search 303

6.2.3 data structures for the serial implementations 305

6.2.6 a static parallelization of tree search using pthreads 309

6.2.7 a dynamic parallelization of tree search using pthreads 310

6.2.8 evaluating the pthreads tree-search programs 315

6.2.9 parallelizing the tree-search programs using openmp 316

6.2.10 performance of the openmp implementations 318

6.2.11 implementation of tree search using mpi and static partitioning 319

6.2.12 implementation of tree search using mpi and dynamic partitioning 327

6.3 a word of caution 335

6.4 which api? 335

6.5 summary 336

6.5.1 pthreads and openmp 337

6.5.2 mpi 338

6.6 exercises 341

6.7 programming assignments 350

chapter 7 where to go from here 353

references 357

index 361

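About the Author

Peter Pacheco received a PhD in mathematics from Florida State University. He previously chaired the Computer Science Department at the University of San Francisco and is now chair of the university's Mathematics Department. He has taught parallel computing to undergraduate and graduate students for nearly 20 years.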

Excerpt

There are many possible algorithms for identifying which subtrees we assign to the processes or threads. For example, one thread or process could run the last version of serial depth-first search until the stack stores one partial tour for each thread or process. Then it could assign one tour to each thread or process. The problem with depth-first search is that we expect a subtree whose root is deeper in the tree to require less work than a subtree whose root is higher up in the tree, so we would probably get better load balance if we used something like breadth-first search to identify the subtrees.

As the name suggests, breadth-first search searches as widely as possible in the tree before going deeper. So if, for example, we carry out a breadth-first search until we reach a level of the tree that has at least thread_count or comm_sz nodes, we can then divide the nodes at this level among the threads or processes. See Exercise 6.18 for implementation details.

The best tour data structure

On a shared-memory system, the best tour data structure can be shared. In this setting, the Feasible function can simply examine the data structure. However, updates to the best tour will cause a race condition, and we'll need some sort of locking to prevent errors. We'll discuss this in more detail when we implement the parallel version.

In the case of a distributed-memory system, there are a couple of choices that we need to make about the best tour. The simplest option would be to have the processes operate independently of each other until they have completed searching their subtrees. In this setting, each process would store its own local best tour. This local best tour would be used by the process in Feasible and updated by the process each time it calls Update_best_tour.

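Selected Short Reviews (16 total)

Too basic; it just introduces MPI, Pthreads, and OpenMP.
Easy to understand; quite suitable as an introductory book.
Much better than many Chinese books on parallel computing! The book is clearly organized; if you are learning parallel computing, it will save you a lot of detours.
A very thorough introduction, true to its name: it really is "an introduction," and well suited as an introductory textbook.
Just arrived and I haven't read it closely yet. It seems fine. Thanks to the courier who delivered the package in the midday sun.
I previously bought Hua Zhang's edition of 深入理解计算机系统 (Computer Systems: A Programmer's Perspective), which is printed in two colors, so I assumed this book belonged to the same series and would also be in two colors. It isn't, which was a little disappointing.
Haven't read it yet; I hope it helps me get better at designing concurrent algorithms.
Very good, a genuine copy; I like it a lot.
English edition, mentally taxing, but good.
Very basic material, just a first encounter with parallel programming. It introduces the three mainstream parallel programming interfaces, MPI, OpenMP, and Pthreads. It reads more like a popular-science overview for people in the field: worth a look to broaden your horizons.
The author says the book can be used in first- and second-year university courses, but in practice it is more useful to readers who already have a solid grasp of C programming and of computer software and (multicore) hardware.
Haven't had time to read it yet; the packaging and printing are satisfactory.
You still need to read Advanced Programming in the UNIX Environment.
Covers MPI, Pthreads, and OpenMP; a very good book. It's only the first edition, though, so a second edition will probably come out soon, and the book isn't cheap.
English original edition; reading it slowly.
First of all, this is a book, a book about parallelism. Second, thanks to the delivery man for delivering at midday; much appreciated!!!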