**Parallelism and Concurrency**
**Reed College CSCI 361 Spring 2026**
![From *Fantasia*, 1940.](images/mickey-brooms.jpg)
This course investigates the theory of algorithms and of system
support for high performance multiprocessor systems. It is also an
introduction to implementing these algorithms in executable code.
Throughout the semester, students will be asked to demonstrate the
ideas on current multiprocessor hardware.
The first two-thirds of the course develop the theory of parallel
algorithms, surveying some classic results including prefix scans and
sorting, and also their applications to graphs and sequence analysis.
This dovetails with actual implementation in a variety of parallel
languages and concurrency libraries (e.g. POSIX threads, CUDA).
Students spend the last third researching a parallel solution to
some independently chosen algorithmic problem, and coding it up.
In the last third we will also survey additional topics, possibly
including scheduler and runtime support for executing parallel and
concurrent programs, and complexity-theoretic bounds on parallelism.
**Prerequisites:** CSCI 221 and MATH/CSCI 382.
**Instructor:** Jim Fix
**Meets:** MW 2:40-4pm in Library 340.
**Office Hours:**
Jim [lists several times](../teaching.html) when he is available for help or for discussing the course. He's around his office MTuW at other times; students normally can just pop by, and can also email to arrange an appointment.
# Schedule
[**Lecture 1:**](week1/PC-Lec01.pdf) the end of Moore's Law
**Lecture 2:** a parallel fork-join merge sort
• the [Go code](week1/pmergesort.go), [a trace of its run on `patty`](week1/pmergesort64.out), and [speedups](week1/pmergesort.pdf).
• **Reading:** CLRS(3e) Chapter 27.3
**Lecture 3:** analysis of parallel merge sort; [intro to Go](week2/P+C-Lec03-GoForkJoin.pdf)
• a [Go syntax](week2/go-syntax.html) primer
⇒ [**Homework 1**](hw1.html) some Go
**Lectures 4 and 5:** the fork-join model
**Lecture 6:** parallel prefix scan
• [Ladner & Fischer paper](https://dl.acm.org/doi/10.1145/322217.322232)
• **Reading:** CLR(1e) Chapter 29.2
**Lecture 7:** fork-join parallel scan ("Blelloch's scan")
• **Reading:** [Blelloch's parallel algorithm notes](https://www.cs.cmu.edu/~guyb/paralg/paralg/parallel.pdf) Chapter 4.1
⇒ [**Homework 2**](hw2.html) parallel scan
**Lecture 8:** applications of scan
• [Blelloch paper](https://people.eecs.berkeley.edu/~culler/cs262b/papers/scan89.pdf)
• [Blelloch chapter](https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf) from *Synthesis of Parallel Algorithms (ed. Reif)*
# Resources
We will be working from excerpts of texts, tutorials, and notes, as
well as research papers and articles. There is no required textbook
(as of yet); I will provide the readings and link them within the
weekly schedule above as the semester progresses.
Below is a list of classic resources on parallel algorithm
theory, on concurrency, and on parallel hardware that we've used in
past versions of this course.
**Introduction to Algorithms**
by *Cormen, Leiserson, Rivest (and Stein)*
• Selected chapters from various editions address parallel algorithms:
Chapter 27 (3e) on *Multithreaded Algorithms*
Chapter 27 (2e) or Chapter 28 (1e) on *Sorting Networks*
Chapter 29 (1e) on *Arithmetic Circuits*
Chapter 30 (1e) on *Algorithms for Parallel Computers*
**Synthesis of Parallel Algorithms** (1993)
edited by J. Reif.
• PRAM algorithm compendium. Several chapters are available on-line.
**Parallel Algorithms and Architectures: Arrays · Trees · Hypercubes** (1992)
by *F. T. Leighton.*
• Algorithms for specific hardware interconnect topologies.
**The Art of Multiprocessor Programming** (2e, 2021)
by *Herlihy, Shavit, Luchangco, and Spear.*
• Concurrency mechanisms and concurrent data structures, scheduling.
**Computer Architecture: A Quantitative Approach** (5e, 2012)
by *Hennessy and Patterson.*
• Chapter 4 on *Data-Level Parallelism*, including GPU architectures.
• Chapter 5 on *Thread-Level Parallelism*, shared memory multi-processors.
**Limits to Parallel Computation** (1995)
by *Greenlaw, Hoover, and Ruzzo.*
• PRAM and circuit models, the NC complexity class, P-completeness.
**An Introduction to Parallel Programming** (2012)
by *P. Pacheco.*
• Introduces parallel programming using MPI, OpenMP, and POSIX threads.
# Hardware & Software
You will all have access to `patty.reed.edu`, an Ubuntu 20.04 Linux machine with
several cores and a GPU. Specs:
• a 32-core AMD Ryzen Threadripper, with 2 hardware threads per core.
• an NVIDIA GeForce RTX 2080 SUPER, with 3072 CUDA cores (192 TMUs).
You can also configure your own machine to develop using any of
• POSIX threads for C,
• NVidia's CUDA,
• Open MPI for C/C++, OpenMP for C/C++ (might need Intel's TBB),
• Go, Rust, Cilk, and other language systems.
# Responsibilities
The course is a hybrid of theory and (prototyped) implementations,
building off the skills you learned in MATH/CSCI 382 and CSCI 221. As
such, there will be a mix of written problem sets and coded programs.
There will also be a final, independent project assigned after the 8th
week. It will ask you either to carry a homework assignment further
than pseudocode or a prototype, perhaps trying several techniques to
get good practical performance, or to attack a problem we do not
cover in class, survey it, and implement a working solution (or
solutions).
# Assignments
⇒ [**Homework 1**](hw1.html) some Go
⇒ [**Homework 2**](hw2.html) parallel scan