Inf-2202 Concurrent and System Level Programming – Fall 2013

Administrative information

Administrative course information is available here

A mailing list for the course will be created using your email addresses in Fronter.

There is no textbook for the course. See below for mandatory readings.

Staff

·         Lars Ailo Bongo (larsab@cs.uit.no), Office: A259

·         Ibrahim Umar (ibrahim.umar@uit.no), Office: A135

Lecture plan

All lectures are on Tuesdays, 14:15-16:00 (MH U6), or on Fridays, 11:15-12:00 (Store Aud).

Lecture  Date       Subject                                      Lecturer
L1       Fri 16.08  Introduction                                 Lars Ailo
L2       Tue 20.08  Threads and synchronization primitives       Lars Ailo
L3       Tue 20.08  Parallel architectures                       Lars Ailo
V1       Fri 30.08  Rob Pike - Concurrency is not parallelism    video
L4       Tue 27.08  Parallel programs                            Lars Ailo
L5       Tue 03.09  Programming for performance                  Lars Ailo
L6       Tue 10.09  Guest lecture: Go                            Giacomo Tartari
L7       Tue 27.09  Performance evaluation                       Lars Ailo
L8       Tue 24.09  Parallel program performance evaluation      Lars Ailo
L9       Tue 01.10  Guest lecture: Stallo                        Jonas Juselius
L10      Tue 08.10  Data-intensive computing                     Lars Ailo
         Tue 15.10  No lecture
L11      Tue 22.10  Google File System, MapReduce, and Hadoop    Lars Ailo
L12      Tue 29.10  BigTable, HBase, and Spark (no slides)       Lars Ailo
L13      Tue 05.11  Event-based programming                      Lars Ailo
L14      Tue 12.11  Summary (1st hour)                           Lars Ailo
                    Course evaluation (2nd hour)                 Jan Fuglesteg
         Tue 19.11  No lecture
         Fri 22.11  Exam

 

Readings

All are mandatory unless otherwise noted.

1.       Introduction

a.       None

2.       Threads and synchronization primitives

a.       Modern Operating Systems, 3ed. Andrew S. Tanenbaum. Prentice Hall. 2007. Chapters 2.2, 2.3, 2.5, 10.3, 11.4

b.       Alternatively: the chapters about threading, IPC mechanisms, and classical IPC problems from another operating systems textbook.

c.       Additional reading: Futexes Are Tricky

3.       Parallel architectures

a.       Computer Organization and Design: the Hardware/Software Interface, 4ed. David A. Patterson, John L. Hennessy. Morgan Kaufmann. 2011. Chapter 8: “Multicores, Multiprocessors, and Clusters”.

4.       Parallel programs

a.       Chapters 1.3 and 2.0–2.3.5 from Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J.P. Singh, Anoop Gupta. Morgan Kaufmann. 1998.

5.       Programming for performance

a.       Chapter 3 from Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J.P. Singh, Anoop Gupta. Morgan Kaufmann. 1998.

6.       Go

a.       Rob Pike. SPLASH keynote talk

b.       A tour of Go

c.       How to write Go code

d.       Effective Go

e.       Go concurrency patterns (video, slides)

7.       Performance evaluation

a.       Jayadev Misra. 1986. Distributed discrete-event simulation. ACM Comput. Surv. 18, 1 (March 1986), 39-65.

b.       J. Liedtke. 1995. On micro-kernel construction. In Proceedings of the fifteenth ACM symposium on Operating systems principles (SOSP '95).

c.       Additional reading: Yanpei Chen, Kiran Srinivasan, Garth Goodson, and Randy Katz. 2011. Design implications for enterprise storage systems via multi-dimensional trace analysis. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11).

d.       Additional reading: Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt. 2009. Debugging in the (very) large: ten years of implementation and experience. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09).

8.       Parallel program performance evaluation

a.       Amdahl’s law

b.       Gustafson’s law

c.       Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh and Kai Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.
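Amdahl's and Gustafson's laws can be tried out with a small worked example. This is a sketch with illustrative numbers (the function names and parameters are my own, not from the readings):

```python
# Amdahl's law: speedup of a fixed-size problem where a fraction p of
# the work is parallelizable across n processors.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Gustafson's law: scaled speedup where the parallel part grows with
# the machine and the serial fraction s stays fixed.
def gustafson_speedup(s, n):
    return n - s * (n - 1)

if __name__ == "__main__":
    # With 95% parallel work, Amdahl caps the speedup at 1/0.05 = 20,
    # no matter how many processors are added.
    print(round(amdahl_speedup(0.95, 16), 2))     # 9.14
    print(round(amdahl_speedup(0.95, 1024), 2))   # 19.64
    # Gustafson is more optimistic for scaled workloads.
    print(round(gustafson_speedup(0.05, 16), 2))  # 15.25
```

Comparing the two for the same machine size shows why benchmark papers such as the PARSEC paper must state whether they measure fixed-size or scaled problems.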

9.       Stallo

a.       None

10.   Data-intensive computing

a.       Introduction by Jim Gray (on eScience), and Gray’s law (pp. 5–11) from The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle.

b.       On the Future of Genomic Data. Scott D. Kahn. Science 11 February 2011: 331 (6018), 728-729.

c.       Data, data everywhere. The Economist Feb 25th, 2010.

d.       Three Paradoxes of Big Data. Neil M. Richards and Jonathan H. King, Stanford Law Review Online, 2013.

e.       Additional reading: the rest of The Fourth Paradigm: Data-Intensive Scientific Discovery.

f.        Additional reading: the rest of the Science special issue on Big Data.

g.       Additional reading: the rest of The Economist special report on the data deluge.

11.   MapReduce & BigTable

a.       MapReduce tutorial

b.       Pydoop tutorial

c.       Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03). ACM, New York, NY, USA, 29-43.

d.       Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113.

12.   Data analytics

a.       HBase tutorial (or another HBase tutorial)

b.       Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2008. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26, 2, Article 4 (June 2008), 26 pages.

c.       Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. Spark: Cluster Computing with Working Sets. HotCloud 2010. June 2010.

13.   Summary

a.       None

Suggested reading

·         Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J.P. Singh, Anoop Gupta. Morgan Kaufmann. 1998.

o   This book has a great introduction to parallel programming.

o   There is one copy in the library. Please be nice to your fellow students and do not keep that copy on loan for an extended period of time.

·         The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. R. K. Jain. Wiley. 1991.

o   A very good book about performance analysis.

o   There is one copy in the library. Please be nice to your fellow students and do not keep that copy on loan for an extended period of time.

·         Computer Architecture, Fifth Edition: A Quantitative Approach, 5ed. John L. Hennessy, David A. Patterson. Morgan Kaufmann. 2011.

o   This book has a thorough description of different parallel architectures.

o   The book can be purchased from your favorite bookstore.

·         The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. 2010.

o   This collection of essays describes many of the opportunities and challenges for data-intensive computing in different scientific fields.

o   The book is freely available as an ebook.

·         Hadoop: The Definitive Guide, 3ed. Tom White. O’Reilly. 2012.

o   A nice overview of the Hadoop ecosystem, including detailed descriptions of HDFS and Hadoop MapReduce.

o   This book is available in the library as a Safari ebook.

·         HBase: The Definitive Guide. Lars George. O’Reilly. 2012.

o   A detailed description of HBase, including tips for tuning the system.

o   This book is available in the library as a Safari ebook.

Useful resources

·         ACM parallel computing techpack

·         Intel Academic Resources, especially Introduction to Parallel Programming for Shared Memory Parallelism (Intel)

·         Pthreads tutorial

·         Python threads and Python multiprocessing

·         The Go programming language, Go Wiki, Go blog, Go examples, and Go forum

·         Communicating Sequential Processes by C. A. R. Hoare, CACM vol. 21 (8), 1978. (This paper describes concepts used by Go.)

·         Parallel programming patterns

Exercises

1.       Introduction

a.       None

2.       Threads and synchronization primitives

a.       Compare the overhead of forking a process vs. creating a Pthread

b.       Compare the overhead of forking a process vs. creating a Python thread

c.       Implement a solution to the following classical IPC problems using pthreads/Python threads and semaphores/condition variables. Note that you also need to generate a use case, test data, and useful output:

                     i.      Producer/consumer

                    ii.      Reader/writer

                   iii.      Sleeping barber

                    iv.      Dining philosophers

d.       Modify the code in c) to use message passing.
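As a starting point for c.i, a minimal producer/consumer sketch using a Python condition variable might look like the following (the buffer capacity and item count are arbitrary choices for illustration):

```python
import threading
from collections import deque

BUF_SIZE = 4   # arbitrary bounded-buffer capacity
N_ITEMS = 20   # arbitrary number of items to produce

buf = deque()
cond = threading.Condition()

def producer():
    for i in range(N_ITEMS):
        with cond:
            while len(buf) >= BUF_SIZE:   # wait while the buffer is full
                cond.wait()
            buf.append(i)
            cond.notify_all()             # wake a waiting consumer

def consumer(out):
    for _ in range(N_ITEMS):
        with cond:
            while not buf:                # wait while the buffer is empty
                cond.wait()
            out.append(buf.popleft())
            cond.notify_all()             # wake a waiting producer

consumed = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(consumed,))
t1.start(); t2.start()
t1.join(); t2.join()
print(consumed)   # with one producer and one consumer, items arrive in FIFO order
```

Note the `while` loops around `cond.wait()`: re-checking the predicate after waking is the standard guard against spurious wakeups, and the same pattern carries over directly to pthreads condition variables.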

3.       Parallel architectures

a.       None

4.       Parallel programs

a.       Implement the kernel in chapter 2.3 with shared-memory orchestration using Python threads. You also need to generate a use case, test data, and some way to see the result (tip: with a heatmap visualization you can see the output change as the computation proceeds).
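A sketch of what the shared-memory orchestration could look like, assuming the kernel is a nearest-neighbor grid-averaging sweep like the equation solver in that chapter (the grid size, worker count, sweep count, and row-wise partitioning are illustrative choices):

```python
import threading

N = 8            # illustrative interior grid size (grid is (N+2) x (N+2) with a border)
NWORKERS = 2     # illustrative worker count; must divide N evenly here
NSWEEPS = 5      # illustrative number of sweeps

grid = [[float(i + j) for j in range(N + 2)] for i in range(N + 2)]
barrier = threading.Barrier(NWORKERS)

def worker(first_row, last_row):
    # Each worker owns a contiguous band of rows. The barrier separates
    # sweeps, so no worker starts sweep k+1 before all finish sweep k.
    for _ in range(NSWEEPS):
        for i in range(first_row, last_row):
            for j in range(1, N + 1):
                # In-place average of a point and its four neighbors.
                grid[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] +
                                    grid[i + 1][j] + grid[i][j - 1] +
                                    grid[i][j + 1])
        barrier.wait()

rows_per = N // NWORKERS
threads = [threading.Thread(target=worker,
                            args=(1 + w * rows_per, 1 + (w + 1) * rows_per))
           for w in range(NWORKERS)]
for t in threads: t.start()
for t in threads: t.join()
```

As in the book's solver, the updates are done in place, so workers may read a neighbor row mid-sweep; for this kind of iterative solver that asynchrony is typically acceptable, which is a design point worth discussing in your report.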

5.       Programming for performance

a.       Implement a tuple space in Python with semantics similar to Linda. Use your tuple space to implement a parallel version of Mandelbrot that uses dynamic assignment (pool of tasks).
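A minimal Linda-style tuple space could be sketched as follows (the class and method names are my own; `out` and `take` mirror Linda's out/in operations, and matching against a template with None as a wildcard is one common design choice):

```python
import threading

class TupleSpace:
    """A tiny Linda-like tuple space: out() deposits a tuple and
    take() removes a matching tuple, blocking until one is available."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template, tup):
        # A template matches a tuple of the same length where each
        # field is either None (wildcard) or equal to the tuple's field.
        return (len(template) == len(tup) and
                all(t is None or t == v for t, v in zip(template, tup)))

    def take(self, template):
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._match(template, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

ts = TupleSpace()
ts.out(("task", 0, 7))
ts.out(("task", 1, 9))
print(ts.take(("task", 1, None)))   # ('task', 1, 9)
```

For the Mandelbrot exercise, a master can `out` one ("task", row) tuple per row and workers can repeatedly `take(("task", None))`, which gives the dynamic assignment for free.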

6.       Go

a.       Implement the classical IPC problems in exercise 2.c. in Go.

7.       Performance evaluation

a.       Find two systems research papers and study their methodology. Do they follow the recommended steps for measurements? Do they avoid the common mistakes? How did they monitor the system? Did they use simulation? Did they present an analytical model?

8.       Parallel program performance evaluation

a.       Go through the PARSEC paper and find out how they did each of the “Steps for a performance evaluation study”.

9.       Stallo guest lecture

a.       None

10.   Data-intensive computing

a.       Read a few of the additional reading articles.

11.   GFS/HDFS and MapReduce

a.       Implement the programs in the MapReduce tutorial

b.       Implement the programs in the Pydoop tutorial
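The canonical first program in both tutorials is word count. The MapReduce dataflow behind it can be sketched sequentially in plain Python (a toy sketch of the programming model, not Hadoop/Pydoop code):

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit a (word, 1) pair for each word in an input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce phase: sum all counts emitted for one word.
    return (word, sum(counts))

def mapreduce(lines):
    # Shuffle phase: group intermediate values by key,
    # then apply the reducer to each group.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

print(mapreduce(["to be or not to be"]))
# {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

The real frameworks supply the shuffle, the distribution across machines, and fault tolerance; only `map_fn` and `reduce_fn` correspond to the code you write in the tutorials.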

12.   BigTable/HBase

a.       Implement the programs in the HBase tutorial

13.   Summary

a.       Sample exam

Mandatory assignments

All group sessions/labs are on Thursdays, 12:15-14:00, in MH U6 or the lg-lab.

Project  Date       Subject                  Lecturer
P1       Thu 22.08  Mandatory assignment 1   Ibrahim
         Thu 12.09  P1 due
P2       Thu 20.09  Mandatory assignment 2   Lars Ailo
         Thu 24.10  P2 due
P3       Thu 24.10  Mandatory assignment 3   Lars Ailo
         Tue 05.11  P3 due

 

 

Exam

The exam is on Friday November 22nd, 09.00-13.00, Åsgårdvegen 9. More details are below.

Time and place for written exams - autumn semester 2013

 

An overview of the time and place for the written examinations in the autumn of 2013 can be found here:

http://uit.no/Content/105768/oppslag%2Cuts%2Ckont.%2CH%C3%98ST-13.pdf  

 

The schedule for the written exams in computer science can be found on page 6 in the document.

 

NB! Check the time and venue for your specific examination carefully. Remember that you must identify yourself at the exam; bring valid ID.

 

Show up early, and be seated no later than 15 minutes before the exam starts.

 

The rules during examinations (written exams) can be found here: http://en.uit.no/utdanning/art?dim=179018&p_document_id=347763