In both cases, thread 0 blocks while it waits for the iterates (work packets) to be processed; the processing itself takes place on thread 1. The top figure shows the case without mappings. When mappings are used (bottom figure), the synchronous execution of individual POOMA-2 statements is tracked, and the trace shows that negligible time is spent constructing the work packets for each statement.
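The blocking pattern described above can be illustrated with a minimal C++ sketch. It is not POOMA-2 or SMARTS code: `Iterate`, `IterateQueue`, and `run_statements` are hypothetical names introduced here, and a single worker thread stands in for thread 1. The controlling thread builds the work packets for one statement, hands them off, and then blocks until the worker has run all of them.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a work packet ("iterate"); not a POOMA-2 type.
using Iterate = std::function<void()>;

// Hypothetical queue shared by the controlling thread and one worker thread.
class IterateQueue {
public:
    // Controlling thread ("thread 0"): hand one iterate to the worker.
    void submit(Iterate it) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(std::move(it));
            ++outstanding_;
        }
        work_cv_.notify_one();
    }

    // Controlling thread: block until every submitted iterate has run.
    void wait_all() {
        std::unique_lock<std::mutex> lock(m_);
        done_cv_.wait(lock, [this] { return outstanding_ == 0; });
    }

    // Controlling thread: tell the worker that no more iterates will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        work_cv_.notify_all();
    }

    // Worker thread ("thread 1"): run iterates until closed and drained.
    void worker_loop() {
        for (;;) {
            Iterate it;
            {
                std::unique_lock<std::mutex> lock(m_);
                work_cv_.wait(lock, [this] { return closed_ || !pending_.empty(); });
                if (pending_.empty()) return;  // closed and nothing left to do
                it = std::move(pending_.front());
                pending_.pop();
            }
            it();  // the actual work happens outside the lock
            {
                std::lock_guard<std::mutex> lock(m_);
                assert(outstanding_ > 0);
                --outstanding_;
            }
            done_cv_.notify_all();
        }
    }

private:
    std::mutex m_;
    std::condition_variable work_cv_;
    std::condition_variable done_cv_;
    std::queue<Iterate> pending_;
    std::size_t outstanding_ = 0;
    bool closed_ = false;
};

// Evaluate `statements` data-parallel statements, each split into
// `iterates_per_statement` work packets that add 1.0 to one element.
std::vector<double> run_statements(std::size_t statements,
                                   std::size_t iterates_per_statement) {
    std::vector<double> data(iterates_per_statement, 0.0);
    IterateQueue queue;
    std::thread worker([&queue] { queue.worker_loop(); });  // "thread 1"

    for (std::size_t s = 0; s < statements; ++s) {
        // Constructing the packets is cheap compared with running them.
        for (std::size_t i = 0; i < iterates_per_statement; ++i) {
            queue.submit([&data, i] { data[i] += 1.0; });
        }
        queue.wait_all();  // "thread 0" blocks here until the statement is done
    }

    queue.close();
    worker.join();
    return data;
}
```

Calling `run_statements(3, 4)` from `main` returns four elements, each incremented once per statement. In a trace of such a program, nearly all of the controlling thread's time would appear inside `wait_all`, which mirrors the blocking seen on thread 0 in the figures.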