
Conclusion

The pC++ programming system includes an integrated set of performance instrumentation, measurement, and analysis tools. With this support, we have been able to validate the performance scalability claims made for the language and to characterize the important performance factors of the runtime system ports while the pC++ system was still under development. As a consequence, the first version of the compiler is being introduced with an extensive set of performance experiments already documented, some of which are reported in this paper. The scalability data show that pC++ already achieves good performance. The detailed trace and profile data let us pinpoint where performance optimizations are possible, both in how the language is used for algorithm design and in how runtime system operations are implemented. For instance, the profiler has shown that a great number of barrier synchronizations are generated, which reduces overall parallelism. One important compiler optimization will be to recognize when barriers can be removed or replaced with explicit synchronization. Other optimizations may be more architecture-specific. For example, on distributed memory systems it will be important to overlap communication with computation, whereas in shared memory environments collection distribution and memory placement will be important for achieving good locality of reference. Again, performance analysis will be critical for identifying the need for such optimizations and for measuring their resulting benefit.
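The idea of overlapping communication with computation can be illustrated with a minimal sketch in standard C++. It is not pC++ code and does not reflect the pC++ runtime system. The function names (`fetch_remote_block`, `local_work`, `overlapped_sum`) are hypothetical, and the "remote fetch" is simulated by an asynchronous task rather than by a real message-passing call.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for communication: a real distributed memory port
// would issue a non-blocking receive here instead of building a vector.
std::vector<double> fetch_remote_block(std::size_t n) {
    return std::vector<double>(n, 2.0);
}

// Computation that depends only on locally owned elements.
double local_work(const std::vector<double>& local) {
    return std::accumulate(local.begin(), local.end(), 0.0);
}

double overlapped_sum(const std::vector<double>& local, std::size_t remote_n) {
    // Start the simulated transfer on another thread.
    std::future<std::vector<double>> pending =
        std::async(std::launch::async, fetch_remote_block, remote_n);

    // Do the local computation while the transfer is in flight.
    double sum = local_work(local);

    // Block only at the point where the remote data is actually needed.
    const std::vector<double> remote = pending.get();
    return sum + std::accumulate(remote.begin(), remote.end(), 0.0);
}
```

The point of the sketch is the ordering: the wait is deferred until the remote data is required, instead of synchronizing before any local work begins. Whether this pays off on a given machine is the kind of question the profile and trace data are meant to answer.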






mohr@cs.uoregon.edu
Thu Feb 24 13:42:43 PST 1994