Overhead compensation for parallel profiling requires transmitting
delay information with messages. Doing so itself introduces additional
overhead, in apparent contradiction to our goals. Our methods do not
adequately account for this added overhead, nor is it obvious how they
could or should do so. While the approach described attempts to
balance portability and efficiency concerns, its overhead in practice will
depend on how the underlying MPI implementation handles datatypes, and
that handling may differ from one network interface to another. If the
technique is deployed in production environments, it will be important to
measure the overhead it incurs under each MPI implementation in use.
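
To make the cost of carrying delay information concrete, the following is a minimal sketch in Python. It is not MPI code and not the mechanism described above. It assumes a hypothetical wire layout in which an 8-byte delay value is prepended to the payload by an explicit copy, and the names attach_delay, detach_delay, and relative_size_overhead are illustrative only.

```python
import struct
from typing import Tuple

# Assumed (hypothetical) layout: one little-endian 8-byte double holding the
# sender's accumulated delay, followed by the unmodified payload bytes.
DELAY_HEADER = struct.Struct("<d")


def attach_delay(payload: bytes, delay_seconds: float) -> bytes:
    """Return a new message with the delay prepended (this copies the payload)."""
    return DELAY_HEADER.pack(delay_seconds) + payload


def detach_delay(message: bytes) -> Tuple[float, bytes]:
    """Split a received message into (delay_seconds, payload)."""
    (delay_seconds,) = DELAY_HEADER.unpack_from(message, 0)
    return delay_seconds, message[DELAY_HEADER.size:]


def relative_size_overhead(payload_len: int) -> float:
    """Fraction by which the delay header grows a payload of the given length."""
    if payload_len <= 0:
        raise ValueError("payload_len must be positive")
    return DELAY_HEADER.size / payload_len
```

Under these assumptions the size penalty is fixed at 8 bytes per message, so it doubles an 8-byte message but adds only 0.1% to an 8000-byte one. The payload copy made in attach_delay is a separate cost. Whether a datatype-based scheme avoids such a copy depends on the MPI implementation, which is why measurement on the target system is needed.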