When storing or retrieving the piggyback value in our wrapper routines for MPI_Send and MPI_Recv, we use an automatic variable on the stack; the synchronous operations simply load from or store to this variable. The logic that processes a received piggyback value is incorporated in the MPI_Recv wrapper routine: there we compare the local and remote delays to determine how much the waiting time must be adjusted. Now let us examine the asynchronous MPI_Isend and MPI_Irecv calls. When the user issues MPI_Isend, we compute the local delay and store it in a heap-allocated variable, since the value must remain valid after the wrapper returns. The address of this heap-resident piggyback variable is used when we construct the derived (struct) datatype for sending the message.
On the receiving side, a similar arrangement is used for the piggyback value. When the message is finally received, MPI automatically copies the piggyback value into the heap location reserved for it. We also create a map that links the address of the MPI request to the address of this piggyback value. The logic that compares the local and remote delays cannot be incorporated in the MPI_Irecv wrapper, due to the very nature of the asynchronous operation: the values have not yet been received when the routine executes. Hence, we do not adjust the time spent in MPI_Irecv as we did for MPI_Recv. Instead, an asynchronous message becomes visible to the program only after it executes MPI_Wait, MPI_Test, or one of their variants (MPI_Waitall, MPI_Waitsome, MPI_Testall, MPI_Testsome) to wait for or test one or more requests. When a request is satisfied, we consult the map and retrieve the piggyback variable holding the remote process's overhead. Then the comparison of local and remote delays and the adjustment of waiting time are made on the receiving side. When a process receives more than one message, we must examine all the remote delays to determine how much time the process would have waited in the absence of instrumentation. We discuss this in more detail next with collective operations.