Add integer-based access to MPI_Wtime #77
Comments
I have many objections:
Interesting idea. I wondered what accuracy the user can get via the double timer, depending on the OS timer accuracy. For µs timers (1e-6) it will take about 544 years before losing one µs. Going for ns (1e-9) the result is more drastic: it will take 194 days before losing one ns. Thus, counting the time from the MPI job start, as proposed by @jeffhammond, will give us 194 days before losing our first nanosecond, and from there the accuracy will sharply decline.
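As a rough check of these numbers, assuming IEEE-754 doubles: the spacing between adjacent doubles near a value of t seconds is about t · 2⁻⁵², so a double-seconds timestamp stops resolving 1 ns after a few million seconds (weeks to months of elapsed time) and stops resolving 1 µs after a few billion seconds (centuries). A minimal sketch that prints this spacing at a few elapsed times (compile with -lm):

```c
#include <math.h>
#include <stdio.h>

/* Print the gap between adjacent doubles at a few elapsed times; that gap is
 * the finest increment a double-seconds timestamp can still resolve. */
int main(void)
{
    const double elapsed[] = { 1.0,                       /* 1 second into the job */
                               86400.0,                   /* 1 day                 */
                               194.0 * 86400.0,           /* ~194 days             */
                               544.0 * 365.0 * 86400.0 }; /* ~544 years            */
    for (size_t i = 0; i < sizeof(elapsed) / sizeof(elapsed[0]); i++) {
        double t = elapsed[i];
        double ulp = nextafter(t, INFINITY) - t;
        printf("t = %.3e s  ->  smallest representable step = %.3e s\n", t, ulp);
    }
    return 0;
}
```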
@jeffhammond I agree that counting the time from the MPI job start does take care of the accuracy issue. Regarding the other objections, I did not necessarily mean clock ticks of a varying-frequency processor, but rather the abstract tick of a "clock on the wall". The proposal is not trying to provide a new time source, but rather a complementary interface to the existing time source. As you mention, existing integer-based interfaces already provide this kind of access.
@bosilca Well, I suppose if MPI fault-tolerance works as designed, people might actually attempt to run jobs for 6 months 😮 @mahermanns Indeed, @jdinan corrected my misunderstanding of "tick".
@jeffhammond If you take a look at mpiwg-tools/tools-issues#11 and the corresponding branch https://github.com/mpiwg-tools/mpi-standard/tree/issue_11_mpi_t_events, the plan there is that, instead of a tool querying the time itself, the MPI implementation provides a timestamp with each event. The desire here was to use an integer-based timestamp (i.e., the same type as the current timing routines in the backend), as that might enable a lower overhead when we don't need to convert the time into a double (and potentially back into an integer, if the tool's timestamps are integer-based). To relate the event time source to the tool's other time sources, the tool would need to query reference timestamps at some point (e.g., at the beginning and end of the measurement).
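As an illustration of that last point, here is a rough sketch of how a tool might relate an integer event time source to its own clock via reference timestamps; the function names (get_event_ticks, event_ticks_per_second, tool_now_ns) are hypothetical placeholders, not part of any proposed interface:

```c
#include <stdint.h>

/* Hypothetical placeholders -- stand-ins for whatever the MPI_T events
 * interface and the tool's own clock actually provide. */
extern uint64_t get_event_ticks(void);         /* integer timestamp of the event time source */
extern uint64_t event_ticks_per_second(void);  /* constant for the duration of the run */
extern uint64_t tool_now_ns(void);             /* the tool's own integer clock, in ns */

/* A reference pair, taken e.g. at the beginning and end of the measurement. */
struct time_ref { uint64_t event_ticks; uint64_t tool_ns; };

static struct time_ref take_reference(void)
{
    struct time_ref r = { get_event_ticks(), tool_now_ns() };
    return r;
}

/* Map an event timestamp onto the tool's time line. The conversion to double
 * happens only here, after the measurement, not in the event hot path. */
static uint64_t event_to_tool_ns(uint64_t ev, struct time_ref r)
{
    double ns_per_tick = 1e9 / (double)event_ticks_per_second();
    return r.tool_ns + (uint64_t)((double)(ev - r.event_ticks) * ns_per_tick);
}
```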
To correct an earlier comment by @jeffhammond --
In current Intel processors, the timestamp counter (accessed via the RDTSC instruction) increments at a constant rate that is independent of the core clock frequency, so it can serve as a wall-clock tick source.
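For reference, a minimal sketch of using the TSC as a tick source on Linux/x86 with gcc or clang; the counter's rate is not directly queryable here, so this sketch calibrates it against CLOCK_MONOTONIC (an assumption of this example, not necessarily what an MPI implementation does):

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>   /* __rdtsc() on gcc/clang for x86 */

/* Estimate TSC ticks per second by comparing against CLOCK_MONOTONIC over ~100 ms.
 * On CPUs with an invariant TSC this rate stays constant under frequency scaling. */
static double tsc_hz(void)
{
    struct timespec t0, t1, req = { 0, 100 * 1000 * 1000 };
    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);
    uint64_t c1 = __rdtsc();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return (double)(c1 - c0) / secs;
}

int main(void)
{
    printf("estimated TSC rate: %.0f Hz\n", tsc_hz());
    return 0;
}
```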
Even though I work for Intel, I do not support making decisions about the MPI standard based upon the fact that Intel got this right starting in 2008. And I have now seen exactly how much of a pain it is to implement.
@mahermanns Is "The number of ticks per second must be constant over the execution of the program" intended to specify that the number of ticks elapsed is monotonic increasing? If so, I'm having trouble convincing myself that this is sufficient. When you make adjustments to the clock to maintain A second question -- why tie this new routine to the resolution of |
@jdinan The phrase is there to ensure that the call always reports the same ticks per second during the run, i.e., a tool can query it at the beginning of a run and does not have to query it again. Explicit mention of monotonic time is not part of this proposal, as I did not want to overload it (separation of concerns) ... of course, bad things happen to a number of tools when the time is not monotonically increasing, and as a tools developer I would like a way to ensure this. I think we talked about this in Aachen, and the idea was that synchronization is only allowed to reset the clocks to a future time.
Could you clarify the problem solved by the proposed API? As @bosilca mentioned earlier, a good implementation of MPI_Wtime already addresses the accuracy concern.
The API enables tools to obtain low-overhead timestamps without needing the conversion to double and back again. If tools want to use the MPI timer for event recording (as needed by MPI_T events), calls to the MPI-internal timing will be much more frequent than they are today. As current tools often use an integer-based timestamp internally, using the current interface to an "MPI time" would imply (1) MPI obtaining an integer-based time from an underlying interface, (2) converting it to a double to return it from MPI_Wtime, and (3) the tool converting it straight back into an integer timestamp.
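To make that round trip concrete, a small sketch (assuming an MPI_Wtime built on clock_gettime, which is implementation-specific) of steps (1)-(3); the proposed integer interface would let the tool skip both floating-point conversions:

```c
#include <stdint.h>
#include <time.h>

/* (1) Inside the MPI library: an integer time source is read ... */
static double wtime_like(void)   /* stand-in for an MPI_Wtime built on clock_gettime */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* (2) ... and converted to double seconds to satisfy the MPI_Wtime interface. */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* (3) Inside the tool: the double is immediately converted back to integer nanoseconds. */
static uint64_t tool_timestamp_ns(void)
{
    return (uint64_t)(wtime_like() * 1e9);
}
```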
Have you compared the overhead of these two routines? It's not obvious to me that the proposed routine will substantially reduce the overhead relative to MPI_Wtime.
@jdinan The SpiNNaker architecture does not have floating-point capability in hardware; it simulates floating-point operations in software. MPI_Wtime is therefore very expensive, and converting back from double to int64 is also very expensive. These new functions would be orders of magnitude cheaper to implement. There are efforts to implement MPI on this architecture.
@dholmes-epcc-ed-ac-uk If the argument for the new routine is strictly lower overhead, I suggest that someone measure the difference. gcc [1] supports software floating-point emulation, which could allow you to measure that scenario as well. [1] https://stackoverflow.com/questions/13201495/soft-float-on-x86-64
I wrote a small program to measure the difference between integer and floating-point return values for several common Linux timing methods [1]. I measured a difference of 15-19 cycles, which amounted to a roughly 15-19% increase in overhead to return a floating-point versus an integer result.
This was measured on an Intel(R) Xeon(R) CPU X5570 @ 2.93GHz, CentOS Linux release 7.3.1611, Intel MPI 2017.4.196, and compiled with gcc 4.8.5. [1] https://gist.github.com/jdinan/227d1777798155b99d0fa995b750247b
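For readers without access to the gist, a minimal sketch of this kind of micro-benchmark (not the actual program, and it does not subtract loop overhead):

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 10000000

int main(void)
{
    struct timespec ts, t0, t1;
    volatile uint64_t ns_sink = 0;   /* volatile sinks keep the loops from being optimized away */
    volatile double sec_sink = 0.0;

    /* Integer result: read the clock and keep nanoseconds. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        ns_sink = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double int_ns = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / ITERS;

    /* Floating-point result: same read, converted to double seconds. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        clock_gettime(CLOCK_MONOTONIC, &ts);
        sec_sink = (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double dbl_ns = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / ITERS;

    printf("integer result: %.1f ns/call, double result: %.1f ns/call\n", int_ns, dbl_ns);
    (void)ns_sink; (void)sec_sink;
    return 0;
}
```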
I updated the gist so that all timers use nsec and we convert from double seconds to integer nsec during the timed portion (the use case used to motivate this ticket -- I missed this conversion in previous measurements).
This adds another ~5ns. There are a total of 6 FP operations (convert/scale sub-sec, convert sec, add sub-sec, scale/convert sec to nsec after wtime returns), which account for ~30ns total, or roughly 5ns per operation. If you assume soft FP is 25x slower, that gives an estimated 750ns + 80ns = 830ns to query the time with soft FP (roughly one order of magnitude slower).
As discussed on mpi-forum/mpi-issues#77 (comment), the conversion to double in MPI_Wtime decreases the range and accuracy of the resulting timer. By setting the timer to 0 at the first usage, we basically maintain the accuracy for 194 days even for gettimeofday. Signed-off-by: George Bosilca <[email protected]>
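A minimal sketch of the technique described in the commit message, assuming a gettimeofday-based timer; the names are illustrative and this is not Open MPI's actual code:

```c
#include <stdbool.h>
#include <sys/time.h>

/* Record the wall-clock time of the first call and report seconds relative to it,
 * so the double stays small and keeps sub-microsecond resolution for a long run.
 * (Not thread-safe; a real implementation would capture the offset at MPI_Init.) */
static double wtime_rebased(void)
{
    static struct timeval start;
    static bool initialized = false;
    struct timeval now;

    gettimeofday(&now, NULL);
    if (!initialized) {
        start = now;
        initialized = true;
    }
    return (double)(now.tv_sec - start.tv_sec) +
           (double)(now.tv_usec - start.tv_usec) * 1e-6;
}
```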
Thanks everyone for the discussion on this. Based on the feedback we got for the MPI_T events proposal, we have now moved in the direction of having separate timing routines for MPI_T that are independent of MPI_Wtime, and integrated those into #79 directly. I am therefore closing this ticket.
Problem
MPI provides standardized access to a time source through MPI_Wtime(); however, the returned timestamp is a floating-point number of seconds since some time in the past. If that reference point lies far in the past, the floating-point value loses resolution. Furthermore, most common time sources are integer-based, so the time information has to be converted to a floating-point value with additional effort.
Proposal
Provide two additional calls that return integer values: the number of ticks since some time in the past, and the number of ticks per second. The time source should be the same as that of MPI_Wtime.
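A sketch of what such a pair of calls could look like; the MPIX_ names and exact signatures below are illustrative only, the proposal does not fix them:

```c
#include <stdint.h>

/* Hypothetical C bindings for the two proposed calls. */

/* Ticks elapsed since some fixed time in the past, same time source as MPI_Wtime. */
int MPIX_Wtime_ticks(uint64_t *ticks);

/* Ticks per second of that time source; constant over the execution of the program. */
int MPIX_Wtick_frequency(uint64_t *ticks_per_second);
```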
Changes to the Text
See the corresponding pull request.
Impact on Implementations
Implementations need to support the two additional function calls.
Impact on Users
Users can access integer-based timing information with potentially lower overhead, while still benefiting from the convenient floating-point interface in less time/overhead-critical parts of the code (e.g., when printing or writing results).
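For example, reusing the hypothetical MPIX_ names sketched under Proposal, a tool or application could keep integer timestamps in the overhead-critical path and convert to seconds only when printing:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bindings, as sketched under "Proposal". */
int MPIX_Wtime_ticks(uint64_t *ticks);
int MPIX_Wtick_frequency(uint64_t *ticks_per_second);

void timed_region(void)
{
    uint64_t t0, t1, hz;
    MPIX_Wtime_ticks(&t0);
    /* ... overhead-critical work, timestamped with integers only ... */
    MPIX_Wtime_ticks(&t1);

    /* Conversion to floating point happens only here, off the critical path. */
    MPIX_Wtick_frequency(&hz);
    printf("region took %.9f s\n", (double)(t1 - t0) / (double)hz);
}
```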
References
Tools Ticket: mpiwg-tools/tools-issues#8