-
Notifications
You must be signed in to change notification settings - Fork 690
[Core] Report CUDA versions when NVRTC compilation fails #2842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
fc9ac02
1984669
9e73d48
4af8ac5
2f0dae3
ed4eaed
2b90938
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,7 @@ | |
| #include <cublasLt.h> | ||
|
|
||
| #include <filesystem> | ||
| #include <fstream> | ||
| #include <mutex> | ||
|
|
||
| #include "../common.h" | ||
|
|
@@ -202,6 +203,49 @@ const std::string &include_directory(bool required) { | |
| return path; | ||
| } | ||
|
|
||
| int include_directory_version(bool required) { | ||
| // Header path | ||
| const auto &include_dir = cuda::include_directory(false); | ||
| if (include_dir.empty()) { | ||
| if (required) { | ||
| NVTE_ERROR( | ||
| "Could not detect version of CUDA Toolkit headers " | ||
| "(CUDA Toolkit headers not found)."); | ||
| } | ||
| return -1; | ||
| } | ||
|
|
||
| // Parse CUDART_VERSION from cuda_runtime_api.h. | ||
|
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really like how we parse the header as a text file. However, when I tried compiling a test program with NVRTC it would override the header's |
||
| const auto header_path = std::filesystem::path(include_dir) / "cuda_runtime_api.h"; | ||
| std::ifstream header_file(header_path); | ||
| if (header_file.is_open()) { | ||
| const std::string define_prefix = "#define CUDART_VERSION "; | ||
| std::string line; | ||
| while (std::getline(header_file, line)) { | ||
| const auto pos = line.find(define_prefix); | ||
| if (pos == std::string::npos) { | ||
| continue; | ||
| } | ||
| try { | ||
| const int version = std::stoi(line.substr(pos + define_prefix.size())); | ||
| if (version > 0) { | ||
| return version; | ||
| } | ||
| } catch (...) { | ||
| continue; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (required) { | ||
| NVTE_ERROR( | ||
| "Could not detect version of CUDA Toolkit headers " | ||
| "(Could not parse CUDART_VERSION from ", | ||
| header_path.string(), ")."); | ||
| } | ||
| return -1; | ||
| } | ||
|
Comment on lines
+206
to
+247
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
int include_directory_version(bool required) {
static int cached = [&]() -> int {
// ... existing parsing logic ...
}();
// handle `required` separately for the error message
...
}Alternatively, add a |
||
|
|
||
| int cudart_version() { | ||
| auto get_version = []() -> int { | ||
| int version; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the explicit list of
transformer_engine::cuda::sm_count*,sm_arch*, etc. to the blankettransformer_engine::cuda::*will export every symbol in thecudanamespace — including any future internal helpers — as part of the library's public ABI. This may be intentional to avoid updating the linker script whenever a new function is added, but it is worth confirming the trade-off, since once a symbol is exported it must be maintained for ABI compatibility.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we shouldn't do that. Also, why do we need to modify this file in this PR at all? The newly added function is not part of the public API anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right — the new
include_directory_version()function is only used internally (inrtc.cppfor error reporting and in the test suite). Since the tests link against the same library, the symbol doesn't need to be exported in the version script.The wildcard change appears to be unnecessary. You can revert the modifications to
libtransformer_engine.versionentirely — the explicit symbol list should work as-is without any changes for this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to expose the utility functions if we want to test them individually in the C++ unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, this actually shows a separate issue (that we should probably tackle in a separate PR and merge this one without the changes to the build and without the unit test) that the C++ tests as they are currently do not have the visibility into the internal functions. On one hand that is good, since it makes it possible to test that the API surface is correct and that we can do everything we want by just calling the right exposed functions, but on the other it makes it impossible to have targeted tests for those internal implementations without exposing them to the world. I think we should rethink this relationship to have either one or the other as the target of the C++ unit tests (and I would argue that the framework level integration already tackles the API exposure testing so we could make the C++ tests more coupled with the internals of the library itself).