Skip to content

[Core] Report CUDA versions when NVRTC compilation fails#2842

Open
timmoon10 wants to merge 7 commits intoNVIDIA:mainfrom
timmoon10:tmoon/nvrtc-version-check
Open

[Core] Report CUDA versions when NVRTC compilation fails#2842
timmoon10 wants to merge 7 commits intoNVIDIA:mainfrom
timmoon10:tmoon/nvrtc-version-check

Conversation

@timmoon10
Copy link
Copy Markdown
Collaborator

Description

NVRTC compilation involves three CUDA versions:

  • Compile-time CUDA: used to compile Transformer Engine
  • CUDA Runtime: linked during runtime and visible to libnvrtc.so
  • Run-time CUDA headers: included in run-time NVRTC compilations

If the user's system is misconfigured, these CUDA versions may be inconsistent and cause strange errors (e.g. #1018). This PR reports each of the CUDA versions to help with debugging.

Closes #1018.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Report CUDA versions whe NVRTC compilation fails

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

timmoon10 and others added 4 commits April 4, 2026 01:54
When NVRTC kernel compilation fails, detect whether the linked NVRTC
library and the CUDA headers used for compilation are from different
CUDA versions, and if so emit an actionable note to stderr pointing
the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH.

The header version is obtained by compiling a tiny probe program that
embeds CUDA_VERSION (from cuda.h) into a static_assert failure message,
so the macro is resolved by the actual preprocessor rather than by
parsing header text.  All probe failures are silent; the check is
purely informational and never causes a premature error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Still buggy, include_directory_version returns CUDA runtime version instead of header version.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION
to its own version before processing any includes, so the probe always
returned the NVRTC version regardless of the headers on the include path.

Fix by reading cuda_runtime_api.h as text and parsing the
"#define CUDART_VERSION <integer>" line directly. This is immune to
NVRTC's internal macro management, and the format has been stable across
all CUDA versions.

Also decode raw CUDA version integers to "major.minor" strings in the
error message for readability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Test that the CUDA include directory is found and that its version
matches the compile-time CUDART_VERSION.

Also export transformer_engine::cuda::* symbols and tighten the rtc
export pattern in the version script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 requested a review from Oleg-Goncharov April 7, 2026 00:48
@timmoon10 timmoon10 added the enhancement New feature or request label Apr 7, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 7, 2026

Greptile Summary

This PR adds CUDA version diagnostics to NVRTC compilation failures, reporting compile-time CUDA (CUDA_VERSION), NVRTC runtime (nvrtcVersion()), and runtime header versions (include_directory_version()) to help debug misconfigured environments (e.g. issue #1018). A new include_directory_version() function parses CUDART_VERSION from cuda_runtime_api.h at runtime, and the ABI linker script is updated to export the new symbol for unit testing.

Confidence Score: 5/5

This PR is safe to merge — all prior review concerns have been addressed or accepted, and no new P0/P1 issues were found.

All substantive issues (mismatch warning with empty path when headers not found, lack of caching in include_directory_version, and wildcard ABI export) were identified and resolved in prior review threads. The remaining observations are purely P2 quality suggestions. The implementation is logically correct: CUDA version encoding (major1000+minor10) is consistent across all three version sources, the version_string lambda correctly handles negative values with '', and the new unit test provides good coverage.

No files require special attention; the libtransformer_engine.version wildcard change was explicitly approved by a senior developer.

Important Files Changed

Filename Overview
transformer_engine/common/util/rtc.cpp Logs compile-time CUDA, NVRTC runtime, and header versions on NVRTC compilation failure; emits mismatch warning when NVRTC version ≠ header version
transformer_engine/common/util/cuda_runtime.cpp Adds include_directory_version() that parses CUDART_VERSION from cuda_runtime_api.h; not cached unlike sibling functions cudart_version()/cublas_version()
transformer_engine/common/util/cuda_runtime.h Declares include_directory_version() with doc-comment matching sibling function style; clean addition
transformer_engine/common/libtransformer_engine.version Widens cuda namespace export to transformer_engine::cuda::* wildcard; fixes rtc pattern to transformer_engine::rtc::*; intentional to expose utility functions for unit testing
tests/cpp/util/test_nvrtc.cpp Adds CUDAHeaders test verifying include_directory() is non-empty and include_directory_version() == CUDART_VERSION

Sequence Diagram

sequenceDiagram
    participant C as compile()
    participant NVRTC as nvrtcCompileProgram
    participant IDV as include_directory_version()
    participant FS as cuda_runtime_api.h

    C->>NVRTC: nvrtcCompileProgram(...)
    NVRTC-->>C: NVRTC_ERROR_COMPILATION
    C->>C: read CUDA_VERSION (compile-time macro)
    C->>NVRTC: nvrtcVersion(&major, &minor)
    NVRTC-->>C: major, minor → nvrtc_version
    C->>IDV: cuda::include_directory_version()
    IDV->>IDV: cuda::include_directory(false)
    IDV->>FS: open & parse #define CUDART_VERSION
    FS-->>IDV: version int
    IDV-->>C: header_version
    C->>C: log all 3 versions via version_string()
    C->>C: if nvrtc_version != header_version → emit warning
    C->>NVRTC: nvrtcGetProgramLogSize / nvrtcGetProgramLog
    NVRTC-->>C: compilation log
    C->>C: std::cerr << full log
    C->>C: NVTE_CHECK_NVRTC(compile_result) → throw
Loading

Greploops — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.
Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

Reviews (2): Last reviewed commit: "Tweak version message" | Re-trigger Greptile

Comment on lines +206 to +247
int include_directory_version(bool required) {
// Header path
const auto &include_dir = cuda::include_directory(false);
if (include_dir.empty()) {
if (required) {
NVTE_ERROR(
"Could not detect version of CUDA Toolkit headers "
"(CUDA Toolkit headers not found).");
}
return -1;
}

// Parse CUDART_VERSION from cuda_runtime_api.h.
const auto header_path = std::filesystem::path(include_dir) / "cuda_runtime_api.h";
std::ifstream header_file(header_path);
if (header_file.is_open()) {
const std::string define_prefix = "#define CUDART_VERSION ";
std::string line;
while (std::getline(header_file, line)) {
const auto pos = line.find(define_prefix);
if (pos == std::string::npos) {
continue;
}
try {
const int version = std::stoi(line.substr(pos + define_prefix.size()));
if (version > 0) {
return version;
}
} catch (...) {
continue;
}
}
}

if (required) {
NVTE_ERROR(
"Could not detect version of CUDA Toolkit headers "
"(Could not parse CUDART_VERSION from ",
header_path.string(), ").");
}
return -1;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Result not cached, unlike sibling functions

include_directory_version() re-opens and re-parses cuda_runtime_api.h on every call. The sibling functions cudart_version() and cublas_version() in the same file both cache their result with a static local variable. While the function is currently only exercised on the failure path in rtc.cpp, caching the result would be consistent with the established pattern and avoids repeated I/O if the function is ever called more broadly (e.g. from tests):

int include_directory_version(bool required) {
  static int cached = [&]() -> int {
    // ... existing parsing logic ...
  }();
  // handle `required` separately for the error message
  ...
}

Alternatively, add a static int cached_version = -1; if (cached_version >= 0) return cached_version; guard at the top.

transformer_engine::cuda::supports_multicast*;
transformer_engine::cuda::stream_priority_range*;
transformer_engine::cuda::current_device*;
transformer_engine::cuda::*;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Wildcard export broadens ABI surface

Changing the explicit list of transformer_engine::cuda::sm_count*, sm_arch*, etc. to the blanket transformer_engine::cuda::* will export every symbol in the cuda namespace — including any future internal helpers — as part of the library's public ABI. This may be intentional to avoid updating the linker script whenever a new function is added, but it is worth confirming the trade-off, since once a symbol is exported it must be maintained for ABI compatibility.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we shouldn't do that. Also, why do we need to modify this file in this PR at all? The newly added function is not part of the public API anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — the new include_directory_version() function is only used internally (in rtc.cpp for error reporting and in the test suite). Since the tests link against the same library, the symbol doesn't need to be exported in the version script.

The wildcard change appears to be unnecessary. You can revert the modifications to libtransformer_engine.version entirely — the explicit symbol list should work as-is without any changes for this PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to expose the utility functions if we want to test them individually in the C++ unit tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this actually shows a separate issue (that we should probably tackle in a separate PR and merge this one without the changes to the build and without the unit test) that the C++ tests as they are currently do not have the visibility into the internal functions. On one hand that is good, since it makes it possible to test that the API surface is correct and that we can do everything we want by just calling the right exposed functions, but on the other it makes it impossible to have targeted tests for those internal implementations without exposing them to the world. I think we should rethink this relationship to have either one or the other as the target of the C++ unit tests (and I would argue that the framework level integration already tackles the API exposure testing so we could make the C++ tests more coupled with the internals of the library itself).

return -1;
}

// Parse CUDART_VERSION from cuda_runtime_api.h.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like how we parse the header as a text file. However, when I tried compiling a test program with NVRTC it would override the header's CUDART_VERSION macro with the CUDA Runtime version.

@timmoon10
Copy link
Copy Markdown
Collaborator Author

/te-ci

Suggestion from @ptrendx

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error in NVRTC compilation

2 participants