Skip to content

fix: avoid copy constructor instantiation in shared_ptr fallback cast#6028

Merged
rwgk merged 6 commits intopybind:masterfrom
virtuald:fix-pytorch-shared-ptr-reference-internal
Apr 6, 2026
Merged

fix: avoid copy constructor instantiation in shared_ptr fallback cast#6028
rwgk merged 6 commits intopybind:masterfrom
virtuald:fix-pytorch-shared-ptr-reference-internal

Conversation

@virtuald
Copy link
Copy Markdown
Contributor

@virtuald virtuald commented Apr 5, 2026

Description

Original bug at https://github.com/pytorch/pytorch/actions/runs/24007816974/job/70014373231.

I haven't manually validated this fix yet, it might be all slop. I don't think the tests are really in the right shape to test/validate the problem... I'll take a closer look tonight, but for now Here's GPT-5.4's explanation:


Problem and fix

Summary

A regression in pybind11's std::shared_ptr<T> caster caused the reference_internal fallback path to instantiate copy/move constructor helpers even when no copy or move was actually needed.

This broke downstream projects such as PyTorch when building with Clang warning settings like:

  • -Werror=deprecated-copy-with-user-provided-dtor
  • -Werror=deprecated-copy-with-dtor

What regressed

In include/pybind11/cast.h, the fallback path for returning std::shared_ptr<T> to Python when T is bound with neither std::shared_ptr holder nor py::smart_holder was changed to:

if (parent) {
    return type_caster_base<type>::cast(
        srcs, return_value_policy::reference_internal, parent);
}

That looks reasonable, but type_caster_base<type>::cast(...) forwards through code that eagerly instantiates:

  • make_copy_constructor(...)
  • make_move_constructor(...)

via type_caster_base::cast(const cast_sources &, ...).

Even though the reference_internal path does not use those constructors at runtime, their instantiation is enough for Clang to diagnose implicit copy construction for types that have a user-declared destructor.

Downstream impact

PyTorch hit this in CI after updating pybind11. The failing patterns came directly from code such as:

  • torch/csrc/jit/python/python_tracer.cpp
    • m.def("_get_tracing_state", []() { return getTracingState(); });
  • torch/csrc/jit/python/script_init.cpp
    • methods returning std::shared_ptr<InterfaceType> from CompilationUnit

The relevant types include user-declared destructors, e.g.:

  • TracingState::~TracingState();
  • InterfaceType::~InterfaceType() override = default;

Clang then reports errors originating from pybind11/detail/type_caster_base.h:1682, where the copy-constructor helper lambda is instantiated.

Root cause

The bug is not in lifetime handling itself. The lifetime model is correct: for this path, pybind11 should create a Python object referring to the existing C++ object and tie its lifetime to parent using reference_internal.

The regression is that the implementation routed through a helper that always prepares copy/move constructor callbacks, even though this fallback path is strictly non-owning and does not need them.

Fix

Use type_caster_generic::cast(...) directly for the parent/reference_internal fallback and pass null copy/move constructor callbacks:

if (parent) {
    return type_caster_generic::cast(
        srcs, return_value_policy::reference_internal, parent, nullptr, nullptr);
}

This preserves the intended behavior:

  • no ownership transfer
  • reference_internal keep-alive behavior remains intact
  • no unnecessary copy/move constructor instantiation

Tests added

Two kinds of tests were added:

1. Existing smart-holder property aliasing coverage

In tests/test_class_sh_property.py, the original aliasing test was kept and a stronger multi-read version was added:

  • verifies repeated property reads alias the same underlying member
  • ensures updates through one Python handle are visible through another

2. Direct downstream-style PyTorch regression reproducer

New files:

  • tests/pybind11_pytorch_regressions.cpp
  • tests/test_pytorch_downstream_regressions.py

These extract the failing PyTorch patterns into a minimal pybind11 test module:

  • a TracingState-like type returned from a free function as std::shared_ptr<T>
  • an InterfaceType-like object returned from a CompilationUnit method as std::shared_ptr<T>

For the dedicated test target, Clang is given:

  • -Werror=deprecated-copy-with-user-provided-dtor
  • -Werror=deprecated-copy-with-dtor

This ensures the regression is caught at compile time.

Suggested changelog entry:

  • Fix std::shared_ptr fallback casting to avoid unnecessary copy-constructor instantiation in reference_internal paths

@virtuald virtuald force-pushed the fix-pytorch-shared-ptr-reference-internal branch from f7b3345 to ac840cf Compare April 5, 2026 20:40
Copy link
Copy Markdown
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reproducer! I can confirm that it reproduces the compilation error on my workstation (Ubuntu clang version 18.1.3 (1ubuntu1)).

Looks good to me. My suggestions are just to make things more intuitive and easier to discover.

virtuald and others added 3 commits April 6, 2026 00:30
Name the non-owning generic cast path so callers do not have to rediscover that
reference-like policies must pass null copy/move constructor callbacks. This
keeps the shared_ptr reference_internal fallback self-documenting and points
future maintainers toward the safe API.

Made-with: Cursor
@virtuald virtuald marked this pull request as ready for review April 6, 2026 04:40
@virtuald virtuald requested a review from henryiii as a code owner April 6, 2026 04:40
@virtuald
Copy link
Copy Markdown
Contributor Author

virtuald commented Apr 6, 2026

If you're happy then I'm happy too :shipit:

Use __has_warning for the Clang-only regression test so older compiler jobs skip
unsupported warning groups instead of failing with -Wunknown-warning-option. A
simple __clang_major__ >= 13 guard would be shorter, but it bakes in a version
cutoff; __has_warning is slightly more verbose while being more robust to
vendor builds, backports, and future packaging differences.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 6, 2026

I'm happy, but older clang versions need some more convincing: commit 87bf519

For easy reference: clang 5 and 11 failures observed here: https://github.com/pybind/pybind11/actions/runs/24018953764/job/70043719610?pr=6028

@rwgk rwgk merged commit 98e50b7 into pybind:master Apr 6, 2026
89 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs changelog Possibly needs a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants