Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Oct 16, 2025

In the PR we untrack frozen tuples for the normal constructors. There are a few methods shared between the set and frozenset (for example set_intersection in setobject.c) where we have not added the untracking. (this is possible, but I am not sure this is worthwhile to do).

Here is a small script to test the idea:

import gc
import time
from statistics import mean

number_of_iterations = 20
number_of_gc_iterations = 50

deltas = []

gc.disable()
gc.collect()
for kk in range(number_of_iterations):
    t0 = time.perf_counter()
    for jj in range(number_of_gc_iterations):
        gc.collect()
    dt = time.perf_counter() - t0
    deltas.append(dt)
print(f"time per collection: mean {1e3 * mean(deltas) / number_of_iterations:.3f} [ms], min {1e3 * min(deltas) / number_of_iterations:.3f} [ms]")

sets = [frozenset([ii]) for ii in range(10_000)]
deltas = []
print("---")
gc.disable()
gc.collect()
for kk in range(number_of_iterations):
    t0 = time.perf_counter()
    for jj in range(number_of_gc_iterations):
        gc.collect()
    dt = time.perf_counter() - t0
    deltas.append(dt)
print(f"time per collection: mean {1e3 * mean(deltas) / number_of_iterations:.3f} [ms], min {1e3 * min(deltas) / number_of_iterations:.3f} [ms]")

#%% Show statistics of frozen containers

gc.collect()

def candidate(obj):
    return all(not gc.is_tracked(x) for x in obj)

for immutable_type in (tuple, frozenset):
    number_of_objects_tracked = 0
    number_of_candidates = 0
    number_of_immutable_candidates = 0

    for obj in gc.get_objects():
        number_of_objects_tracked += 1
        if type(obj) is immutable_type:
            number_of_candidates += 1
            # print(f"{type(obj)} = {obj}")
            if candidate(obj):
                number_of_immutable_candidates += 1

    print(f"type {immutable_type}")
    print(f"  {number_of_objects_tracked=}")
    print(f"  {number_of_candidates=}")
    print(f"  {number_of_immutable_candidates=}")

It measures the performance of garbage collection, and outputs some statistics for the numbers of frozen containers.

Main:

time per collection: mean 1.311 [ms], min 1.301 [ms]
---
time per collection: mean 2.467 [ms], min 2.272 [ms]
type <class 'tuple'>
  number_of_objects_tracked=18330
  number_of_candidates=546
  number_of_immutable_candidates=1
type <class 'frozenset'>
  number_of_objects_tracked=18330
  number_of_candidates=10059
  number_of_immutable_candidates=10057

PR

time per collection: mean 1.285 [ms], min 1.251 [ms]
---
time per collection: mean 1.424 [ms], min 1.396 [ms]
type <class 'tuple'>
  number_of_objects_tracked=8273
  number_of_candidates=546
  number_of_immutable_candidates=6
type <class 'frozenset'>
  number_of_objects_tracked=8273
  number_of_candidates=2
  number_of_immutable_candidates=0

Note: generative ai was used in creating the PR

Co-authored-by: Mikhail Efimov <efimov.mikhail@gmail.com>
@sergey-miryanov
Copy link
Contributor

sergey-miryanov commented Oct 17, 2025

Maybe it is worth to change tp_alloc for something like:

PyObject *
PyFrozenSet_Alloc(PyTypeObject *type, Py_ssize_t nitems)
{
    PyObject *obj = PyType_GenericAlloc(type, nitems);
    if (obj == NULL) {
        return NULL;
    }

    _PyFrozenSet_MaybeUntrack(obj);
    return obj;
}

@eendebakpt
Copy link
Contributor Author

Maybe it is worth to change tp_alloc for something like:

The tp_alloc is used in make_new_set, which in turn is called by make_new_set. The last one is used set_intersection which modifies a frozenset. So adding _PyFrozenSet_MaybeUntrack to tp_alloc would mean we have to add a _PyFrozenSet_MaybeTrack to the end of set_intersection. This is a complication I do not want to tackle (certainly not in this PR).

Copy link
Contributor

@sergey-miryanov sergey-miryanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me.

@vstinner
Copy link
Member

Would it be possible to write tests in Python rather than in C?

@eendebakpt
Copy link
Contributor Author

Would it be possible to write tests in Python rather than in C?

I tried, but it is not easy. We have to expose PySet_Add (frozenset().add does not exist on the python side). I added pyset_add on the _testcapi module (with pyset_add just calling PySet_Add). But running this on a frozenset from the python side does not work: when calling _testcapi.pyset_add(frozen_set, item) there too many references to the frozen_set and PySet_Add will fail with an internal error here:

PyErr_BadInternalCall();

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

@sergey-miryanov
Copy link
Contributor

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

IIUC, if you return the first argument from pyset_add then you can test it on the python side.

@eendebakpt
Copy link
Contributor Author

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

IIUC, if you return the first argument from pyset_add then you can test it on the python side.

Ok, I gave it another try. The first attempt failed, but by using the vectorcall convention I can keep the reference count at 1 also from the Python side.

Co-authored-by: Victor Stinner <vstinner@python.org>
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. With one last comment :-)

@eendebakpt
Copy link
Contributor Author

LGTM. With one last comment :-)

Nice. Give me one second to double check something.

Copy link
Member

@efimov-mikhail efimov-mikhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eendebakpt
Copy link
Contributor Author

LGTM. With one last comment :-)

Nice. Give me one second to double check something.

The _PyFrozenSet_MaybeUntrack is only used in two places and in one of then the argument is guaranteed to be an exact frozenset. I checked whether we can simplify code by changing the if (!PyFrozenSet_CheckExact(op)) ... into an assert. It can be done, but then we have to add other check in make_new_frozenset so overall it is not clear improvement.

PR is ready from my side!

Copy link
Contributor

@sergey-miryanov sergey-miryanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nitpicks. Otherwise LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants