Async Responses Structured Outputs Memory leak #3084

@savvasp-123


Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

When the AsyncResponses client is used to parse structured outputs, pydantic models are left behind, which causes memory to accumulate. Flame graph of the memory leak:

[Image: flame graph of the memory leak]

Memory leak in parse_response: ParsedResponse[TextFormatT] causes unbounded pydantic schema rebuilds

Environment

  • openai==2.31.0
  • pydantic>=2
  • Python 3.13

Description

openai/lib/_parsing/_responses.py calls construct_type_unchecked with TypeVar-subscripted generics:

construct_type_unchecked(type_=ParsedResponseOutputText[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponse[TextFormatT], ...)

TextFormatT is a free module-level TypeVar. Pydantic cannot resolve a free TypeVar, so model_rebuild(raise_errors=False) returns False on every call. This means MockCoreSchema._built_memo is never populated, and a new SchemaValidator/SchemaSerializer (heavy Rust objects) is allocated on every single responses.parse() call. These accumulate without bound.

Root Cause

Pydantic's MockCoreSchema._get_built() only caches its result (_built_memo) when model_rebuild succeeds (returns True). For ParsedResponse[TextFormatT] where TextFormatT is unresolved, model_rebuild always returns False, so the cache is never set and the rebuild runs on every access.
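The cache-only-on-success pattern described above can be illustrated with a small toy model. This is a sketch, not pydantic's implementation: the class `LazySchema`, its `get_built` method and the `build_attempts` counter are hypothetical names invented for the illustration, and only the attribute name `_built_memo` is borrowed from the description.

```python
from typing import Callable, Optional


class LazySchema:
    """Toy stand-in for a lazily built schema (hypothetical, not pydantic code)."""

    def __init__(self, attempt_build: Callable[[], Optional[object]]) -> None:
        # attempt_build returns the built object, or None when the build fails
        self._attempt_build = attempt_build
        self._built_memo: Optional[object] = None
        self.build_attempts = 0  # counts how often the expensive build ran

    def get_built(self) -> Optional[object]:
        if self._built_memo is not None:
            return self._built_memo  # cached result, no rebuild
        self.build_attempts += 1
        built = self._attempt_build()
        if built is not None:
            self._built_memo = built  # memoised only when the build succeeded
        return built


def count_attempts(schema: LazySchema, accesses: int) -> int:
    """Access the schema repeatedly and report how many builds were attempted."""
    for _ in range(accesses):
        schema.get_built()
    return schema.build_attempts


if __name__ == "__main__":
    resolvable = LazySchema(lambda: object())  # build succeeds
    unresolved = LazySchema(lambda: None)      # build always fails
    print(count_attempts(resolvable, 1000), count_attempts(unresolved, 1000))
```

In this model a schema whose build succeeds is built once, while one whose build always fails repeats the work on every access, which matches the behaviour reported for the TypeVar-subscripted types.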

Fix

Use the non-parameterized base classes instead:

construct_type_unchecked(type_=ParsedResponseOutputText, ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage, ...)
construct_type_unchecked(type_=ParsedResponse, ...)

This is functionally identical at runtime: ParsedResponse and related classes guard their parameterised fields behind if TYPE_CHECKING:, so the type argument has no runtime effect:

class ParsedResponse(Response, GenericModel, Generic[ContentType]):
    if TYPE_CHECKING:
        output: List[ParsedResponseOutputItem[ContentType]]  # never executed at runtime
    else:
        output: List[ParsedResponseOutputItem]
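The effect of the guard can be checked with plain `typing` classes. The sketch below uses hypothetical stand-ins (`Item`, `Container`, `ContentT`) rather than the library's models, and it only shows which annotation exists at runtime. It does not model how pydantic parametrises generic models.

```python
from typing import TYPE_CHECKING, Generic, List, TypeVar

ContentT = TypeVar("ContentT")  # hypothetical stand-in for the library's TypeVar


class Item(Generic[ContentT]):
    """Hypothetical stand-in for an output item class."""


class Container(Generic[ContentT]):
    """Hypothetical stand-in for a class with a TYPE_CHECKING-guarded field."""

    if TYPE_CHECKING:
        # Seen only by static type checkers; TYPE_CHECKING is False at runtime
        items: List[Item[ContentT]]
    else:
        # The annotation that is actually recorded when the module runs
        items: List[Item]


def runtime_annotation() -> object:
    """Return the annotation recorded for `items` at runtime."""
    return Container.__annotations__["items"]


if __name__ == "__main__":
    print(runtime_annotation())
```

At runtime only the `else` branch executes, so the recorded annotation is the non-parameterised `List[Item]`.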

Impact

In a long-running server making repeated responses.parse() calls, RSS grows linearly with request count and the process will eventually OOM.

To Reproduce

  1. Call the async Responses .parse() API repeatedly with a Pydantic model
  2. Observe that the memory is not freed
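A minimal harness for observing growth across repeated awaited calls might look like the following sketch. The helper `measure_growth` and the stand-in coroutine `leaky_call` are hypothetical. In a real reproduction `leaky_call` would be replaced by an awaited `responses.parse()` call with a Pydantic model. One assumption to keep in mind is that `tracemalloc` only tracks allocations made through Python's allocators, so memory held by native objects may not show up, and a process-level RSS reading may be needed instead.

```python
import asyncio
import gc
import tracemalloc
from typing import Awaitable, Callable, List

_retained: List[bytearray] = []  # simulates objects that are never released


async def leaky_call() -> None:
    """Hypothetical stand-in for the real API call; retains 10 kB per call."""
    _retained.append(bytearray(10_000))


async def measure_growth(
    call: Callable[[], Awaitable[object]], iterations: int, samples: int = 5
) -> List[int]:
    """Await `call` repeatedly and return traced-memory readings in bytes."""
    tracemalloc.start()
    try:
        readings: List[int] = []
        step = max(1, iterations // samples)
        for i in range(1, iterations + 1):
            await call()
            if i % step == 0:
                gc.collect()  # drop collectable garbage before sampling
                current, _peak = tracemalloc.get_traced_memory()
                readings.append(current)
        return readings
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    for reading in asyncio.run(measure_growth(leaky_call, iterations=100)):
        print(reading)
```

Readings that keep rising across samples, rather than levelling off, suggest that memory is retained between calls.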

OS

macOS
