Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
Using the AsyncResponses client to parse structured outputs leaves Pydantic schema objects behind, causing memory to accumulate over time. Flame graph of the memory leak:
Memory leak in parse_response: ParsedResponse[TextFormatT] causes unbounded pydantic schema rebuilds
Environment
- openai==2.31.0
- pydantic>=2
- Python 3.13
Description
openai/lib/_parsing/_responses.py calls construct_type_unchecked with TypeVar-subscripted generics:
construct_type_unchecked(type_=ParsedResponseOutputText[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponse[TextFormatT], ...)
TextFormatT is a free module-level TypeVar. Pydantic cannot resolve a free TypeVar, so model_rebuild(raise_errors=False) returns False on every call. This means MockCoreSchema._built_memo is never populated, and a new SchemaValidator/SchemaSerializer (heavy Rust objects) is allocated on every single responses.parse() call. These accumulate without bound.
Root Cause
Pydantic's MockCoreSchema._get_built() only caches its result (_built_memo) when model_rebuild succeeds (returns True). For ParsedResponse[TextFormatT] where TextFormatT is unresolved, model_rebuild always returns False, so the cache is never set and the rebuild runs on every access.
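The failing-rebuild behavior can be illustrated with plain Pydantic (a minimal toy example, not the actual openai models): a model whose annotation can never resolve stays incomplete, and model_rebuild(raise_errors=False) returns False on every call, so the mock schema's built-result cache is never populated and the rebuild work repeats on each access.

```python
from pydantic import BaseModel


class Incomplete(BaseModel):
    # "Missing" is a forward reference that can never resolve, standing in
    # for the free TypeVar that Pydantic cannot resolve in ParsedResponse.
    x: "Missing"  # noqa: F821


# Rebuild fails (returns False) every time it is attempted, so Pydantic's
# MockCoreSchema never memoizes a built SchemaValidator; the expensive
# rebuild path runs again on every attempt.
print(Incomplete.model_rebuild(raise_errors=False))  # False
print(Incomplete.model_rebuild(raise_errors=False))  # False again, not cached
```

This mirrors the ParsedResponse[TextFormatT] situation: because the rebuild never succeeds, the success-only cache in MockCoreSchema._get_built() is never filled.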
Fix
Use the non-parameterized base classes instead:
construct_type_unchecked(type_=ParsedResponseOutputText, ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage, ...)
construct_type_unchecked(type_=ParsedResponse, ...)
This is functionally identical at runtime — ParsedResponse and related classes guard their parameterised fields behind if TYPE_CHECKING:, so the type argument has no runtime effect:
class ParsedResponse(Response, GenericModel, Generic[ContentType]):
    if TYPE_CHECKING:
        output: List[ParsedResponseOutputItem[ContentType]]  # never executed at runtime
    else:
        output: List[ParsedResponseOutputItem]
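The TYPE_CHECKING guard can be sanity-checked with a toy model that mirrors the pattern (hypothetical names, not the real openai classes): at runtime only the else branch executes, so the bare class carries the same concrete field annotations as the parameterised one and validates identically.

```python
from typing import TYPE_CHECKING, Generic, List, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Item(BaseModel):
    text: str


class Parsed(BaseModel, Generic[T]):
    # Mirrors the ParsedResponse pattern: the generic annotation exists only
    # for type checkers; at runtime the concrete annotation is used.
    if TYPE_CHECKING:
        output: List[T]
    else:
        output: List[Item]


# The runtime field annotation is the concrete one, so validating with the
# bare (non-parameterized) class behaves the same as with Parsed[T].
p = Parsed.model_validate({"output": [{"text": "hi"}]})
assert Parsed.model_fields["output"].annotation == List[Item]
assert isinstance(p.output[0], Item)
```

Since the subscript never reaches the runtime annotations, dropping it in construct_type_unchecked changes nothing about parsing behavior while letting Pydantic build and cache the schema once.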
Impact
In a long-running server making repeated responses.parse() calls, RSS grows linearly with request count and the process will eventually OOM.
To Reproduce
- Use the AsyncResponses .parse() API with a Pydantic model
- Observe that the memory is not freed
OS
macOS