Arrow IPC binary fetch path for DataFrame execution by Martozar · Pull Request #1489 · gooddata/gooddata-python-sdk

Martozar · 2026-03-30T08:02:12Z

Summary

Adds a native Arrow IPC binary fetch path to gooddata-pandas, providing a faster alternative to the existing JSON-paged AFM path for large result sets.

What changed

gooddata-sdk — binary fetch

BareExecutionResponse.read_result_arrow() fetches execution results from the server's binary IPC endpoint and returns a pyarrow.Table.

gooddata-pandas — Arrow→DataFrame conversion

DataFrameFactory.for_exec_def_arrow() — new public method that mirrors for_exec_def() but uses the binary path.
for_arrow_table() — pure conversion from pa.Table to (pd.DataFrame, DataFrameMetadata), enabling callers to bring their own Arrow data.
convert_arrow_table_to_dataframe() — low-level converter that reconstructs row/column MultiIndex, subtotals, primary labels, and types from Arrow field metadata.

Why

The JSON paging path serialises every result to JSON and pages it in chunks, which is CPU-heavy and slow for wide or deep result sets. Arrow IPC transfers binary columnar data in a single round-trip. End-to-end benchmarks against the GoodData demo workspace show a 1.3×–33× speedup depending on table shape, with larger tables benefiting most.

Test coverage

140 unit tests covering: missing metadata keys (all three required keys), self_destruct mode, _build_field_index edge cases (subtotal padding, asymmetric depth), compute_row_totals_indexes with empty dimensions, for_arrow_table correctness across flat/transposed/subtotals/both-dim-totals cases.
47 ground-truth fixture cases generated against the live API and committed to tests/dataframe/fixtures/arrow/, including 3-metric tables, 3-level nested subtotals, multi-aggregation multi-metric tables, and asymmetric totals (different levels/aggregations per metric).
IPC test fixture updated to use ipc.new_file to match the server format.

codecov · 2026-03-30T09:31:36Z

Codecov Report

❌ Patch coverage is 91.99134% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.66%. Comparing base (49ea0d5) to head (6828bad).
⚠️ Report is 27 commits behind head on master.

Files with missing lines	Patch %	Lines
...data-pandas/src/gooddata_pandas/arrow_convertor.py	96.85%	10 Missing ⚠️
...s/gooddata-pandas/src/gooddata_pandas/dataframe.py	83.33%	10 Missing ⚠️
...ages/gooddata-pandas/src/gooddata_pandas/series.py	57.14%	6 Missing ⚠️
...ta-sdk/src/gooddata_sdk/compute/model/execution.py	76.47%	4 Missing ⚠️
...es/gooddata-pandas/src/gooddata_pandas/__init__.py	50.00%	3 Missing ⚠️
...gooddata-pandas/src/gooddata_pandas/data_access.py	88.00%	3 Missing ⚠️
...ddata-sdk/src/gooddata_sdk/compute/model/filter.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1489      +/-   ##
==========================================
+ Coverage   78.13%   78.66%   +0.53%     
==========================================
  Files         228      230       +2     
  Lines       14926    15400     +474     
==========================================
+ Hits        11662    12114     +452     
- Misses       3264     3286      +22
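The percentages in the report can be cross-checked from the raw counts. The sketch below recomputes them from the figures quoted above; the helper name `coverage` is illustrative, and rounding to two decimals is an assumption about how the report formats its values.

```python
# Figures copied from the Codecov diff and the per-file table above.
base_hits, base_lines = 11662, 14926
head_hits, head_lines = 12114, 15400
missing_per_file = [10, 10, 6, 4, 3, 3, 1]


def coverage(hits: int, lines: int) -> float:
    """Coverage as a percentage, rounded to two decimals (assumed report format)."""
    return round(100 * hits / lines, 2)


base_cov = coverage(base_hits, base_lines)
head_cov = coverage(head_hits, head_lines)
delta = round(head_cov - base_cov, 2)

# Misses are simply lines minus hits; the per-file misses sum to the patch total.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits
patch_missing = sum(missing_per_file)
```

With these inputs the recomputed values agree with the report: 78.13% and 78.66% coverage, a 0.53 point increase, 3264 and 3286 misses, and 37 uncovered patch lines.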


Switch read_result_arrow to explicitly request application/vnd.apache.arrow.stream via Accept header and pipe the HTTP response directly into ipc.open_stream(), eliminating the intermediate BytesIO buffer. Update tests accordingly.

Add a parallel Arrow IPC execution path to DataFrameFactory and SeriesFactory that fetches results via the binary endpoint instead of JSON pagination:
- arrow_convertor: pa.Table -> DataFrame conversion with label_overrides, grand_totals reordering, column_totals_indexes, primary_labels resolution, and metric field index helper
- dataframe: for_exec_def_arrow(), for_arrow_table(), for_exec_result_id Arrow branch; Arrow path wired through for_visualization(), for_created_visualization()
- series: use_arrow=True on indexed() / not_indexed()
- ArrowConfig holds conversion params (self_destruct, types_mapper, custom_mapping); use_arrow is a dedicated DataFrameFactory.__init__ parameter

risk: nonprod

Backfill column_totals_indexes into all 36 fixture meta.json files; extend parity tests to cover all four DataFrameMetadata fields (row_totals_indexes, column_totals_indexes, primary_labels_from_index, primary_labels_from_columns) and expand for_arrow_table tests from 4 hand-picked cases to the full fixture set.

risk: nonprod

hkad98 · 2026-04-13T08:54:24Z

+# (C) 2026 GoodData Corporation
+from __future__ import annotations
+
+import json


Consider using orjson – can be done as a follow-up.

hkad98 · 2026-04-13T08:56:49Z

    "python-dotenv~=1.0.0",
    "deepdiff~=8.5.0",
    "tests_support",
+    "pyarrow>=16.1.0",


There is pyarrow>=23.0.1 in project.optional-dependencies. Consider unifying it.

Replace stdlib json with orjson in arrow_convertor.py for faster metadata parsing. Add orjson>=3.11.0 to the arrow optional dependency group and align the test group's pyarrow floor to match the arrow extra (>=23.0.1).

risk: nonprod
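For readers unfamiliar with the swap, here is a minimal sketch of parsing metadata bytes with orjson. The fallback to stdlib json is an addition for illustration only (the commit replaces json outright), and the helper name `parse_metadata` and the sample keys are hypothetical.

```python
import json

try:
    import orjson  # optional dependency, generally faster than stdlib json

    def parse_metadata(raw: bytes) -> dict:
        """Parse JSON metadata bytes with orjson."""
        return orjson.loads(raw)

except ImportError:

    def parse_metadata(raw: bytes) -> dict:
        """Fallback: stdlib json also accepts bytes input."""
        return json.loads(raw)


# Arrow metadata values arrive as bytes, so no decode step is needed.
parsed = parse_metadata(b'{"dimension": 0, "labels": ["region"]}')
```

Both parsers accept `bytes` directly, so the call site does not change when one is swapped for the other.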

Martozar requested review from hkad98, jaceksan, lupko and pcerny as code owners March 30, 2026 08:02

Martozar force-pushed the c.mze-cq-105 branch 3 times, most recently from 7453528 to 0380d40 Compare March 30, 2026 09:22

Martozar marked this pull request as draft March 30, 2026 10:47

hkad98 reviewed Mar 30, 2026

Comment thread packages/gooddata-sdk/pyproject.toml Outdated

no23reason reviewed Mar 31, 2026

Comment thread packages/gooddata-pandas/src/gooddata_pandas/dataframe.py

Martozar changed the title ~~C.mze cq 105~~ Arrow IPC binary fetch path for DataFrame execution Apr 1, 2026

Martozar marked this pull request as ready for review April 1, 2026 11:01

no23reason reviewed Apr 1, 2026

Comment thread packages/gooddata-sdk/src/gooddata_sdk/catalog/export/service.py

no23reason reviewed Apr 1, 2026

Comment thread packages/gooddata-sdk/src/gooddata_sdk/compute/model/execution.py Outdated

Martozar force-pushed the c.mze-cq-105 branch 2 times, most recently from d7fbc76 to 4e99271 Compare April 1, 2026 13:54

Martozar added 4 commits April 1, 2026 15:57

feat(gooddata-pandas): add Arrow IPC execution path via for_exec_def_arrow

b7d2744

fix(gooddata-sdk): update type annotations for ty 0.0.27 and fix Arrow result reading

6a20f1c

docs(export): fix get_raw_export_bytes docstring to be format-agnostic

7dddd3a

Martozar force-pushed the c.mze-cq-105 branch from 4e99271 to 7dddd3a Compare April 1, 2026 14:00

no23reason previously approved these changes Apr 1, 2026

Martozar dismissed no23reason’s stale review via 79e7715 April 13, 2026 07:53

Martozar force-pushed the c.mze-cq-105 branch 3 times, most recently from 2468c32 to 8dc2511 Compare April 13, 2026 08:34

Martozar added 2 commits April 13, 2026 10:41

Martozar force-pushed the c.mze-cq-105 branch from 8dc2511 to 199fd91 Compare April 13, 2026 08:42

Martozar force-pushed the c.mze-cq-105 branch from 199fd91 to 1810ecc Compare April 13, 2026 08:47

hkad98 reviewed Apr 13, 2026

Martozar force-pushed the c.mze-cq-105 branch 2 times, most recently from 40b16be to 3dc178e Compare April 13, 2026 09:11

Martozar force-pushed the c.mze-cq-105 branch from 3dc178e to 6828bad Compare April 13, 2026 09:36

hkad98 approved these changes Apr 13, 2026

Martozar merged commit d7f50b7 into gooddata:master Apr 13, 2026
13 checks passed

Martozar merged 7 commits into gooddata:master from Martozar:c.mze-cq-105

3 participants
