End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails#141
@copilot |
Updated the PR description to show the wildcard case — calling |
Pull request overview
Adds public schema discovery methods under client.tables so SDK users can programmatically discover table columns (attributes) and relationship definitions, which is particularly helpful when authoring client.query.sql() queries.
Changes:
- Added TableOperations.list_columns(), list_relationships(), and list_table_relationships() public APIs that delegate to data-layer helpers.
- Implemented the corresponding data-layer methods _ODataClient._list_columns() and _RelationshipOperationsMixin._list_relationships() / _list_table_relationships().
- Added unit tests and README examples for the new discovery APIs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/PowerPlatform/Dataverse/operations/tables.py | Adds the three new public client.tables.* schema discovery methods and docstrings/examples. |
| src/PowerPlatform/Dataverse/data/_odata.py | Implements _list_columns() to fetch EntityDefinitions({MetadataId})/Attributes. |
| src/PowerPlatform/Dataverse/data/_relationships.py | Implements _list_relationships() and _list_table_relationships() (1:N + N:N) metadata discovery. |
| tests/unit/test_tables_operations.py | Verifies the new public methods delegate correctly to the underlying _odata methods. |
| tests/unit/data/test_relationships.py | Adds data-layer tests for _list_relationships() and _list_table_relationships(). |
| tests/unit/data/test_list_columns.py | New data-layer tests for _ODataClient._list_columns(). |
| README.md | Documents usage examples for the new schema discovery APIs. |
…ith tests and README examples Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
…me bridge, guardrails, examples
- Schema discovery APIs: list_columns, list_relationships, list_table_relationships
- SELECT * auto-expansion via list_columns (server blocks SELECT *)
- client.dataframe.sql() for SQL -> DataFrame bridge
- SQL safety guardrails: block writes, auto-inject TOP 5000, warn on leading-wildcard LIKE and implicit cross joins
- Comprehensive sql_examples.py (28 sections): JOINs, aggregates, GROUP BY, DISTINCT, OFFSET FETCH, polymorphic lookups, audit trail, SQL read -> DataFrame transform -> SDK write-back
- Updated query.sql() docstring with accurate SQL capabilities
- 732 tests (was 318), all passing
e9a25aa to
7f06533
Compare
…dd SELECT * + JOIN warning
Live-tested against Aurora VM (aurorabapenv71aff.crm10.dynamics.com):
- Server auto-caps at 5000 rows without TOP -> no client-side injection needed
- Server blocks SELECT * on both single-table and JOIN queries -> SDK expansion confirmed needed
- Added warning when SELECT * used with JOIN (expansion only includes first table columns)
732 tests passing.
New methods on client.query for SQL-first developers:
- sql_columns(table) -> simplified column metadata list
- sql_select(table) -> comma-separated column list for SELECT
- sql_joins(table) -> all possible JOINs with ready-to-use clauses
- sql_join(from, to) -> auto-generated JOIN clause between tables
Key finding from live testing: SQL JOINs use the raw attribute name (e.g. parentcustomerid), NOT the _value suffix. The ReferencingAttribute from relationship metadata matches exactly.
13 new unit tests, 745 total passing.
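As a rough illustration of the key finding above, a JOIN clause can be derived directly from one relationship metadata record. This is a minimal sketch, not the SDK's sql_join() implementation: the function name build_join_clause and the sample record are hypothetical, and only the metadata property names (ReferencingAttribute, ReferencedEntity, ReferencedAttribute) follow the Dataverse relationship metadata shape.

```python
def build_join_clause(rel: dict, from_alias: str, to_alias: str) -> str:
    """Build a JOIN clause from one lookup (many-to-one) relationship record.

    Hypothetical helper for illustration; the lookup column is used as-is,
    without any _value suffix.
    """
    return (
        f"JOIN {rel['ReferencedEntity']} {to_alias} "
        f"ON {from_alias}.{rel['ReferencingAttribute']} = "
        f"{to_alias}.{rel['ReferencedAttribute']}"
    )


# Sample record shaped like relationship metadata (values are illustrative).
rel = {
    "ReferencingEntity": "contact",
    "ReferencingAttribute": "parentcustomerid",
    "ReferencedEntity": "account",
    "ReferencedAttribute": "accountid",
}

clause = build_join_clause(rel, from_alias="c", to_alias="a")
sql = f"SELECT c.fullname, a.name FROM contact c {clause}"
print(sql)
```

The caller chooses the aliases here, which keeps alias management out of the clause builder.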
New methods on client.query for OData users (parallel to SQL helpers):
- odata_select(table) -> list[str] for records.get(select=)
- odata_expands(table) -> all navigation properties with entity sets
- odata_expand(from, to) -> PascalCase nav property name for expand=
- odata_bind(from, to, id) -> @odata.bind dict for create/update payloads
These eliminate the most error-prone parts of OData queries:
- No more guessing PascalCase navigation property names
- No more manually constructing @odata.bind with entity set names
- Column discovery matches records.get(select=) format directly
11 new unit tests, 756 total passing.
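For readers unfamiliar with the @odata.bind shape these helpers produce, here is a minimal sketch of the dictionary itself. The function build_odata_bind and its parameters are hypothetical and stand in for whatever odata_bind() resolves from metadata; the navigation property name, entity set name, and GUID below are illustrative values only.

```python
def build_odata_bind(nav_property: str, entity_set: str, record_id: str) -> dict:
    """Return a single-entry dict binding a lookup to an existing record.

    Hypothetical helper: the caller supplies the navigation property and
    entity set names, which a metadata-driven helper would look up instead.
    """
    # Tolerate GUIDs written with surrounding braces.
    clean_id = record_id.strip("{}")
    return {f"{nav_property}@odata.bind": f"/{entity_set}({clean_id})"}


payload = {"firstname": "Ada"}
payload.update(
    build_odata_bind(
        nav_property="parentcustomerid_account",
        entity_set="accounts",
        record_id="{00000000-0000-0000-0000-000000000001}",
    )
)
print(payload)
```

The resulting payload can be merged into a create or update body alongside ordinary column values.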
_list_table_relationships now fetches all 3 relationship types:
- OneToManyRelationships (incoming: where table is referenced)
- ManyToOneRelationships (outgoing: where table has a lookup) <-- NEW
- ManyToManyRelationships
This was critical for sql_joins() and odata_expands() which need outgoing lookups. Without ManyToOne, custom table lookups were invisible to the helpers.
Also fixed: create_lookup_field requires lowercase logical names (not SchemaName) -- documented in copilot-instructions.
Live-validated: 47/47 SQL tests passed on Aurora VM. Unit tests: 756 passed.
Columns with AttributeOf set are auto-generated display names (e.g. createdbyname, createdbyyominame) that cause errors in OData queries and are not real data columns. Now filtered out.
Also: _list_table_relationships docstring updated for ManyToOne.
756 unit tests passing.
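The filter described above can be sketched in a few lines. This is an assumption-laden illustration rather than the SDK's code: the function name real_columns and the sample records are hypothetical, while LogicalName and AttributeOf are the attribute metadata properties the commit refers to.

```python
def real_columns(attributes: list[dict]) -> list[str]:
    """Keep only attributes that are not derived from another attribute.

    Hypothetical helper: an attribute with AttributeOf set is treated as an
    auto-generated display-name column and skipped.
    """
    return [a["LogicalName"] for a in attributes if not a.get("AttributeOf")]


# Illustrative metadata records, not real server output.
sample = [
    {"LogicalName": "createdby", "AttributeOf": None},
    {"LogicalName": "createdbyname", "AttributeOf": "createdby"},
    {"LogicalName": "createdbyyominame", "AttributeOf": "createdby"},
    {"LogicalName": "name"},
]

columns = real_columns(sample)
print(columns)
```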
… depth
- Moved write statement check (INSERT/UPDATE/DELETE) before _extract_logical_table() so users get the clear 'SQL endpoint is read-only' error instead of confusing 'no FROM clause' ValueError
- Live-tested JOIN depth: 8-table JOINs confirmed working on Aurora VM
- Live-tested nested polymorphic: customerid -> account -> owner -> businessunit chain works
- Fixed validate_sql_full.py: lookup existence check (no more 95s waits), COUNT assertion relaxed for accumulated data, AttributeOf filter
- 66/66 validation tests passed, 12/12 JOIN depth tests passed
- 756 unit tests passing
1. Remove unused imports (patch, PropertyMock) from test_select_star_expansion
2. Fix JOIN depth '6+' -> 'no depth limit' in query.py docstring
3. Fix stale 'auto-inject TOP' comment in _odata.py
4. Fix Learn-incompatible docstring types in tables.py list_columns
5. Fix Learn-incompatible docstring types in tables.py list_relationships
6. Fix list_table_relationships docstring: add ManyToOne
7. Fix cross-join regex to catch unaliased FROM account, contact
8. Fix write regex to catch comment-prefixed writes (/* */ and --)
9. Narrow odata_expands exception handling (KeyError/AttributeError/ValueError)
10. Consistent JOIN depth claims across all docs
770 unit tests passing.
|
All 11 review comments addressed in commit a595a72:
770 unit tests passing. |
…taverse API)
Dataverse Web API supports EntityDefinitions({id})/ManyToOneRelationships
as a standard metadata endpoint. ManyToOne is the outgoing lookup
perspective (child -> parent), complementing OneToMany (parent -> child).
- README: 'one-to-many + many-to-many' -> 'one-to-many + many-to-one + many-to-many'
- _relationships.py docstring updated to match
- copilot-instructions.md: added SKILL.md reference to governance list
770 unit tests passing.
…ributeOf test
Critical fixes:
- tables.py: Learn-incompatible ':class:`list` of :class:`str`' -> 'list[str]'
- _odata.py, _relationships.py, _error_codes.py: add __all__ = [] per SKILL.md
Important fixes:
- _odata.py: _delete_multiple param 'logical_name' -> 'table_schema_name'
- _odata.py: _lowercase_list docstring removed a misleading mention
- _odata.py: _query_sql guardrails comment clarified (defense-in-depth)
New test:
- test_excludes_attribute_of_columns: verifies computed display-name columns (with AttributeOf set) are filtered from sql_columns()
771 unit tests passing.
…ng approach
CodeQL found exponential backtracking in _SQL_WRITE_RE which used nested quantifiers for SQL comment matching: (?:/\*.*?\*/\s*|...)*
Fix: separate comment stripping (_SQL_COMMENT_RE) from write detection.
1. Strip SQL comments with a safe non-backtracking regex
2. Check for write keywords with the simple anchored regex
The comment regex uses [^*]*\*+(?:[^/*][^*]*\*+)*/ which is the standard safe pattern for matching C-style block comments.
3 new tests for comment-prefixed write detection. 774 unit tests passing.
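The two-step approach in this commit can be sketched as follows. The names _SQL_COMMENT_RE and _SQL_WRITE_RE come from the commit message, and the block-comment fragment is the one quoted there; the line-comment alternative, the exact keyword list, and the is_write_statement wrapper are assumptions made for illustration and may differ from the SDK.

```python
import re

# Step 1 pattern: block comments (fragment quoted in the commit message) plus
# an assumed alternative for "--" line comments.
_SQL_COMMENT_RE = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|--[^\n]*")

# Step 2 pattern: a simple check anchored at the start of the statement.
_SQL_WRITE_RE = re.compile(r"\s*(?:INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def is_write_statement(sql: str) -> bool:
    """Return True when the statement starts with a write keyword.

    Hypothetical wrapper: comments are replaced by a space first so that a
    leading /* ... */ or -- comment cannot hide the keyword.
    """
    stripped = _SQL_COMMENT_RE.sub(" ", sql).strip()
    return _SQL_WRITE_RE.match(stripped) is not None


print(is_write_statement("/**/DELETE FROM account"))
print(is_write_statement("SELECT name FROM account"))
```

Keeping the two patterns separate means neither needs a quantifier wrapped around another quantified group, which is the construct CodeQL flagged.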
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
- sql_joins(): fix docstring example to not use aliases with join_clause (join_clause uses full table name; use sql_join() for aliased queries)
- sql_examples: replace _value JOIN pattern with sql_join() helper
- list_table_relationships: docstring now mentions ManyToOne
- remove unused VALIDATION_SQL_CROSS_JOIN_BLOCKED import
- PR description: cartesian updated from ValidationError to UserWarning
|
commenting for posterity and in support of the merger. |
Resolved conflicts in _odata.py (kept write-blocking + guardrails from HEAD, adopted _execute_raw/_build_sql pattern from main for proper URL encoding; merged all error code imports) and test_sql_parse.py (kept both JOIN extraction tests from HEAD and _build_sql URL encoding tests from main). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… .lower() from example Agent-Logs-Url: https://github.com/microsoft/PowerPlatform-DataverseClient-Python/sessions/508cd2e9-7649-42ec-b5c4-2768fea34725 Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
    print(f" {r.get('new_code', ''):<12s} Budget={r.get('new_budget')} Active={r.get('new_active')}")

    # ==============================================================
    # 5. SELECT * (auto-expanded by SDK)
It would be better not to auto-expand these because for wide entities with a lot of logical fields this results in extremely expensive queries. We deliberately reject SELECT * in the SQL API even though it would be trivial to implement because legacy APIs already know how to select all columns.
…sion to prevent expensive wildcard selects on wide entities
    # Block write statements FIRST (before table extraction, since
    # UPDATE/INSERT/DELETE don't have FROM clauses).
    # Strip SQL comments to catch e.g. /**/DELETE or --\\nDELETE.
    sql_no_comments = self._SQL_COMMENT_RE.sub(" ", sql).strip()
    if self._SQL_WRITE_RE.search(sql_no_comments):
        raise ValidationError(
            "SQL endpoint is read-only. Use client.records or "
            "client.dataframe for write operations "
            "(INSERT/UPDATE/DELETE are not supported).",
            subcode=VALIDATION_SQL_WRITE_BLOCKED,
        )
This is also done in self._sql_guardrails(sql) below. If it is a duplicate check, it can be removed from here.
    try:
        with self._client._scoped_odata() as od:
            target_set = od._entity_set_from_schema_name(target)
    except (KeyError, AttributeError, ValueError):
Should this also catch MetadataError which is raised by od._entity_set_from_schema_name(target) when entity set name resolution fails?
    VALIDATION_SQL_NOT_STRING = "validation_sql_not_string"
    VALIDATION_SQL_EMPTY = "validation_sql_empty"
    VALIDATION_SQL_WRITE_BLOCKED = "validation_sql_write_blocked"
    VALIDATION_SQL_CROSS_JOIN_BLOCKED = "validation_sql_cross_join_blocked"
Nit: I didn't see VALIDATION_SQL_CROSS_JOIN_BLOCKED used in other places. I think it can be removed if not needed.
- Updated SQL query examples to use correct field names for ownerid and related entities.
- Removed unsupported SQL validation error code.
- Enhanced parameter handling for OneToMany and ManyToMany relationships.
- Improved test coverage for metadata error handling in entity set resolution.
- Fix integration tests
|
Should the PR title also be updated to remove |
    continue

    # Generate a short alias for the target table
    alias = target[0] if target else "j"
Could this cause an issue/bug if two tables have the same first letter in their names (e.g. account, annotation) or when multiple lookups point to the same table (e.g. ownerid, createdby both pointing to systemuser)? This may be worth testing to check the request doesn't fail in such cases.
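One way the collision raised in this comment could be avoided is to track which aliases are already in use and add a numeric suffix on conflict. This is only a sketch of that idea, not what the PR implements: unique_alias and the used set are hypothetical names, and the table list is illustrative.

```python
def unique_alias(target: str, used: set[str]) -> str:
    """Return a short alias for target that is not already in used.

    Hypothetical helper: starts from the first letter (or "j" for an empty
    name) and appends 2, 3, ... until the alias is free, then records it.
    """
    base = target[0].lower() if target else "j"
    alias = base
    suffix = 2
    while alias in used:
        alias = f"{base}{suffix}"
        suffix += 1
    used.add(alias)
    return alias


# "c" is assumed to be taken already by the FROM table (e.g. contact c).
used = {"c"}
targets = ["account", "annotation", "systemuser", "systemuser"]
aliases = [unique_alias(t, used) for t in targets]
print(aliases)
```

The repeated systemuser entry models two lookups (such as ownerid and createdby) pointing at the same table, each of which gets its own alias.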
SQL developers working with Dataverse shouldn't need to learn OData or the Web API. This PR closes that gap by making the Python SDK's SQL endpoint usable end-to-end -- from schema discovery, through querying (including SELECT *, JOINs, aggregates), to writing results back via DataFrames.
Live-validated against Aurora VM: 66/66 SQL tests + 15-table JOIN confirmed + all anti-patterns guarded.
What's new
Schema discovery APIs
list_columns(table), list_relationships(), list_table_relationships(table) (OneToMany + ManyToOne + ManyToMany)
SELECT * auto-expansion
SDK transparently rewrites SELECT * into explicit column names (server blocks * directly). Warns when used with JOINs.
DataFrame SQL bridge
client.dataframe.sql() -- SQL to pandas DataFrame. Full bidirectional: SQL read -> pandas transform -> SDK write-back.
SQL helpers (client.query)
sql_columns, sql_select, sql_joins, sql_join -- auto-discover columns and JOINs from metadata.
OData helpers (client.query)
odata_select, odata_expands, odata_expand, odata_bind -- auto-discover nav properties, build @odata.bind.
Auto-lowercase table names in create_lookup_field
_build_lookup_field_models now automatically lowercases referencing_table and referenced_table to valid Dataverse logical names. Callers no longer need manual .lower() calls. lookup_field_name is preserved as-is (it is a SchemaName).
SQL safety guardrails (3 layers of protection)
Layer 1 -- Server rejects (user cannot do these at all):
- JOIN without ON -- server error
- INSERT/UPDATE/DELETE -- server error
- SELECT * -- server error (SDK rewrites transparently)
- TOP > 5000 -- server error
- RIGHT/FULL OUTER/CROSS JOIN -- server error
Layer 2 -- SDK blocks or warns (before hitting the server):
- INSERT/UPDATE/DELETE -- ValidationError with clear message
- FROM a, b (cartesian) -- UserWarning (server allows but produces N*M intermediate rows)
- LIKE '%value' -- UserWarning (full table scan advisory)
- SELECT * with JOIN -- UserWarning (partial expansion advisory)
Layer 3 -- Documentation warns (sql_examples.py Section 32):
Anti-patterns and best practices (Section 32 of examples)
Dedicated section documenting 5 SQL anti-patterns with clear explanations of WHY they're dangerous (shared database impact) and HOW to fix them. The SDK's guardrails automatically catch patterns 1 and 2.
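The warning-level guardrails in this description (leading-wildcard LIKE and FROM a, b) could be approximated with two regular expressions. The sketch below is an assumption for illustration: the pattern names, the patterns themselves, the message text, and check_sql_advisories are all hypothetical and are not taken from the SDK.

```python
import re
import warnings

# Hypothetical patterns, for illustration only.
_LEADING_WILDCARD_RE = re.compile(r"\bLIKE\s+N?'%", re.IGNORECASE)
_IMPLICIT_CROSS_JOIN_RE = re.compile(
    r"\bFROM\s+\w+(?:\s+(?:AS\s+)?\w+)?\s*,\s*\w+", re.IGNORECASE
)


def check_sql_advisories(sql: str) -> list[str]:
    """Emit a UserWarning per detected anti-pattern and return the messages."""
    messages = []
    if _LEADING_WILDCARD_RE.search(sql):
        messages.append("Leading-wildcard LIKE forces a full table scan.")
    if _IMPLICIT_CROSS_JOIN_RE.search(sql):
        messages.append("FROM a, b is an implicit cross join (N*M intermediate rows).")
    for message in messages:
        # stacklevel=2 points the warning at the caller's line.
        warnings.warn(message, UserWarning, stacklevel=2)
    return messages


found = check_sql_advisories("SELECT name FROM account WHERE name LIKE '%corp'")
print(found)
```

Warning rather than raising matches the behaviour described above, where the server still accepts these queries.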
Key findings from live testing
- SQL JOINs use the raw attribute name (e.g. parentcustomerid), not the _value suffix
- … limit (SQL JOINs are flat, not nested)
Test coverage
Comprehensive examples (sql_examples.py -- 34 sections, 1194 lines)