diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000..4d074844 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,333 @@ +# Copilot Instructions for mssql-python + +## Repository Overview + +**mssql-python** is a Python driver for Microsoft SQL Server and Azure SQL databases that leverages Direct Database Connectivity (DDBC). It's built using **pybind11** and **CMake** to create native extensions, providing DB API 2.0 compliant database access with enhanced Pythonic features. + +- **Size**: Medium-scale project (~750KB total) +- **Languages**: Python (main), C++ (native bindings), CMake (build system) +- **Target Platforms**: Windows (x64, ARM64), macOS (Universal2), Linux (x86_64, ARM64) +- **Python Versions**: 3.10+ +- **Key Dependencies**: pybind11, azure-identity, Microsoft ODBC Driver 18 + +## Development Workflows + +This repository includes detailed prompt files for common tasks. Reference these with `#`: + +| Task | Prompt | When to Use | +|------|--------|-------------| +| First-time setup | `#setup-dev-env` | New machine, fresh clone | +| Build C++ extension | `#build-ddbc` | After modifying .cpp/.h files | +| Run tests | `#run-tests` | Validating changes | +| Create PR | `#create-pr` | Ready to submit changes | + +**Workflow order for new contributors:** +1. `#setup-dev-env` → Set up venv and dependencies +2. `#build-ddbc` → Build native extension +3. Make your changes +4. `#run-tests` → Validate +5. `#create-pr` → Submit + +## Usage Examples (For Suggesting to Users) + +> **Security Note**: Examples use `TrustServerCertificate=yes` for local development with self-signed certificates. For production, remove this option to ensure proper TLS certificate validation. 
+ +### Basic Connection and Query +```python +import mssql_python + +conn = mssql_python.connect( + "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes" +) +cursor = conn.cursor() +cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)) +rows = cursor.fetchall() +cursor.close() +conn.close() +``` + +### Context Manager (Recommended) +```python +with mssql_python.connect(connection_string) as conn: + with conn.cursor() as cursor: + cursor.execute("SELECT @@VERSION") + print(cursor.fetchone()[0]) +``` + +### Azure SQL with Entra ID +```python +conn = mssql_python.connect( + "SERVER=myserver.database.windows.net;DATABASE=mydb;" + "Authentication=ActiveDirectoryInteractive;Encrypt=yes" +) +``` + +### Parameterized Queries +```python +# Positional parameters +cursor.execute("SELECT * FROM users WHERE email = ?", (email,)) + +# Named parameters (pyformat) +cursor.execute("SELECT * FROM users WHERE email = %(email)s", {"email": email}) +``` + +### Insert with Commit +```python +cursor.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("John", "john@example.com")) +conn.commit() +``` + +### Batch Insert +```python +cursor.executemany("INSERT INTO users (name, email) VALUES (?, ?)", [ + ("Alice", "alice@example.com"), + ("Bob", "bob@example.com"), +]) +conn.commit() +``` + +### Transaction Handling +```python +from mssql_python import DatabaseError + +try: + cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = ?", (from_id,)) + cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = ?", (to_id,)) + conn.commit() +except DatabaseError: + conn.rollback() + raise +``` + +## Build System and Validation + +### Prerequisites +**Always install these before building:** +```bash +# All platforms +pip install -r requirements.txt + +# Windows: Requires Visual Studio Build Tools with "Desktop development with C++" workload +# macOS: brew install cmake && brew install msodbcsql18 +# Linux: Install 
cmake, python3-dev, and ODBC driver per distribution +``` + +### Building the Project + +**CRITICAL**: The project requires building native extensions before testing. Extensions are platform-specific (`.pyd` on Windows, `.so` on macOS/Linux). + +#### Windows Build: +```bash +cd mssql_python/pybind +build.bat [x64|x86|arm64] # Defaults to x64 if not specified +``` + +#### macOS Build: +```bash +cd mssql_python/pybind +./build.sh # Creates universal2 binary (ARM64 + x86_64) +``` + +#### Linux Build: +```bash +cd mssql_python/pybind +./build.sh # Detects architecture automatically +``` + +**Build Output**: Creates `ddbc_bindings.cp{python_version}-{architecture}.{so|pyd}` in the `mssql_python/` directory. + +### Testing + +**IMPORTANT**: Tests require a SQL Server connection via `DB_CONNECTION_STRING` environment variable. + +```bash +# Run all tests with coverage +python -m pytest -v --cov=. --cov-report=xml --capture=tee-sys --cache-clear + +# Run specific test files +python -m pytest tests/test_000_dependencies.py -v # Dependency checks +python -m pytest tests/test_001_globals.py -v # Basic functionality +``` + +**Test Dependencies**: Tests require building the native extension first. The dependency test (`test_000_dependencies.py`) validates that all platform-specific libraries exist. + +### Linting and Code Quality + +```bash +# Python formatting +pip install black +black --check --line-length=100 mssql_python/ tests/ + +# C++ formatting +clang-format -style=file -i mssql_python/pybind/*.cpp mssql_python/pybind/*.h + +# Coverage reporting (configured in .coveragerc) +python -m pytest --cov=. 
--cov-report=html +``` + +## Project Architecture + +### Core Components + +``` +mssql_python/ +├── __init__.py # Package initialization, connection registry, cleanup +├── connection.py # DB API 2.0 connection object +├── cursor.py # DB API 2.0 cursor object +├── db_connection.py # connect() function implementation +├── auth.py # Microsoft Entra ID authentication +├── pooling.py # Connection pooling implementation +├── logging.py # Logging configuration +├── exceptions.py # Exception hierarchy +├── connection_string_builder.py # Connection string construction +├── connection_string_parser.py # Connection string parsing +├── parameter_helper.py # Query parameter handling +├── row.py # Row object implementation +├── type.py # DB API 2.0 type objects +├── constants.py # ODBC constants +├── helpers.py # Utility functions and settings +├── ddbc_bindings.py # Platform-specific extension loader with architecture detection +├── mssql_python.pyi # Type stubs for IDE support +└── pybind/ # Native extension source + ├── ddbc_bindings.cpp # Main C++ binding code + ├── CMakeLists.txt # Cross-platform build configuration + ├── build.sh/.bat # Platform-specific build scripts + └── configure_dylibs.sh # macOS dylib configuration +``` + +### Platform-Specific Libraries + +``` +mssql_python/libs/ +├── windows/{x64,x86,arm64}/ # Windows ODBC drivers and dependencies +├── macos/{arm64,x86_64}/lib/ # macOS dylibs +└── linux/{debian_ubuntu,rhel,suse,alpine}/{x86_64,arm64}/lib/ # Linux distributions +``` + +### Configuration Files + +- **`.clang-format`**: C++ formatting (Google style, 100 column limit) +- **`.coveragerc`**: Coverage exclusions (main.py, setup.py, tests/) +- **`requirements.txt`**: Development dependencies (pytest, pybind11, coverage) +- **`setup.py`**: Package configuration with platform detection +- **`pyproject.toml`**: Modern Python packaging configuration +- **`.gitignore`**: Excludes build artifacts (*.so, *.pyd, build/, __pycache__) + +## CI/CD Pipeline Details + 
+### GitHub Workflows +- **`devskim.yml`**: Security scanning (runs on PRs and main) +- **`pr-format-check.yml`**: PR validation (title format, GitHub issue/ADO work item links) + +### Azure DevOps Pipelines (`eng/pipelines/`) +- **`pr-validation-pipeline.yml`**: Comprehensive testing across all platforms +- **`build-whl-pipeline.yml`**: Wheel building for distribution +- **Platform Coverage**: Windows (LocalDB), macOS (Docker SQL Server), Linux (Ubuntu, Debian, RHEL, Alpine) with both x86_64 and ARM64 + +### Build Matrix +The CI system tests: +- **Python versions**: 3.10, 3.11, 3.12, 3.13 +- **Windows**: x64, ARM64 architectures +- **macOS**: Universal2 (ARM64 + x86_64) +- **Linux**: Multiple distributions (Debian, Ubuntu, RHEL, Alpine) on x86_64 and ARM64 + +## Common Build Issues and Workarounds + +### macOS-Specific Issues +- **dylib path configuration**: Run `configure_dylibs.sh` after building to fix library paths +- **codesigning**: Script automatically codesigns libraries for compatibility + +### Linux Distribution Differences +- **Debian/Ubuntu**: Use `apt-get install python3-dev cmake pybind11-dev` +- **RHEL**: Requires enabling CodeReady Builder repository for development tools +- **Alpine**: Uses musl libc, requires special handling in build scripts + +### Windows Build Dependencies +- **Visual Studio Build Tools**: Must include "Desktop development with C++" workload +- **Architecture Detection**: Build scripts auto-detect target architecture from environment + +### Known Limitations (from TODOs) +- Linux RPATH configuration pending for driver .so files +- Some Unicode support gaps in executemany operations +- Platform-specific test dependencies in exception handling + +## Architecture Detection and Loading + +The `ddbc_bindings.py` module implements sophisticated architecture detection: +- **Windows**: Normalizes `win64/amd64/x64` → `x64`, `win32/x86` → `x86`, `arm64` → `arm64` +- **macOS**: Runtime architecture detection, always loads from universal2 
binary +- **Linux**: Maps `x64/amd64` → `x86_64`, `arm64/aarch64` → `arm64` + +## Exception Hierarchy + +Critical for error handling guidance: + +``` +Exception (base) +├── Warning +└── Error + ├── InterfaceError # Driver/interface issues + └── DatabaseError + ├── DataError # Invalid data processing + ├── OperationalError # Connection/timeout issues + ├── IntegrityError # Constraint violations + ├── InternalError # Internal driver/database errors + ├── ProgrammingError # SQL syntax errors + └── NotSupportedError # Unsupported features/operations +``` + +## Critical Anti-Patterns (DO NOT) + +- **NEVER** hardcode connection strings - always use `DB_CONNECTION_STRING` env var for tests +- **NEVER** use `pyodbc` imports - this driver doesn't require an external ODBC driver manager +- **NEVER** modify files in `mssql_python/libs/` - these are pre-built binaries +- **NEVER** skip `conn.commit()` after INSERT/UPDATE/DELETE operations unless `conn.autocommit` is enabled +- **NEVER** use bare `except:` blocks - always catch specific exceptions +- **NEVER** leave connections open - use context managers or explicit `close()` + +## When Modifying Code + +### Python Changes +- Preserve existing error handling patterns from `exceptions.py` +- Use context managers (`with`) for all connection/cursor operations +- Update `__all__` exports if adding public API +- Add corresponding test in `tests/test_*.py` +- Follow Black formatting (line length 100) + +### C++ Changes +- Follow RAII patterns for resource management +- Use `py::gil_scoped_release` for blocking ODBC operations +- Update `mssql_python.pyi` type stubs if changing Python API +- Follow `.clang-format` style (Google style, 100 column limit) + +## Debugging Quick Reference + +| Error | Cause | Solution | +|-------|-------|----------| +| `ImportError: ddbc_bindings` | Extension not built | Run `#build-ddbc` | +| Connection timeout | Missing env var | Set `DB_CONNECTION_STRING` | +| `dylib not found` (macOS) | Library paths | Run `configure_dylibs.sh` | +| `ODBC Driver not
found` | Missing driver | Install Microsoft ODBC Driver 18 | +| `ModuleNotFoundError` | Not in venv | Run `#setup-dev-env` | + +## Contributing Guidelines + +### PR Requirements +- **Title Format**: Must start with `FEAT:`, `CHORE:`, `FIX:`, `DOC:`, `STYLE:`, `REFACTOR:`, or `RELEASE:` +- **Issue Linking**: Must link to either GitHub issue or ADO work item +- **Summary**: Minimum 10 characters of meaningful content under "### Summary" + +### Development Workflow +1. **Always build native extensions first** before running tests +2. **Use virtual environments** for dependency isolation +3. **Test on target platform** before submitting PRs +4. **Check CI pipeline results** for cross-platform compatibility + +## Trust These Instructions + +These instructions are comprehensive and tested. Only search for additional information if: +- Build commands fail with unexpected errors +- New platform support is being added +- Dependencies or requirements have changed + +For any ambiguity, refer to the platform-specific README in `mssql_python/pybind/README.md` or the comprehensive CI pipeline configurations in `eng/pipelines/`. diff --git a/llms.txt b/llms.txt new file mode 100644 index 00000000..f97632af --- /dev/null +++ b/llms.txt @@ -0,0 +1,311 @@ +# mssql-python + +> Microsoft Python Driver for SQL Server and Azure SQL + +mssql-python is the official Microsoft Python driver for SQL Server and Azure SQL databases. It provides DB API 2.0 compliant database access with Direct Database Connectivity (DDBC) - no external ODBC driver manager required. 
+ +## Installation + +```bash +pip install mssql-python +``` + +Or using uv (recommended for faster installs): +```bash +uv pip install mssql-python +``` + +### Platform-Specific Dependencies + +**macOS** - Install the ODBC driver: +```bash +brew install msodbcsql18 +``` + +**Debian/Ubuntu**: +```bash +curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg +curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list +sudo apt-get update +sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 +``` + +**RHEL/CentOS/Fedora**: +```bash +curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | sudo gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg +curl https://packages.microsoft.com/config/rhel/$(rpm -E %rhel)/prod.repo | sudo tee /etc/yum.repos.d/mssql-release.repo +sudo ACCEPT_EULA=Y yum install -y msodbcsql18 +``` + +**Alpine Linux**: +```bash +apk add --no-cache curl gnupg +curl -O https://download.microsoft.com/download/1/f/f/1fffeb70-cd3d-42d5-a7ee-b75c4ff5fb9e/msodbcsql18_18.4.1.1-1_amd64.apk +apk add --allow-untrusted msodbcsql18_18.4.1.1-1_amd64.apk +``` + +**SUSE Linux**: +```bash +sudo zypper addrepo https://packages.microsoft.com/config/sles/15/prod.repo +sudo ACCEPT_EULA=Y zypper install -y msodbcsql18 +``` + +## Quick Start + +> **Security Note**: The examples below use `TrustServerCertificate=yes` for local development with self-signed certificates. For production or remote connections, remove this option to ensure proper TLS certificate validation. 
+ +```python +import mssql_python + +# Connect to SQL Server +conn = mssql_python.connect( + "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes" +) +cursor = conn.cursor() + +# Execute a query +cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,)) +rows = cursor.fetchall() + +# Always close when done +cursor.close() +conn.close() +``` + +## Connection String Formats + +### Windows Authentication (Trusted Connection) +```python +conn = mssql_python.connect( + "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes;Encrypt=yes;TrustServerCertificate=yes" +) +``` + +### SQL Server Authentication +```python +conn = mssql_python.connect( + "SERVER=localhost;DATABASE=mydb;UID=myuser;PWD=mypassword;Encrypt=yes;TrustServerCertificate=yes" +) +``` + +### Azure SQL with Entra ID (Interactive) +```python +conn = mssql_python.connect( + "SERVER=myserver.database.windows.net;DATABASE=mydb;Authentication=ActiveDirectoryInteractive;Encrypt=yes" +) +``` + +### Azure SQL with Managed Identity +```python +conn = mssql_python.connect( + "SERVER=myserver.database.windows.net;DATABASE=mydb;Authentication=ActiveDirectoryMSI;Encrypt=yes" +) +``` + +### Azure SQL with Service Principal +```python +conn = mssql_python.connect( + "SERVER=myserver.database.windows.net;DATABASE=mydb;" + "Authentication=ActiveDirectoryServicePrincipal;" + "UID=;PWD=;Encrypt=yes" +) +``` + +## Context Manager Usage (Recommended) + +```python +import mssql_python + +# Connection and cursor automatically close on exit +with mssql_python.connect(connection_string) as conn: + with conn.cursor() as cursor: + cursor.execute("SELECT @@VERSION") + version = cursor.fetchone() + print(version[0]) +``` + +## Query Patterns + +### Parameterized Queries (Prevent SQL Injection) +```python +# Use ? placeholders for parameters +cursor.execute("SELECT * FROM users WHERE email = ? 
AND active = ?", (email, True)) + +# Or use named parameters with pyformat style +cursor.execute("SELECT * FROM users WHERE email = %(email)s", {"email": email}) +``` + +### Fetch Results +```python +# Fetch one row +row = cursor.fetchone() + +# Fetch multiple rows +rows = cursor.fetchmany(size=100) + +# Fetch all remaining rows +all_rows = cursor.fetchall() +``` + +### Insert Data +```python +cursor.execute( + "INSERT INTO users (name, email) VALUES (?, ?)", + ("John Doe", "john@example.com") +) +conn.commit() # Don't forget to commit! +``` + +### Insert Multiple Rows (executemany) +```python +users = [ + ("Alice", "alice@example.com"), + ("Bob", "bob@example.com"), + ("Charlie", "charlie@example.com"), +] +cursor.executemany("INSERT INTO users (name, email) VALUES (?, ?)", users) +conn.commit() +``` + +### Update Data +```python +cursor.execute("UPDATE users SET active = ? WHERE id = ?", (False, user_id)) +conn.commit() +print(f"Rows affected: {cursor.rowcount}") +``` + +### Delete Data +```python +cursor.execute("DELETE FROM users WHERE id = ?", (user_id,)) +conn.commit() +``` + +## Transaction Management + +```python +try: + cursor.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (1, 99.99)) + cursor.execute("UPDATE inventory SET quantity = quantity - 1 WHERE product_id = ?", (42,)) + conn.commit() # Commit both changes +except Exception as e: + conn.rollback() # Rollback on error + raise +``` + +### Autocommit Mode +```python +conn = mssql_python.connect(connection_string) +conn.autocommit = True # Each statement commits immediately +``` + +## Working with Results + +### Access Columns by Index +```python +cursor.execute("SELECT id, name, email FROM users") +for row in cursor.fetchall(): + user_id = row[0] + name = row[1] + email = row[2] +``` + +### Get Column Metadata +```python +cursor.execute("SELECT id, name, email FROM users") +for column in cursor.description: + print(f"Column: {column[0]}, Type: {column[1]}") +``` + +## Stored 
Procedures + +```python +# Call a stored procedure +cursor.execute("EXEC GetUserById ?", (user_id,)) +result = cursor.fetchone() + +# Stored procedure with output +cursor.execute(""" + DECLARE @count INT; + EXEC CountActiveUsers @count OUTPUT; + SELECT @count; +""") +count = cursor.fetchone()[0] +``` + +## Error Handling + +```python +from mssql_python import ( + Error, + DatabaseError, + IntegrityError, + ProgrammingError, + OperationalError, +) + +try: + cursor.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "John")) + conn.commit() +except IntegrityError as e: + # Constraint violation (duplicate key, foreign key, etc.) + print(f"Constraint error: {e}") +except ProgrammingError as e: + # SQL syntax error or invalid query + print(f"SQL error: {e}") +except OperationalError as e: + # Connection issues, timeouts, etc. + print(f"Operational error: {e}") +except DatabaseError as e: + # General database error + print(f"Database error: {e}") +``` + +## Connection Pooling + +Connection pooling is enabled by default for improved performance: + +```python +# Pooling is automatic - connections are reused +conn1 = mssql_python.connect(connection_string) +conn1.close() # Returns to pool + +conn2 = mssql_python.connect(connection_string) # Reuses pooled connection +``` + +## Comparison to pyodbc + +| Feature | mssql-python | pyodbc | +|---------|-------------|--------| +| Driver Manager | Built-in (DDBC) | Requires ODBC | +| Installation | `pip install` only | Requires ODBC setup | +| Azure Entra ID | Native support | Requires extra config | +| Connection Pooling | Built-in | Manual | +| DB API 2.0 | ✅ Compliant | ✅ Compliant | + +### Migration from pyodbc +```python +# pyodbc +import pyodbc +conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...") + +# mssql-python (no DRIVER needed) +import mssql_python +conn = mssql_python.connect("SERVER=...;DATABASE=...") +``` + +## Supported Platforms + +- Windows (x64, ARM64) +- macOS (ARM64, 
x86_64) +- Linux (x86_64, ARM64): Debian, Ubuntu, RHEL, SUSE, Alpine (musl) + +## Supported Python Versions + +- Python 3.10+ + +## Links + +- Documentation: https://github.com/microsoft/mssql-python/wiki +- PyPI: https://pypi.org/project/mssql-python/ +- GitHub: https://github.com/microsoft/mssql-python +- Issues: https://github.com/microsoft/mssql-python/issues