-
Notifications
You must be signed in to change notification settings - Fork 4
Java vector search sample #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -485,3 +485,8 @@ dist/ | |
| *.user | ||
| *.suo | ||
| *.sln.docstates | ||
|
|
||
| # Java | ||
| *.class | ||
| *.jar | ||
| target/ | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,177 @@ | ||
| # DocumentDB Vector Samples (Java) | ||
|
|
||
| This project demonstrates vector search capabilities using Azure DocumentDB with Java. It includes implementations of three different vector index types: DiskANN, HNSW, and IVF. | ||
|
|
||
| ## Overview | ||
|
|
||
| Vector search enables semantic similarity searching by converting text into high-dimensional vector representations (embeddings) and finding the most similar vectors in the database. This project shows how to: | ||
|
|
||
| - Generate embeddings using Azure OpenAI | ||
| - Store vectors in DocumentDB | ||
| - Create and use different types of vector indexes | ||
| - Perform similarity searches with various algorithms | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| Before running this project, you need: | ||
|
|
||
| ### Azure Resources | ||
| 1. **Azure subscription** with appropriate permissions | ||
| 2. **[Azure Developer CLI (azd)](https://learn.microsoft.com/azure/developer/azure-developer-cli/)** installed | ||
|
|
||
| ### Development Environment | ||
| - [Java 21 or higher](https://learn.microsoft.com/java/openjdk/download) | ||
| - [Maven 3.6 or higher](https://maven.apache.org/download.cgi) | ||
| - [Git](https://git-scm.com/downloads) (for cloning the repository) | ||
| - [Visual Studio Code](https://code.visualstudio.com/) (recommended) or another Java IDE | ||
|
|
||
| ## Setup Instructions | ||
|
|
||
| ### Clone and Setup Project | ||
|
|
||
| ```bash | ||
| # Clone this repository | ||
| git clone https://github.com/Azure-Samples/documentdb-samples | ||
| ``` | ||
|
|
||
| ### Deploy Azure Resources | ||
|
|
||
| This project uses Azure Developer CLI (azd) to deploy all required Azure resources from the existing infrastructure-as-code files. | ||
|
|
||
| #### Install Azure Developer CLI | ||
|
|
||
| If you haven't already, install the Azure Developer CLI: | ||
|
|
||
| **Windows:** | ||
| ```powershell | ||
| winget install microsoft.azd | ||
| ``` | ||
|
|
||
| **macOS:** | ||
| ```bash | ||
| brew tap azure/azd && brew install azd | ||
| ``` | ||
|
|
||
| **Linux:** | ||
| ```bash | ||
| curl -fsSL https://aka.ms/install-azd.sh | bash | ||
| ``` | ||
|
|
||
| #### Deploy Resources | ||
|
|
||
| Navigate to the root of the repository and run: | ||
|
|
||
| ```bash | ||
| # Login to Azure | ||
| azd auth login | ||
|
|
||
| # Provision Azure resources | ||
| azd up | ||
| ``` | ||
|
|
||
| During provisioning, you'll be prompted for: | ||
| - **Environment name**: A unique name for your deployment (e.g., "my-vector-search") | ||
| - **Azure subscription**: Select your Azure subscription | ||
| - **Location**: Choose from `eastus2` or `swedencentral` (required for OpenAI models) | ||
|
|
||
| The `azd up` command will: | ||
| - Create a resource group | ||
| - Deploy Azure OpenAI with text-embedding-3-small model | ||
| - Deploy Azure DocumentDB (MongoDB vCore) cluster | ||
| - Create a managed identity for secure access | ||
| - Configure all necessary permissions and networking | ||
| - Generate a `.env` file with all connection information at the repository root | ||
|
|
||
| ### Compile the Project | ||
|
|
||
| ```bash | ||
| # Move to Java vector search project | ||
| cd ai/vector-search-java | ||
|
|
||
| # Compile the project | ||
| mvn clean compile | ||
| ``` | ||
|
|
||
| ### Load Environment Variables | ||
|
|
||
| After deployment completes, load the environment variables from the generated `.env` file. The `set -a` command ensures variables are exported to child processes (like the Maven JVM): | ||
|
|
||
| ```bash | ||
| # From the ai/vector-search-java directory | ||
| set -a && source ../../.env && set +a | ||
| ``` | ||
|
|
||
| You can verify the environment variables are set: | ||
|
|
||
| ```bash | ||
| echo $MONGO_CLUSTER_NAME | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| The project includes several Java classes that demonstrate different aspects of vector search. | ||
|
|
||
| ### Sign in to Azure for passwordless connection | ||
|
|
||
| ```bash | ||
| az login | ||
| ``` | ||
|
|
||
| ### DiskANN Vector Search | ||
|
|
||
| Run DiskANN (Disk-based Approximate Nearest Neighbor) search: | ||
|
|
||
| ```bash | ||
| mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.DiskAnn" | ||
| ``` | ||
|
|
||
| DiskANN is optimized for: | ||
| - Large datasets that don't fit in memory | ||
| - Efficient disk-based storage | ||
| - Good balance of speed and accuracy | ||
|
|
||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we also include what the expected Result should look like? |
||
| ### HNSW Vector Search | ||
|
|
||
| Run HNSW (Hierarchical Navigable Small World) search: | ||
|
|
||
| ```bash | ||
| mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.HNSW" | ||
| ``` | ||
|
|
||
| HNSW provides: | ||
| - Excellent search performance | ||
| - High recall rates | ||
| - Hierarchical graph structure | ||
| - Good for real-time applications | ||
|
|
||
| ### IVF Vector Search | ||
|
|
||
| Run IVF (Inverted File) search: | ||
|
|
||
| ```bash | ||
| mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.IVF" | ||
| ``` | ||
|
|
||
| IVF features: | ||
| - Clusters vectors by similarity | ||
| - Fast search through cluster centroids | ||
| - Configurable accuracy vs speed trade-offs | ||
| - Efficient for large vector datasets | ||
|
|
||
| ## Further Resources | ||
|
|
||
| - [Azure Developer CLI Documentation](https://learn.microsoft.com/azure/developer/azure-developer-cli/) | ||
| - [Azure DocumentDB Documentation](https://learn.microsoft.com/azure/documentdb/) | ||
| - [Azure OpenAI Service Documentation](https://learn.microsoft.com/azure/ai-services/openai/) | ||
| - [Vector Search in DocumentDB](https://learn.microsoft.com/azure/documentdb/vector-search) | ||
| - [MongoDB Java Driver Documentation](https://mongodb.github.io/mongo-java-driver/) | ||
| - [Azure SDK for Java Documentation](https://learn.microsoft.com/java/api/overview/azure/) | ||
|
|
||
| ## Support | ||
|
|
||
| If you encounter issues: | ||
| 1. Verify Java 21+ is installed: `java -version` | ||
| 2. Verify Maven is installed: `mvn -version` | ||
| 3. Ensure Azure CLI is logged in: `az login` | ||
| 4. Verify environment variables are exported: `echo $MONGO_CLUSTER_NAME` | ||
| 5. Check Azure service status and quotas | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| <project xmlns="http://maven.apache.org/POM/4.0.0" | ||
| xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
| xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
| <modelVersion>4.0.0</modelVersion> | ||
|
|
||
| <groupId>com.azure.documentdb.samples</groupId> | ||
| <artifactId>vector-search-quickstart</artifactId> | ||
| <version>1.0-SNAPSHOT</version> | ||
|
|
||
| <properties> | ||
| <maven.compiler.release>21</maven.compiler.release> | ||
| <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> | ||
| </properties> | ||
|
|
||
| <dependencies> | ||
| <dependency> | ||
| <groupId>org.mongodb</groupId> | ||
| <artifactId>mongodb-driver-sync</artifactId> | ||
| <version>5.6.2</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.azure</groupId> | ||
| <artifactId>azure-identity</artifactId> | ||
| <version>1.18.1</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>com.azure</groupId> | ||
| <artifactId>azure-ai-openai</artifactId> | ||
| <version>1.0.0-beta.16</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>tools.jackson.core</groupId> | ||
| <artifactId>jackson-databind</artifactId> | ||
| <version>3.0.3</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.slf4j</groupId> | ||
| <artifactId>slf4j-nop</artifactId> | ||
| <version>2.0.17</version> | ||
| <scope>runtime</scope> | ||
| </dependency> | ||
| </dependencies> | ||
| </project> |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we include the VS Code extension as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-documentdb