Merged
18 changes: 7 additions & 11 deletions .devcontainer/java/devcontainer.json
@@ -1,24 +1,20 @@
 // For format details, see https://aka.ms/devcontainer.json. For config options, see the
-// README at: https://github.com/devcontainers/templates/tree/main/src/java
+// README at: https://github.com/devcontainers/templates/tree/main/src/universal
 {
   "name": "Default Java",
   // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
   "image": "mcr.microsoft.com/devcontainers/java:latest",
   "features": {
-    "ghcr.io/devcontainers/features/azure-cli:1": {},
-    "ghcr.io/devcontainers/features/docker-in-docker:2": {},
-    "ghcr.io/azure/azure-dev/azd:0": {}
+    "ghcr.io/devcontainers/features/java:1": {
+      "version": "none",
+      "installMaven": "true"
+    },
+    "ghcr.io/devcontainers/features/azure-cli:1": {}
   },
   "customizations": {
     "vscode": {
       "extensions": [
-        "ms-azuretools.vscode-cosmosdb",
-        "buildwithlayer.mongodb-integration-expert-qS6DB",
-        "mongodb.mongodb-vscode",
-        "ms-azuretools.vscode-documentdb",
-        "redhat.java",
-        "vscjava.vscode-maven",
-        "vscjava.vscode-gradle"
+        "ms-azuretools.vscode-documentdb"
       ]
     }
   }
5 changes: 5 additions & 0 deletions .gitignore
@@ -485,3 +485,8 @@ dist/
 *.user
 *.suo
 *.sln.docstates
+
+# Java
+*.class
+*.jar
+target/
177 changes: 177 additions & 0 deletions ai/vector-search-java/README.md
@@ -0,0 +1,177 @@
# DocumentDB Vector Samples (Java)

This project demonstrates vector search capabilities using Azure DocumentDB with Java. It includes implementations of three different vector index types: DiskANN, HNSW, and IVF.

## Overview

Vector search enables semantic similarity searching by converting text into high-dimensional vector representations (embeddings) and finding the most similar vectors in the database. This project shows how to:

- Generate embeddings using Azure OpenAI
- Store vectors in DocumentDB
- Create and use different types of vector indexes
- Perform similarity searches with various algorithms
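
To make the idea of "finding the most similar vectors" concrete, here is a minimal, illustrative sketch of a brute-force cosine similarity search over a few in-memory vectors. It is not part of the sample project: the class name `CosineSimilaritySketch`, its methods, and the toy three-dimensional vectors are all hypothetical, and a real vector index (DiskANN, HNSW, or IVF) avoids scanning every vector the way this does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: brute-force cosine similarity over in-memory vectors.
// Class and method names are hypothetical and do not exist in the sample project.
class CosineSimilaritySketch {

    /** Returns the cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for zero vectors");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the positions of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(double[] query, List<double[]> stored, int k) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            order.add(i);
        }
        // Sort by descending similarity (negate the score for ascending sort order).
        order.sort(Comparator.comparingDouble((Integer i) -> -cosine(query, stored.get(i))));
        return order.subList(0, Math.min(k, order.size()));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
        List<double[]> stored = List.of(
                new double[] {1.0, 0.0, 0.0},
                new double[] {0.9, 0.1, 0.0},
                new double[] {0.0, 1.0, 0.0});
        double[] query = {1.0, 0.05, 0.0};
        System.out.println(topK(query, stored, 2)); // prints [0, 1]
    }
}
```

The exhaustive scan is exact but grows linearly with the number of stored vectors, which is the cost the approximate index types in this project are designed to reduce.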

## Prerequisites

Before running this project, you need:

### Azure Resources
1. **Azure subscription** with appropriate permissions
2. **[Azure Developer CLI (azd)](https://learn.microsoft.com/azure/developer/azure-developer-cli/)** installed

### Development Environment
- [Java 21 or higher](https://learn.microsoft.com/java/openjdk/download)
- [Maven 3.6 or higher](https://maven.apache.org/download.cgi)
- [Git](https://git-scm.com/downloads) (for cloning the repository)
- [Visual Studio Code](https://code.visualstudio.com/) (recommended) or another Java IDE

Can we include the VS Code extension as well?

## Setup Instructions

### Clone and Setup Project

```bash
# Clone this repository
git clone https://github.com/Azure-Samples/documentdb-samples
```

### Deploy Azure Resources

This project uses Azure Developer CLI (azd) to deploy all required Azure resources from the existing infrastructure-as-code files.

#### Install Azure Developer CLI

If you haven't already, install the Azure Developer CLI:

**Windows:**
```powershell
winget install microsoft.azd
```

**macOS:**
```bash
brew tap azure/azd && brew install azd
```

**Linux:**
```bash
curl -fsSL https://aka.ms/install-azd.sh | bash
```

#### Deploy Resources

Navigate to the root of the repository and run:

```bash
# Log in to Azure
azd auth login

# Provision Azure resources
azd up
```

During provisioning, you'll be prompted for:
- **Environment name**: A unique name for your deployment (e.g., "my-vector-search")
- **Azure subscription**: Select your Azure subscription
- **Location**: Choose either `eastus2` or `swedencentral` (the deployment requires one of these regions for the OpenAI models)

The `azd up` command will:
- Create a resource group
- Deploy Azure OpenAI with the `text-embedding-3-small` model
- Deploy an Azure DocumentDB (MongoDB vCore) cluster
- Create a managed identity for secure access
- Configure all necessary permissions and networking
- Generate a `.env` file with all connection information at the repository root

### Compile the Project

```bash
# Move to Java vector search project
cd ai/vector-search-java

# Compile the project
mvn clean compile
```

### Load Environment Variables

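After deployment completes, load the environment variables from the generated `.env` file. While `set -a` is active, every variable the shell assigns is marked for export, so the values read by `source` reach child processes such as the Maven JVM; `set +a` then turns that behavior off again: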

```bash
# From the ai/vector-search-java directory
set -a && source ../../.env && set +a
```

You can verify the environment variables are set:

```bash
echo $MONGO_CLUSTER_NAME
```
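
The same check can be done from Java, which confirms that the variables actually reach the JVM rather than only the shell. The sketch below is a hypothetical helper and not part of the sample project: the class name `EnvCheckSketch` is invented, and `MONGO_CLUSTER_NAME` is the only variable name taken from this README, so extend the list with whatever names your generated `.env` file contains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: reports which required environment variables are missing.
// The class name is hypothetical; it is not part of the sample project.
class EnvCheckSketch {

    /** Returns the names from 'required' that are absent or blank in 'env'. */
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Only MONGO_CLUSTER_NAME is named in this README; add others as needed.
        List<String> required = List.of("MONGO_CLUSTER_NAME");
        List<String> missing = missing(System.getenv(), required);
        if (missing.isEmpty()) {
            System.out.println("All required environment variables are set.");
        } else {
            System.out.println("Missing: " + missing
                    + ". Re-run: set -a && source ../../.env && set +a");
        }
    }
}
```

Passing the environment in as a `Map` keeps the check easy to exercise without changing the real process environment.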

## Usage

The project includes several Java classes that demonstrate different aspects of vector search.

### Sign in to Azure for passwordless connection

```bash
az login
```

### DiskANN Vector Search

Run DiskANN (Disk-based Approximate Nearest Neighbor) search:

```bash
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.DiskAnn"
```

DiskANN is optimized for:
- Large datasets that don't fit in memory
- Efficient disk-based storage
- Good balance of speed and accuracy

Can we also include what the expected Result should look like?

### HNSW Vector Search

Run HNSW (Hierarchical Navigable Small World) search:

```bash
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.HNSW"
```

HNSW provides:
- Excellent search performance
- High recall rates
- Hierarchical graph structure
- Good for real-time applications

### IVF Vector Search

Run IVF (Inverted File) search:

```bash
mvn exec:java -Dexec.mainClass="com.azure.documentdb.samples.IVF"
```

IVF features:
- Clusters vectors by similarity
- Fast search through cluster centroids
- Configurable accuracy vs speed trade-offs
- Efficient for large vector datasets
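
The three index types differ mainly in the options passed when the vector index is created. The sketch below builds those option sets as plain Java maps so the differences are visible side by side. Treat it as an assumption-laden illustration rather than the code these samples run: the class `VectorIndexOptionsSketch` is hypothetical, the field names (`kind`, `dimensions`, `similarity`, `maxDegree`, `lBuild`, `m`, `efConstruction`, `numLists`) follow the `cosmosSearchOptions` shape described in the Azure vector search documentation as best recalled, and the numeric values are placeholders, not tuned recommendations. Check the sample source and the linked documentation before relying on any of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: option sets for the three vector index kinds.
// Field names are assumptions based on Azure's documented cosmosSearchOptions;
// numeric values are placeholders, not tuned recommendations.
class VectorIndexOptionsSketch {

    /** Options shared by every index kind. */
    private static Map<String, Object> base(String kind, int dimensions) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("kind", kind);
        options.put("dimensions", dimensions);
        options.put("similarity", "COS"); // cosine similarity
        return options;
    }

    static Map<String, Object> diskAnn(int dimensions) {
        Map<String, Object> options = base("vector-diskann", dimensions);
        options.put("maxDegree", 32); // graph edges per node (placeholder)
        options.put("lBuild", 64);    // candidate list size at build time (placeholder)
        return options;
    }

    static Map<String, Object> hnsw(int dimensions) {
        Map<String, Object> options = base("vector-hnsw", dimensions);
        options.put("m", 16);              // connections per layer (placeholder)
        options.put("efConstruction", 64); // build-time candidate list size (placeholder)
        return options;
    }

    static Map<String, Object> ivf(int dimensions) {
        Map<String, Object> options = base("vector-ivf", dimensions);
        options.put("numLists", 1); // number of clusters (placeholder)
        return options;
    }

    public static void main(String[] args) {
        // 1536 is the default output dimension of text-embedding-3-small.
        int dimensions = 1536;
        System.out.println("DiskANN: " + diskAnn(dimensions));
        System.out.println("HNSW:    " + hnsw(dimensions));
        System.out.println("IVF:     " + ivf(dimensions));
    }
}
```

In a real program, a map like these would be placed under the index definition sent to the database with a `createIndexes` command through the MongoDB Java driver.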

## Further Resources

- [Azure Developer CLI Documentation](https://learn.microsoft.com/azure/developer/azure-developer-cli/)
- [Azure DocumentDB Documentation](https://learn.microsoft.com/azure/documentdb/)
- [Azure OpenAI Service Documentation](https://learn.microsoft.com/azure/ai-services/openai/)
- [Vector Search in DocumentDB](https://learn.microsoft.com/azure/documentdb/vector-search)
- [MongoDB Java Driver Documentation](https://mongodb.github.io/mongo-java-driver/)
- [Azure SDK for Java Documentation](https://learn.microsoft.com/java/api/overview/azure/)

## Support

If you encounter issues:
1. Verify Java 21+ is installed: `java -version`
2. Verify Maven is installed: `mvn -version`
3. Ensure you are signed in to the Azure CLI: `az account show` (run `az login` if you are not)
4. Verify environment variables are exported: `echo $MONGO_CLUSTER_NAME`
5. Check Azure service status and quotas
43 changes: 43 additions & 0 deletions ai/vector-search-java/pom.xml
@@ -0,0 +1,43 @@
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.azure.documentdb.samples</groupId>
  <artifactId>vector-search-quickstart</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.release>21</maven.compiler.release>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-sync</artifactId>
      <version>5.6.2</version>
    </dependency>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-identity</artifactId>
      <version>1.18.1</version>
    </dependency>
    <dependency>
      <groupId>com.azure</groupId>
      <artifactId>azure-ai-openai</artifactId>
      <version>1.0.0-beta.16</version>
    </dependency>
    <dependency>
      <groupId>tools.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>3.0.3</version>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-nop</artifactId>
      <version>2.0.17</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</project>