Skip to content

JAMES-4182 Proposal for changes in BlobStoreDAO interface#2960

Open
chibenwa wants to merge 17 commits intoapache:masterfrom
chibenwa:JAMES-4182-api
Open

JAMES-4182 Proposal for changes in BlobStoreDAO interface#2960
chibenwa wants to merge 17 commits intoapache:masterfrom
chibenwa:JAMES-4182-api

Conversation

@chibenwa
Copy link
Copy Markdown
Contributor

@chibenwa chibenwa commented Mar 3, 2026

Opened to collect initial feedback

See https://issues.apache.org/jira/browse/JAMES-4182 for context

Remaining points:

  • better validation for metadata
  • Store metadata in blob stores
    • xattr for files
    • extra collumn for PG, cassandra
    • Object metadata for S3
    • Direct storage for memory
    • Passing those around for other blobStore DAO
    • AES blob store should support metadata

@chibenwa chibenwa changed the title James 4182 api JAMES-4182 Proposal for changes in BlobStoreDAO interface Mar 3, 2026
@chibenwa chibenwa requested a review from jeantil March 3, 2026 16:25
@jeantil
Copy link
Copy Markdown
Contributor

jeantil commented Mar 10, 2026

* [ ]  better validation for metadata 
* [ ]  Store metadata in blob stores       
  * [ ]  xattr for files
  * [ ]  extra collumn for PG, cassandra
  * [ ]  Object metadata for S3
  * [ ]  Direct storage for memory
  * [ ]  Passing those around for other blobStore DAO

I think we should only use dedicated properties to store acitvely managed metadata, the remaining metadata as a Map<String,String> can be serializaed with the payload. this avoids possible capability conflicts between blog stores.

Copy link
Copy Markdown
Contributor

@jeantil jeantil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good direction to me, I left some comments, I'm sure there are which you are aware of and will addres, feel free to close on sight

}

sealed interface Blob {
Map<BlobMetadataName, BlobMetadataValue> metadata();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentionned above it would be better for keys to be known,
alternatively this could be an object with kmown fields which james cares about and a map for extra fields which are not interpreted by james

@chibenwa
Copy link
Copy Markdown
Contributor Author

@jeantil do we have an agreement on continuing this work with an open metadata model?

@jeantil
Copy link
Copy Markdown
Contributor

jeantil commented Mar 12, 2026

@jeantil do we have an agreement on continuing this work with an open metadata model?

I'm not comfortable working with only the open metadata model. I understand the extension point argument but
I would much prefer an hybrid approach where metadata which is actively used by james is materialized as such in the metadata object and accessed through structured accessors instead of iterating through a map comparing strings every time we need the information
The rest of available metadata can be left as an unstructured map as some kind of extension point

@chibenwa
Copy link
Copy Markdown
Contributor Author

chibenwa commented Mar 12, 2026

I would much prefer an hybrid approach where metadata which is actively used by james is materialized as such in the metadata object and accessed through structured accessors instead of iterating through a map comparing strings every time we need the information

I propose:

  • Keeping the undlying data as (typed) Map<String, String>
  • Adding convenience factory methods for manipulating known value

Essentially:

record BlobMetadata(Map<BlobMetadataName, BlobMetadataValue> metadata) {
   public static BlobMetadata empty() {...}
   
   public ContentTransferEncoding contentTransferEncoding() {
       return ContentTransferEncoding.fromBlobMetadataValue(metatdata.get(BlobMetadataName.CONTENT_TRANSFER_ENCODING));
   }
   
   public BlobMetadata withMetadata(BlobMetadataName name, BlobMetadataValue value) {
       return new BlobMetadata(ImmutableList.builder()
           .putAll(metadata)
           .put(name, value)
           .build());
   }
   
   public BlobMetadata withContentTransferEncoding(ContentTransferEncoding encoding) {
       return withMetadata(BlobMetadataName.CONTENT_TRANSFER_ENCODING, encoding.asBlobMetadataValue());
   }
}

I think it keeps the underlying data model simple, extensible, easy to work with in DAOs, and it address your concerns and makes James-used metadata a prime citizen.

@jeantil
Copy link
Copy Markdown
Contributor

jeantil commented Mar 12, 2026

this looks ok for me, ideally we can enforce usage of an immutable map especially if it remains accessible from outside

@chibenwa
Copy link
Copy Markdown
Contributor Author

ideally we can enforce usage of an immutable map especially if it remains accessible from outside

Of course ;-)

@chibenwa
Copy link
Copy Markdown
Contributor Author

I'd like to thank you for the review effort @jeantil

I will put together the effort to align the code with the comments, and continue the implementation likely next week.

@chibenwa
Copy link
Copy Markdown
Contributor Author

Hello @jeantil

I did rework the metadata model and wrote memory implementation.

I'll carry other with other impelementation when I have time, but wanted to push now to enable early review, in case people have time.


Mono.from(testee.save(TEST_BUCKET_NAME, TEST_BLOB_ID, bytesBlob)).block();

assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use this one as an example to let you pick the variant you prefer, I'm fine with both :D

Suggested change
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().metadata())
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().values())
assertThat(Mono.from(testee.readBytes(TEST_BUCKET_NAME, TEST_BLOB_ID)).block().metadata().underlyingMap())

chibenwa and others added 12 commits April 7, 2026 11:55
S3MinioTest::saveShouldWorkWhenValidBlobId failed
  org.opentest4j.AssertionFailedError:
  expected: BytesBlob[payload=[B@6bca6c4c, metadata=BlobMetadata[metadata={}]]
  but was: BytesBlob[payload=[B@507e3e77, metadata=BlobMetadata[metadata={}]]
  Expected :BytesBlob[payload=[B@6bca6c4c, metadata=BlobMetadata[metadata={}]]
  Actual   :BytesBlob[payload=[B@507e3e77, metadata=BlobMetadata[metadata={}]]

 By default, java compares byte[] array by their reference, not the inner bytes.
@quantranhong1999
Copy link
Copy Markdown
Member

Hello,

I would like to help to move on this work.

I rebased this work on master, fixed the compilation, adapted review comments, and fixed some blob store tests.

Status: BlobStoreDAO's API refactoring + memory storage for blob's metadata + test contract for metadata should be ready. (Hopefully, we have a green build)

You can review the appended commits since the JAMES-4182 Fix compilation commit.

I propose to split the blob metadata storage for other implementations (S3, file, Cassandra, Postgres) into other PRs.

Also, should we fire a mailing list thread to draw the potential community's attention and feedback on this topic, if any?

@quantranhong1999 quantranhong1999 marked this pull request as ready for review April 8, 2026 08:28
@chibenwa
Copy link
Copy Markdown
Contributor Author

chibenwa commented Apr 8, 2026

Good with me!

Thanks for the help on this @quantranhong1999

@Arsnael
Copy link
Copy Markdown
Contributor

Arsnael commented Apr 8, 2026

@quantranhong1999
Copy link
Copy Markdown
Member

Is this related? https://ci-builds.apache.org/job/james/job/ApacheJames/job/PR-2960/4/testReport/

It seems YES. I will have a look tomorrow.

cf S3 object's metadata key is case-insensitive (normalized to lowercase)
@quantranhong1999
Copy link
Copy Markdown
Member

Green build.

Let's wait for @jeantil review on this :-).

@chibenwa
Copy link
Copy Markdown
Contributor Author

chibenwa commented Apr 9, 2026

+1 OK with me

quantranhong1999 pushed a commit to quantranhong1999/james-project that referenced this pull request Apr 10, 2026
@quantranhong1999
Copy link
Copy Markdown
Member

I propose to split the blob metadata storage for other implementations (S3, file, Cassandra, Postgres) into other PRs.

See WIP: #3001

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants