Update to apache lucene 9.X by matthiasblaesing · Pull Request #9207 · apache/netbeans

matthiasblaesing · 2026-02-14T19:30:41Z

Additional package lucene-analysis-common required (KeywordAnalyzer, WhitespaceAnalyzer, PerFieldAnalyzerWrapper, CharTokenizer, LimitTokenCountAnalyzer)
FieldTypes were introduced to carry the field behavior (tokenization state, indexing options, storage settings)
BooleanQuery construction moved to builder
Collector interface was modified
TermEnum was replaced by TermsEnum
Terms are stored per field, so queries have to be specified with the target field
QuerySelectors were replaced by String sets
RAMDirecory was removed and is replaced by ByteBufferDirectories
Lock handling was reworked

lkishalmi · 2026-02-16T18:04:31Z

@matthiasblaesing This looks great! Thank you!

As far as I've tested. this generrally works, however there is a bug in the MemoryIndex that sometimes tries to search on a closed index.

lkishalmi · 2026-02-16T18:06:18Z

Also is it possible to upgrade the base JDK version to Java 21 for parsing.lucene module?

matthiasblaesing · 2026-02-16T18:12:52Z

Also is it possible to upgrade the base JDK version to Java 21 for parsing.lucene module?

I had an initial branch that did that (and bumped to lucene 10), but that will cause issues in the enterprise cluster, that runs with JDK17 for tests. @mbien gave the hint that lucene 9 allows to stay with the older java level.

As far as I've tested. this generrally works, however there is a bug in the MemoryIndex that sometimes tries to search on a closed index.

I'm seeing issues with LuceneIndex. There is a strange construction in LuceneIndex#doClear, which knocks away all locks on invocation. I think there is a race somewhere.

lkishalmi · 2026-02-16T19:15:33Z

I think I can solve the issue of Java 21 on Enterprise Tests. What I see the majority of the fails comes from the Micronaut integration. I had some clash with that lately.

I also see that my PR broke your paperwork as the sig file needs to be re-generated for parsing.lucene

matthiasblaesing · 2026-02-16T19:17:00Z

The micronaut problems are real. Real as in, I manually tested the case (started IDE with example project) and found completion broken. I'll look a bit further.

lkishalmi · 2026-02-17T04:17:25Z

@matthiasblaesing please rebase on master, @mbien fixed Micronaut tests with Java 21.

matthiasblaesing · 2026-02-17T20:58:23Z

The commits are rebased, but that does not seem to be a problem. I can reproduce the micronaut problem locally. The problem is not deterministic. I tested it by:

removing the test index folder used by the test rm -rf /tmp/nbcache/index/
run the micronaut tests
check the diffs from the test failures

I noticed changes in the packages there were not listed anymore. I then located the index folders for the jars and indeed the indices were at least partially broken.

That matches outputs in the unittests reporting: WARNUNG: Locked index folder: /tmp/nbcache/index/s8/java/15/refs. I'll stare a bit more at the LuceneIndex and its embedded DirCache. The lock schema feels off, but I can't point to the core problem yet.

matthiasblaesing · 2026-02-18T17:33:31Z

I found at least one problem with the updated code: the locks move from per-factory to global and that explains the seemingly random behavior.

matthiasblaesing · 2026-02-19T21:27:02Z

And I found the second round of problems:

ByteRefs are mutable and they are mutated when a TermsEnum is iterated. I.e. if the ByteRef is to be used after the TermsEnum moves to the next term, the old ByteRef might be modified. So a defensive copy is created now.
For class member search a case insensitive match is done. That works by transforming the token stream to lowercase. The old implementation does not work anymore and needs to be replaced by the LowerCaseFilter.

lkishalmi · 2026-02-22T18:45:15Z

I've done some testing today. The old $NETBEANS_CACHE_DIR/index should be removed if there is one exists before this change, otherwise there are a bunch of IndexFormatTooOldException would be thrown. Though that should not be an issue. This is just an FYI for others reading this PR.

MemoryIndex still throws an AlreadyClosedException, after a file has been modified and auto-completion has been invoked.

Other probably biassed (not measured in any way) experience: The indexing seems to be a bit slower than before, though the index search seems to be faster. Never had a completion when it was not feeling instant. Will test this on Raspberry Pi.

lkishalmi

I'm on the edge on getting this merged. Even when there are known issues, merging it allows more people to test/give feedback chance to debug.

matthiasblaesing · 2026-02-22T21:14:35Z

I've done some testing today. The old $NETBEANS_CACHE_DIR/index should be removed if there is one exists before this change, otherwise there are a bunch of IndexFormatTooOldException would be thrown. Though that should not be an issue. This is just an FYI for others reading this PR.

Yeah. I noticed that when I switch between 10 and 9 for testing. I'll see if I can up with a method to allow a smother up/downgrade (most probably this will mean: nuke the exiting index).

MemoryIndex still throws an AlreadyClosedException, after a file has been modified and auto-completion has been invoked.

Found the issue. Somewhere between multiple iterations I move to a try-with-resource setup and this was a remaining not reverted change. The readers in MemoryIndex and LuceneIndex are shared and must not be closed at the use-sites. Should be fixed now.

mbien

Updates look good to me and some parts are also similar to some of my update attempts which I never finished since something else always seemed to show up.

I ran some smoke testing scripts which mostly test initial index creation and everything worked well. The index size and indexing times were fairly similar to NB 29 which I somewhat expected since I think to remember that the bottle neck wasn't the lucene end last time I looked (edit: index creation is also only a small piece of the puzzle, queries would be interesting to benchmark). Didn't see any exceptions and everything seemed to work after indexing - great job!

Yeah. I noticed that when I switch between 10 and 9 for testing. I'll see if I can up with a method to allow a smother up/downgrade (most probably this will mean: nuke the exiting index).

technically we don't have to worry about this too much since the NB upgrade process does not copy the index. There is also the option for adding the lucene backward-codecs dependency - this was done for the maven indexer which does actually move the index if its fresh enough.

This allows to use old indices for a bit longer before they have to be reset. (this does not cover lucene 3 but it is something we might consider when we update lucene the next time)

left one comment inline

ide/parsing.lucene/src/org/netbeans/modules/parsing/lucene/RecordOwnerLockFactory.java

matthiasblaesing · 2026-02-25T17:14:11Z

Pushed an update:

bumped index versions for the java indexer that is nearer to the lucene index and the general indexing framework, that PHP, JS, CSS and the rest use (LuceneIndexFactory.java and JavaIndex.java)
integrated @mbien change
noticed a problem with JS indexing, that produces huge terms that can't be stored and implemented a work around (see LuceneIndex.java)

For comparison see:
https://github.com/apache/netbeans/compare/b9d392c3e630e3d8ddf236305e66e6f82b36d730..029628e796366e8f91835c666029bc50c1540ea8. The second force push is just a rebase to fix conflict with the merged signature files.

mbien · 2026-02-25T18:02:55Z

Ran the smoke test script again (opens and indexes ~850 NB modules) and everything worked fine without exceptions or indexer related warnings. Editor features worked after that too. Seems to be solid!

- Additional package lucene-analysis-common required (KeywordAnalyzer, WhitespaceAnalyzer, PerFieldAnalyzerWrapper, CharTokenizer, LimitTokenCountAnalyzer) - FieldTypes were introduced to carry the field behavior (tokenization state, indexing options, storage settings) - BooleanQuery construction moved to builder - Collector interface was modified - TermEnum was replaced by TermsEnum - Terms are stored per field, so queries have to be specified with the target field - QuerySelectors were replaced by String sets - RAMDirecory was removed and is replaced by ByteBufferDirectories - Lock handling was reworked - Remove StoppableConvertor from Lucene Support - Work around potentially excessivily large terms in index documents. (observed was a single `usage` entry for a JS document with about 40_000 characters) Co-authored-by: Laszlo Kishalmi <laszlo.kishalmi@gmail.com> Co-authored-by: Michael Bien <mbien42@gmail.com>

matthiasblaesing added this to the NB30 milestone Feb 14, 2026

matthiasblaesing added ci:all-tests [ci] enable all tests ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Feb 14, 2026

matthiasblaesing force-pushed the update-lucene3 branch from 44d27c4 to 0c1baaa Compare February 14, 2026 20:22

matthiasblaesing marked this pull request as draft February 15, 2026 15:16

This was linked to issues Feb 17, 2026

Update Lucene 3.x editor dependency #8384

Open

Apache Lucene 3.6.2 critical vulnerability issue - CVE-2017-12629 #4697

Open

mbien added Upgrade Library Library (Dependency) Upgrade Editor labels Feb 17, 2026

matthiasblaesing force-pushed the update-lucene3 branch from b9cb304 to 4388561 Compare February 17, 2026 17:49

matthiasblaesing force-pushed the update-lucene3 branch from 20f6769 to 9a2e09d Compare February 20, 2026 18:46

matthiasblaesing marked this pull request as ready for review February 21, 2026 07:50

lkishalmi approved these changes Feb 22, 2026

View reviewed changes

matthiasblaesing force-pushed the update-lucene3 branch from 9a2e09d to b9d392c Compare February 22, 2026 20:49

mbien reviewed Feb 23, 2026

View reviewed changes

ide/parsing.lucene/src/org/netbeans/modules/parsing/lucene/RecordOwnerLockFactory.java Outdated Show resolved Hide resolved

matthiasblaesing force-pushed the update-lucene3 branch 2 times, most recently from 029628e to d752c8f Compare February 25, 2026 17:06

mbien approved these changes Feb 25, 2026

View reviewed changes

matthiasblaesing force-pushed the update-lucene3 branch from d752c8f to 8690430 Compare February 25, 2026 20:04

Conversation

matthiasblaesing commented Feb 14, 2026

Uh oh!

lkishalmi commented Feb 16, 2026

Uh oh!

lkishalmi commented Feb 16, 2026

Uh oh!

matthiasblaesing commented Feb 16, 2026

Uh oh!

lkishalmi commented Feb 16, 2026

Uh oh!

matthiasblaesing commented Feb 16, 2026

Uh oh!

lkishalmi commented Feb 17, 2026

Uh oh!

matthiasblaesing commented Feb 17, 2026

Uh oh!

matthiasblaesing commented Feb 18, 2026

Uh oh!

matthiasblaesing commented Feb 19, 2026

Uh oh!

lkishalmi commented Feb 22, 2026

Uh oh!

lkishalmi left a comment

Choose a reason for hiding this comment

Uh oh!

matthiasblaesing commented Feb 22, 2026

Uh oh!

mbien left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

matthiasblaesing commented Feb 25, 2026

Uh oh!

mbien commented Feb 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mbien left a comment •

edited

Loading