For these two reproduction configs:
wiki-all-6-3-tamber-bm25
wikipedia-dpr-100w-bm25
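Both are in src/main/resources/reproduce/from-document-collection/configs/
The reproductions call out to Pyserini, which has failed since we upgraded to Lucene 10 because the Lucene version bundled with Pyserini cannot open an index created by Lucene 10, e.g.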
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/tuna1/scratch/jimmylin/pyserini/pyserini/eval/convert_trec_run_to_dpr_retrieval_run.py", line 48, in <module>
searcher = LuceneSearcher(args.index)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tuna1/scratch/jimmylin/pyserini/pyserini/search/lucene/_searcher.py", line 50, in __init__
self.object = JSimpleSearcher(index_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "jnius/jnius_export_class.pxi", line 285, in jnius.JavaClass.__init__
File "jnius/jnius_export_class.pxi", line 403, in jnius.JavaClass.call_constructor
File "jnius/jnius_utils.pxi", line 79, in jnius.check_exception
jnius.JavaException: JVM exception occurred: java.lang.IllegalArgumentException: indexCreatedVersionMajor is in the future: 10
java.lang.IllegalArgumentException: indexCreatedVersionMajor is in the future: 10
org.apache.lucene.index.SegmentInfos.<init>(SegmentInfos.java:180)
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:363)
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:304)
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:88)
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:77)
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:820)
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109)
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:67)
org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:60)
io.anserini.search.SimpleSearcher.<init>(SimpleSearcher.java:132)
io.anserini.search.SimpleSearcher.<init>(SimpleSearcher.java:114)
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Command failed: python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run --index indexes/lucene-index.wiki-all-6-3-tamber/ --topics dpr-nq-test --input runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt --output runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt.json --combine-title-text
at io.anserini.reproduce.ReproduceFromDocumentCollection.runCommandsInThreadPool(ReproduceFromDocumentCollection.java:753)
at io.anserini.reproduce.ReproduceFromDocumentCollection.run(ReproduceFromDocumentCollection.java:320)
at io.anserini.reproduce.ReproduceFromDocumentCollection.main(ReproduceFromDocumentCollection.java:195)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Command failed: python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run --index indexes/lucene-index.wiki-all-6-3-tamber/ --topics dpr-nq-test --input runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt --output runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt.json --combine-title-text
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at io.anserini.reproduce.ReproduceFromDocumentCollection.runCommandsInThreadPool(ReproduceFromDocumentCollection.java:751)
... 2 more
Caused by: java.lang.RuntimeException: Command failed: python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run --index indexes/lucene-index.wiki-all-6-3-tamber/ --topics dpr-nq-test --input runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt --output runs/run.index.wiki-all-6-3-tamber.dpr-nq-test.bm25.txt.json --combine-title-text
at io.anserini.reproduce.ReproduceFromDocumentCollection.runCommand(ReproduceFromDocumentCollection.java:731)
at io.anserini.reproduce.ReproduceFromDocumentCollection.lambda$runCommandsInThreadPool$0(ReproduceFromDocumentCollection.java:741)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1583)
We would like to eliminate this dependence on Pyserini.