Closed
Describe the bug
Importing the Android source tree fails with an exception during indexing.
- opengrok: opengrok-1.7.30
- JDK: openjdk 11.0.15 2022-04-19
- OS: Ubuntu 22.04
- Tomcat: apache-tomcat-10.0.18
To Reproduce
- Install Java, Tomcat, and OpenGrok
- Run the indexer:
java -Xms16g -Xmx32g -XX:PermSize=16g -XX:MaxPermSize=32g -jar /data1/sdk/tools/opengrok-1.7.30/lib/opengrok.jar -c /usr/local/bin/ctags -s /data1/sdk/tools/opengrok-1.7.30/src -d /data1/sdk/tools/opengrok-1.7.30/data -H -P -S -G -v -W /data1/sdk/tools/opengrok-1.7.30/etc/configuration.xml -U http://localhost:8080/source --depth 100000 --progress -m 8192
Expected behavior
Indexing completes successfully.
Additional context
opengrok-1.7.30/src/android-10.0.0_r41/external/cldr/common/collation/zh.xml
java.lang.IllegalArgumentException: Document contains at least one immense term in field="full" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[-27, -123, -103, -27, -123, -101, -27, -123, -98, -27, -123, -99, -27, -123, -95, -27, -123, -93, -27, -105, -89, -25, -109, -87, -25, -77, -114, -28, -72, -128]...', original message: bytes can be at most 32766 in length; got 39180
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1757)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1400)
at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:867)
at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1361)
at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1304)
at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:575)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 39180
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:974)
... 21 more
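The offending file is CLDR collation data: the zh.xml tailoring contains very long unbroken runs of CJK characters (the byte prefix in the error message decodes to CJK units such as 兙, 兛, 兞), which the analyzer apparently emits as a single token longer than Lucene's hard 32766-byte term limit (`IndexWriter.MAX_TERM_LENGTH`). As a rough way to locate files likely to trigger this before indexing, here is a stdlib-only Python sketch; the `immense_tokens` helper is hypothetical and not part of OpenGrok, and whitespace splitting only approximates what the real analyzer does:

```python
import sys

MAX_TERM_BYTES = 32766  # Lucene's IndexWriter.MAX_TERM_LENGTH, in UTF-8 bytes


def immense_tokens(text, limit=MAX_TERM_BYTES):
    """Return whitespace-separated tokens whose UTF-8 encoding exceeds limit bytes.

    This approximates the failure mode: any single run of non-whitespace
    characters whose UTF-8 form is longer than the limit would be rejected
    by Lucene as an "immense term".
    """
    return [t for t in text.split() if len(t.encode("utf-8")) > limit]


if __name__ == "__main__":
    # Usage: python find_immense_terms.py file1 file2 ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for tok in immense_tokens(f.read()):
                print(f"{path}: token of {len(tok.encode('utf-8'))} bytes")
```

Running this over the source tree would flag files such as external/cldr/common/collation/zh.xml, which could then be excluded from indexing with the indexer's `-i` ignore pattern option as a workaround.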