(from github.com/Claybird)
Hello,
Is there any way to use a synonym list that contains single-character words, such as [アメリカ,米]?
With single-character words, the crawler fails to index some documents and logs errors.
I know that, in the default configuration, each synonym must be longer than one character, as documented in https://fess.codelibs.org/ja/13.2/admin/synonym-guide.html .
So one simple workaround is to remove the single-character words.
I’m looking for another solution.
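For reference, the simple workaround of removing single-character words could be sketched roughly as below. This is only an illustration, not anything Fess provides: the function name drop_single_char_terms is hypothetical, and it assumes the file contains only plain comma-separated groups (no "=>" mappings), with "#" starting a comment line.

```python
def drop_single_char_terms(lines):
    """Remove single-character terms from comma-separated synonym lines.

    Assumption: each line is a plain comma-separated group such as
    "アメリカ,米"; "=>" style mappings are not handled here.
    """
    cleaned = []
    for line in lines:
        stripped = line.strip()
        # Keep blank lines and comment lines as they are.
        if not stripped or stripped.startswith("#"):
            cleaned.append(stripped)
            continue
        terms = [t.strip() for t in stripped.split(",")]
        # Keep only terms longer than one character.
        kept = [t for t in terms if len(t) > 1]
        # A group with fewer than two terms no longer defines a synonym.
        if len(kept) >= 2:
            cleaned.append(",".join(kept))
    return cleaned


if __name__ == "__main__":
    for out in drop_single_char_terms(["アメリカ,米", "アメリカ,米,米国"]):
        print(out)
```

With my synonyms.txt this drops the only group entirely, which is why I would prefer a solution that keeps the single-character word.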
Details:
I took the following steps with 13.2.1 + AdoptOpenJDK11-hotspot-10.0.4+11 on Windows 10:
- Configured JAVA_HOME so that Fess starts with built-in ES.
- Uploaded synonyms.txt. Its only content is:
アメリカ,米
- Waited for some minutes, making sure synonyms.txt was updated in ES's configuration directory.
- Performed Reindex on the maintenance page, without “reset dictionaries”.
- Crawled test.txt. The content is the first 7 lines of a Wikipedia article.
- After a few minutes, the crawler finished, but no data was added. I found an error in fess-crawler.log:
2019-08-10 12:30:25,790 [IndexUpdater] WARN Failed to access data. Retry to access… 1
org.codelibs.fess.es.client.FessEsClientException: failure in bulk execution:
[0]: index [fess.201908101230], type [_doc], id [b0780afd21868df9161d70f86228a0d8a9af709c1edfe112cf2e56eeecfd5e1da06ba363e170f2851bf0f591979f0f1872bec9d8a463d7504a83ee9c7bc03c1e], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=128,endOffset=133,lastStartOffset=129 for field ‘content’]]]
at org.codelibs.fess.es.client.FessEsClient.addAll(FessEsClient.java:1009) ~[classes/:?]
at org.codelibs.fess.helper.IndexingHelper.sendDocuments(IndexingHelper.java:74) ~[classes/:?]
at org.codelibs.fess.indexer.IndexUpdater.run(IndexUpdater.java:234) [classes/:?]
Thanks