Autosuggest very rare words

(from github.com/freestyle68)
Hi,

Very rare words aren’t suggested. Is there a way to set the minimum number of occurrences required for a word to be suggested, and possibly set that minimum to one?

(from github.com/marevol)
In fess_config.properties:

suggest.min.hit.count=1

(from github.com/freestyle68)
That doesn’t work in my environment. Also, if I try “regular”, “mapping”, or “administration” on http://search.n2sm.co.jp, why aren’t these words suggested?

(from github.com/marevol)
Did you run the Suggester job?

(from github.com/freestyle68)
Of course.

To get a reproducible environment, try indexing the PDFs in the attached zip package. Why aren’t the terms “kozinski”, “silverman”, and “Pregerson” suggested? They exist in the documents!

samples.zip

(from github.com/marevol)
The following parameters limit the number of documents that are processed. To use all documents, change them in fess_config.properties:

suggest.update.contents.limit.num.percentage=100%
suggest.update.contents.limit.num=-1

(from github.com/freestyle68)
Before your tip: 6228 docs in the suggest index.
After: 6349 docs.

But some terms are still missing (the same ones as in my last message, plus others).

(from github.com/marevol)
Documents that contain more than 50000 characters are ignored.
The limit will be changed in an upcoming release. #1576

(from github.com/freestyle68)
Thanks @marevol for this great improvement!

(from github.com/freestyle68)
There is still a problem with some documents. For example, the following terms from the attached samples aren’t suggested:

bower, cirillo, richter, sussex, bunn, derek, interdependency, etc.

My config:

suggest.update.contents.limit.num.percentage=100%
suggest.update.contents.limit.num=-1
suggest.update.contents.limit.doc.size=500000000

My Fess version: https://github.com/codelibs/fess/tree/1aa6da600ca9a7ecd5f4aeb335d8210b542286c1

Samples:
test suggest.zip

(from github.com/freestyle68)
For example, the attached document 064452.pdf has only 28000 words, but words that occur after the first two-thirds of the document aren’t suggested. It seems the setting

suggest.update.contents.limit.doc.size=500000000

is not picked up by Fess.
It’s not PDF-related, because if I export the above PDF as a txt file and index it, the words in the final part still aren’t suggested.

064452.pdf
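
One way to rule out a typo or a duplicated key when a setting seems to be ignored is to list every suggest.* entry in the configuration text. The following is a minimal sketch only: read_properties and report are hypothetical helper names, the parser handles only simple key=value lines (no ":" separators, no line continuations), and the sample text stands in for the real fess_config.properties content. It does not show how Fess itself loads the file.

```python
from typing import Dict, List


def read_properties(text: str) -> Dict[str, List[str]]:
    """Collect every value assigned to each key in simple key=value text."""
    found: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        found.setdefault(key.strip(), []).append(value.strip())
    return found


def report(text: str, prefix: str = "suggest.") -> Dict[str, List[str]]:
    """Return only the keys that start with the given prefix."""
    return {k: v for k, v in read_properties(text).items() if k.startswith(prefix)}


# Sample text standing in for the real file content (assumption).
sample = """
# suggest settings
suggest.update.contents.limit.num.percentage=100%
suggest.update.contents.limit.num=-1
suggest.update.contents.limit.doc.size=500000000
"""

for key, values in report(sample).items():
    # More than one value for a key means the key is defined twice.
    flag = "  <-- defined more than once" if len(values) > 1 else ""
    print(f"{key} = {values}{flag}")
```

To check a real installation, the sample string would be replaced with the text read from the actual fess_config.properties file, wherever it lives on that system.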

(from github.com/marevol)
The cause might be the token limit in suggest_analyzer.json:

      "limit_token_count_filter": {
        "type": "limit",
        "max_token_count": 10000
      },
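
A token filter of type "limit" keeps only the first max_token_count tokens of a field and drops the rest, which would explain why words late in a long document never reach the suggest index. The effect can be simulated with a rough sketch like the one below. It is not Fess or Elasticsearch code: limit_tokens is a hypothetical helper, the word list is synthetic, and a real analyzer tokenizes text differently from a plain word list.

```python
from typing import Iterable, Iterator


def limit_tokens(tokens: Iterable[str], max_token_count: int) -> Iterator[str]:
    """Yield at most max_token_count tokens and drop everything after that."""
    for position, token in enumerate(tokens):
        if position >= max_token_count:
            break
        yield token


# Synthetic document (assumption): 28000 words with a rare name near the end.
words = ["filler"] * 28000
words[27000] = "kozinski"

# With a limit of 10000 tokens, the rare name is cut off.
indexed = set(limit_tokens(words, max_token_count=10000))
print("kozinski" in indexed)  # False

# With a limit above the document length, the rare name survives.
indexed_all = set(limit_tokens(words, max_token_count=30000))
print("kozinski" in indexed_all)  # True
```

In this simulation, any term that first appears after the token limit is invisible to whatever consumes the filtered stream, regardless of the document size limit set in fess_config.properties.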

(from github.com/freestyle68)
It works perfectly!

Thank you @marevol