Ingest Node preprocessor

discuss · December 27, 2017, 4:26pm

(from github.com/charles-pinkston)
Is there any way to add an ingest node pipeline? I have some custom metadata fields that Fess is reading in properly, but I need to be able to split them into an array.

discuss · December 27, 2017, 9:13pm

(from github.com/marevol)
Need more info…

discuss · July 22, 2018, 7:32pm

(from github.com/puecher)
It would be great to split the “content” field into an array (new keyword datatype field called “content.keywords”). Any hints?

discuss · July 22, 2018, 8:17pm

(from github.com/marevol)
You can edit app/WEB-INF/classes/fess_indices/fess/doc.json.
The fess index is created on first startup.

discuss · July 23, 2018, 6:17am

(from github.com/puecher)
Thank you for the info @marevol. How do I split the “content” field by single space (String[] keywords = content.split(" ")) and store the values in the new “content.keywords” field?

I would like to get a term cloud per index/alias (Term Vectors unfortunately support only 1+ documents).
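For illustration only, the whitespace split asked about above could be expressed with Elasticsearch's split ingest processor. The sketch below is not a confirmed Fess setup. The pipeline body is hypothetical, and `simulate_split` is a local stand-in written for this example, not an Elasticsearch API. Note that this form replaces "content" in place with an array; writing to a separate "content.keywords" field would need further configuration that is not shown here.

```python
import json
import re

# Hypothetical pipeline body. "field" and "separator" are options of the
# split processor; the description and overall pipeline are illustrative.
split_pipeline = {
    "description": "Split the content field on whitespace (illustrative)",
    "processors": [
        {"split": {"field": "content", "separator": "\\s+"}}
    ],
}


def simulate_split(doc, processor):
    """Local approximation of the split processor for one document.

    The separator is treated as a regular expression and the field is
    overwritten with the resulting list. This only previews the effect;
    it does not talk to Elasticsearch.
    """
    opts = processor["split"]
    out = dict(doc)
    out[opts["field"]] = re.split(opts["separator"], doc[opts["field"]])
    return out


if __name__ == "__main__":
    # JSON body that would be sent when registering the pipeline.
    print(json.dumps(split_pipeline, indent=2))
    # Preview of the transformation on a sample document.
    print(simulate_split({"content": "fess search  server"},
                         split_pipeline["processors"][0]))
```

Whether Fess can route crawled documents through such a pipeline is not established in this thread, so treat this as a starting point for experimentation.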

discuss · July 23, 2018, 2:53pm

(from github.com/charles-pinkston)
In my case, I had a bunch of values that were separated by a ‘|’ character in a customized field.

To make it work, I added a new pattern tokenizer and a custom analyzer that uses it to the fess.json file. Then, in my doc.json, I changed my field to type text with fielddata set to true, and assigned it the new analyzer. Not sure if that would work for you, but it might give you a couple of other ideas to work with.
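A minimal sketch of what such a configuration might look like, written as Python dicts that serialise to JSON. The names `pipe_tokenizer`, `pipe_analyzer`, and `custom_tags` are hypothetical, and the exact place these fragments belong inside fess.json and doc.json is an assumption; only the tokenizer type, the custom analyzer, and the text/fielddata mapping come from the description above.

```python
import json
import re

# Analysis settings (assumed to go into the settings section of fess.json).
# The pattern tokenizer treats every match of "pattern" as a separator.
analysis_settings = {
    "analysis": {
        "tokenizer": {
            "pipe_tokenizer": {"type": "pattern", "pattern": "\\|"}
        },
        "analyzer": {
            "pipe_analyzer": {"type": "custom", "tokenizer": "pipe_tokenizer"}
        },
    }
}

# Field mapping (assumed to go into the properties of doc.json).
# "custom_tags" is a placeholder for the customized field.
field_mapping = {
    "custom_tags": {
        "type": "text",
        "analyzer": "pipe_analyzer",
        "fielddata": True,
    }
}


def preview_tokens(value):
    """Rough local preview of the tokens the analyzer would emit."""
    pattern = analysis_settings["analysis"]["tokenizer"]["pipe_tokenizer"]["pattern"]
    return [token for token in re.split(pattern, value) if token]


if __name__ == "__main__":
    print(json.dumps(analysis_settings, indent=2))
    print(json.dumps(field_mapping, indent=2))
    print(preview_tokens("red|green|blue"))
```

Because the field stays a single string in the stored document, the split happens only at analysis time; the indexed terms, not the source value, become the separate entries.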

discuss · July 25, 2018, 2:32pm

(from github.com/puecher)
Hey, thanks for the info. Are you able to generate a top 10 tag cloud (aggs) per index? How can we preprocess the data before indexing?
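For reference, a top-10 term list of the kind asked about here is usually requested with a terms aggregation. The sketch below only builds the request body; `top_terms_request`, the aggregation name `top_terms`, and the field name `custom_tags` are hypothetical. It assumes the target field is either a keyword field or a text field with fielddata enabled, as in the earlier reply.

```python
import json


def top_terms_request(field, size=10):
    """Build a search body that returns the most frequent terms of a field.

    "size": 0 suppresses the hits so that only the aggregation is returned.
    """
    return {
        "size": 0,
        "aggs": {
            "top_terms": {"terms": {"field": field, "size": size}}
        },
    }


if __name__ == "__main__":
    # Body to send to the _search endpoint of the index or alias of interest.
    print(json.dumps(top_terms_request("custom_tags"), indent=2))
```

Sending this body to the `_search` endpoint of a particular index or alias scopes the counts to that index or alias.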