Ingest Node preprocessor

(from github.com/charles-pinkston)
Is there any way to add an ingest node pipeline? I have some custom metadata fields that Fess reads correctly, but I need to split them into arrays.

(from github.com/marevol)
Need more info…

(from github.com/puecher)
It would be great to split the “content” field into an array stored in a new keyword-type field called “content.keywords”. Any hints?

(from github.com/marevol)
You can edit app/WEB-INF/classes/fess_indices/fess/doc.json.
The fess index is created the first time Fess starts up.

(from github.com/puecher)
Thank you for the info @marevol. How do I split the “content” field by a single space (String[] keywords = content.split(" ")) and store the values in the new “content.keywords” field?
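A minimal plain-Java sketch of the split being asked about, assuming the content is whitespace-separated text. `ContentKeywords` and `toKeywords` are hypothetical names, not part of Fess. It uses `\\s+` rather than `" "`, because `split(" ")` leaves empty strings wherever two spaces appear in a row.

```java
import java.util.Arrays;

public class ContentKeywords {

    // Hypothetical helper: splits content into whitespace-separated keywords.
    // "\\s+" treats any run of spaces, tabs, or newlines as one separator,
    // so no empty strings end up in the result.
    public static String[] toKeywords(String content) {
        if (content == null) {
            return new String[0];
        }
        String trimmed = content.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return [""], so handle blank input explicitly.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String content = "Fess  search  server";
        // split(" ") keeps an empty string between each pair of spaces.
        System.out.println(Arrays.toString(content.split(" ")));
        System.out.println(Arrays.toString(toKeywords(content)));
    }
}
```

This only shows the splitting logic. Inside the search engine, the equivalent work would normally be done at index time by an analyzer (for example one built on a whitespace tokenizer) rather than in client code.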

I would like to get a term cloud per index/alias (unfortunately, the Term Vectors API only returns terms for one or more specified documents, not for a whole index).

(from github.com/charles-pinkston)
In my case, I had a bunch of values that were separated by a ‘|’ character in a customized field.

To make it work, I added a new pattern tokenizer to the fess.json file, along with a custom analyzer that uses it. Then, in doc.json, I changed the field to type text, set fielddata to true, and assigned the new analyzer. Not sure if that would work for you, but it might give you some other ideas to work with.
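As a rough illustration of what a pattern tokenizer does with a pipe-separated value, the Java sketch below splits on the regex `\|` and skips zero-length pieces. `PipeTokenizerSketch` is a hypothetical name, and the real splitting happens inside the analyzer at index time, not in client code. In the JSON settings the same regex would be written as `"\\|"`, because the backslash itself has to be escaped there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PipeTokenizerSketch {

    // Matches a literal '|'; the backslash escapes the regex alternation operator.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Splits a field value on '|' and drops empty pieces, roughly the tokens
    // a pattern tokenizer configured with the pattern \| would emit.
    public static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        if (value == null) {
            return tokens;
        }
        for (String piece : PIPE.split(value)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Doubled and trailing separators produce no empty tokens.
        System.out.println(tokenize("finance|legal||hr|"));
    }
}
```

Whitespace around the separators (e.g. "red | green") is kept as part of each token here; whether the original custom analyzer also trimmed or lowercased values is not stated in the thread.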

(from github.com/puecher)
Hey, thanks for the info. Are you able to generate a top 10 tag cloud (aggs) per index? How can we preprocess the data before indexing?
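A terms aggregation with `size` set to 10 on the split field is the usual way to build a top-10 tag cloud. It ranks terms by how many documents contain them, and on a `text` field it needs `fielddata` enabled, as in the setup described above. Running it against a specific index or alias scopes the cloud to that index. The Java sketch below only mimics that ranking in memory to show what the buckets represent; `TopTermsSketch` and `topTerms` are hypothetical names, and against a real index the aggregation would be sent to the search engine instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopTermsSketch {

    // Counts, for each term, how many documents contain it (a term repeated
    // within one document still counts once), then returns the n top terms.
    public static List<Map.Entry<String, Integer>> topTerms(List<List<String>> docs, int n) {
        Map<String, Integer> docCounts = new HashMap<>();
        for (List<String> tokens : docs) {
            Set<String> unique = new HashSet<>(tokens);
            for (String term : unique) {
                docCounts.merge(term, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docCounts.entrySet());
        // Highest document count first; ties broken alphabetically in this sketch
        // so the output is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("fess", "search", "fess"),
                List.of("search", "index"),
                List.of("search", "fess"));
        System.out.println(topTerms(docs, 2)); // prints [search=3, fess=2]
    }
}
```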