Possible to modify JSON data that is pulled?

discuss · April 21, 2017, 4:30pm

(from github.com/ajlaporte)
Is it possible to tell Fess what to pull into the JSON feed when it crawls? For example, it pulls in the title and summary, but can it also pull in additional info, such as an author name, when crawling a blog?

discuss · April 21, 2017, 10:12pm

(from github.com/marevol)
You can add or modify extraction rules.
To change the rule for the content field, edit the following setting in fess_config.properties
(//BODY is the XPath that extracts the text under the body tag):

crawler.document.html.content.xpath=//BODY

To add a rule for another field, add an addFieldRule entry like the following to app/WEB-INF/classes/crawler/transformer.xml:

            <postConstruct name="addFieldRule">
                    <arg>"title"</arg><!-- field name in indexed document -->
                    <arg>"//TITLE"</arg><!-- XPath -->
            </postConstruct>
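
For the author name asked about in the question, a rule along the same lines might look like the sketch below. The field name "author" and the XPath are assumptions for illustration, not Fess defaults: it presumes the blog exposes the author in a meta tag, and it uses an uppercase element name only because the //TITLE and //BODY examples above do. Adjust the XPath to wherever your pages actually carry the author.

```xml
<!-- Hypothetical rule: the field name "author" and the XPath are assumptions,
     chosen for illustration; they are not part of the Fess defaults. -->
<postConstruct name="addFieldRule">
    <arg>"author"</arg><!-- field name in indexed document -->
    <arg>"//META[@name='author']/@content"</arg><!-- XPath; assumes the page has a meta tag named "author" -->
</postConstruct>
```

This would sit next to the existing addFieldRule entries in transformer.xml. Pages that were already indexed will presumably need to be crawled again before the new field is populated.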