Is it possible to tell Fess what to pull into the JSON feed when it crawls? For example, it pulls in title and summary but can it pull in additional info like an author name when crawling a blog?
Yes, you can add or modify the extraction rules.
To modify the rule for the content field, edit the following setting in fess_config.properties:
(//BODY is the XPath expression that extracts the text under the body tag)
To add rules for other fields, define them in app/WEB-INF/classes/crawler/transformer.xml:
<postConstruct name="addFieldRule">
  <arg>"title"</arg><!-- field name in indexed document -->
  <arg>"//TITLE"</arg><!-- XPath -->
</postConstruct>
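For the blog author case in the question, a rule along the following lines might work. This is only a sketch: the field name "author" and the XPath are assumptions, and it presumes the blog exposes the author in a meta tag such as <meta name="author" content="...">. Adjust the XPath to match the actual markup of the pages being crawled.

```xml
<!-- Hypothetical rule: the field name and XPath below are assumptions, not Fess defaults -->
<postConstruct name="addFieldRule">
  <arg>"author"</arg><!-- field name in indexed document (assumed) -->
  <arg>"//META[@name='author']/@content"</arg><!-- XPath to the author meta tag (assumed markup) -->
</postConstruct>
```

Element names are written in upper case here to match the //TITLE and //BODY examples above. Whether the new field then shows up in the JSON response may depend on further settings that are not covered here.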