Custom web page parsing

discuss · January 22, 2019, 7:31am

(from github.com/HeBuDuMkA-83)
Hello
Is there a way to implement custom web page content parsing? I want to add some logic to the class where the Fess engine has already obtained the page content or DOM tree and is about to store it in Elasticsearch. That way I could decide which part of the content is sent to storage depending on the site URL, for example.

discuss · January 22, 2019, 12:52pm

(from github.com/marevol)
See the org.codelibs.fess.crawler.transformer package and fess-crawler.
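
As a rough illustration of the per-URL selection asked about above, here is a minimal sketch in plain JDK code. It does not use the Fess or fess-crawler API: the class name UrlContentSelector, the URL prefixes and the XPath expressions are all hypothetical, and it assumes the page has already been parsed into a well-formed DOM (real-world HTML usually needs a tolerant parser first). In Fess itself, logic like this would presumably live in a custom transformer class.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Fess: picks which part of a page to keep
// based on the page URL.
public class UrlContentSelector {
    // URL prefix -> XPath of the content to store (insertion order is kept,
    // so the first matching prefix wins).
    private final Map<String, String> rules = new LinkedHashMap<>();
    private final String defaultXpath;

    public UrlContentSelector(String defaultXpath) {
        this.defaultXpath = defaultXpath;
    }

    public UrlContentSelector addRule(String urlPrefix, String xpath) {
        rules.put(urlPrefix, xpath);
        return this;
    }

    // Returns the XPath for the first rule whose prefix matches the URL,
    // or the default XPath when no rule matches.
    public String xpathFor(String url) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (url.startsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return defaultXpath;
    }

    // Evaluates the selected XPath against the DOM and returns the text of
    // the first matching node with runs of whitespace collapsed.
    public String extract(String url, Document doc) throws Exception {
        String text = XPathFactory.newInstance().newXPath().evaluate(xpathFor(url), doc);
        return text.trim().replaceAll("\\s+", " ");
    }

    // Parses a well-formed XHTML string into a DOM tree (for the demo only).
    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        UrlContentSelector selector = new UrlContentSelector("//body")
                .addRule("https://example.com/docs/", "//div[@id='main']");
        Document doc = parse(
                "<html><body><div id=\"main\">Hello world</div>"
                + "<div id=\"nav\">Menu</div></body></html>");
        // Only the main div is kept for URLs under /docs/.
        System.out.println(selector.extract("https://example.com/docs/a.html", doc));
    }
}
```

The rules are kept in a plain prefix map so that the "which part of the content" decision stays separate from whatever class actually hands the document to the index.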