Exclude certain parts

(from github.com/remoblaser)
Hey!

Is it possible to exclude certain parts of a page from the crawler, for example by marking a part of the HTML that shouldn’t be crawled?

Or can I hide certain HTML elements by setting them to display: none, or by removing them with JS, when the visitor has the crawler’s User-Agent?

(from github.com/marevol)
Try changing the following setting in fess_config.properties:

crawler.document.html.pruned.tags=noscript,script,style,header,footer,nav,a[rel=nofollow]

(from github.com/remoblaser)
Awesome! So this means I could target a div like this?

crawler.document.html.pruned.tags=div[data-index="noindex"]

(from github.com/marevol)
Remove the double quotes (") around the attribute value:

div[data-index=noindex]
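
To make the effect of a rule such as div[data-index=noindex] concrete, here is a minimal Python sketch of what pruning means for the extracted text. It is not Fess's implementation: the names `parse_rule`, `PrunedTextExtractor`, and `extract_text` are hypothetical, and the sketch only assumes that each comma-separated entry is either a tag name or tag[attr=value], and that matching elements are dropped together with everything inside them. It also assumes well-formed HTML where every non-void element is closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_rule(rule):
    """Split 'div[data-index=noindex]' into ('div', 'data-index', 'noindex')."""
    if "[" not in rule:
        return rule, None, None
    tag, _, rest = rule.partition("[")
    attr, _, value = rest.rstrip("]").partition("=")
    return tag, attr, value


class PrunedTextExtractor(HTMLParser):
    """Collects text, skipping every element that matches a pruning rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = [parse_rule(r.strip()) for r in rules.split(",")]
        self.skip_depth = 0  # greater than 0 while inside a pruned element
        self.chunks = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        for rule_tag, attr, value in self.rules:
            if rule_tag == tag and (attr is None or attrs.get(attr) == value):
                return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a pruned element, every nested element is skipped too.
        if self.skip_depth or self._matches(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, rules):
    parser = PrunedTextExtractor(rules)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<body><div data-index="noindex">Hidden menu</div>'
        '<p>Visible text</p></body>')
print(extract_text(page, "div[data-index=noindex]"))  # prints "Visible text"
```

In the page markup the attribute is still written with quotes as usual (data-index="noindex"); only the rule in the properties file goes without them.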

(from github.com/remoblaser)
Awesome! Thank you very much!

(from github.com/TkTech)
Hello @marevol, I was looking for this exact same thing. Where does one find the available options for crawlers? Can this be set per crawler in the crawler settings instead of globally in fess_config.properties?