Is there a setting to control how often a document needs to be re-crawled (in case there is no Expires header)?
in case there is no Expires header
Fess does not use the Expires header.
Does that mean all of the URLs are retrieved every time the schedule runs?
It depends on your requirements…
If you have many updated URLs, you can also use CsvListDataStore in Data Store Crawling.
CsvListDataStore crawls URLs listed in a CSV file.
The sample script that creates test data is csvlistdatastore.sh.
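As a rough sketch of what such a CSV might look like, the snippet below generates a small list file. The two-column layout ("event-type","URL") and the example URLs are assumptions for illustration, not taken from this thread; check the CsvListDataStore documentation for your Fess version for the exact format.

```shell
# Hypothetical example: generate a URL list for CsvListDataStore.
# Assumed format: first column is an event type (create/modify/delete),
# second column is the URL to (re-)crawl or remove from the index.
cat > /tmp/urls.csv <<'EOF'
"create","http://www.example.com/new-page.html"
"modify","http://www.example.com/updated-page.html"
"delete","http://www.example.com/removed-page.html"
EOF

# Show the generated file.
cat /tmp/urls.csv
```

Pointing the Data Store Crawling configuration at a file like this lets you re-crawl only the URLs you know have changed, instead of re-fetching everything on each schedule.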
OK, so the Web Crawler has no such setting. Clear, thanks.