(from github.com/pcolmer)
We have a page on our wiki that Fess cannot index because the page is too long. I’d like to configure the crawler to ignore that page.
The URL of the page is https://wiki.linaro.org/WordIndex
In the web crawling configuration, I’ve got the following:
URLs:
https://wiki.linaro.org/
Included URLs for crawling:
https://wiki.linaro.org/.*
Excluded URLs for crawling:
.*/.*\?.*
.*/.*\.png
.*/.*\.jpg
.*/.*\.gif
.*/.*\.ico
.*/.*\.css
.*/.*\.js
WordIndex
but the crawler is still trying to access that page. Do I need to add it to “Excluded URLs for indexing” as well?
Or have I got the syntax wrong?
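One way to check the syntax question offline is a minimal sketch like the one below. It assumes that each pattern has to match the entire URL (the behaviour of Java's `Matcher.matches()`), rather than any substring of it; that is an assumption about the crawler, not something confirmed here. The helper `is_crawled` and the alternative pattern `.*/WordIndex` are hypothetical and only for illustration.

```python
import re

# Assumption: a URL is crawled only if at least one include pattern matches
# the WHOLE URL and no exclude pattern matches the whole URL.
# re.fullmatch mimics that whole-string matching.
def is_crawled(url, includes, excludes):
    included = any(re.fullmatch(p, url) for p in includes)
    excluded = any(re.fullmatch(p, url) for p in excludes)
    return included and not excluded

URL = "https://wiki.linaro.org/WordIndex"
INCLUDES = [r"https://wiki.linaro.org/.*"]

# The bare word does not cover the whole URL, so it excludes nothing.
bare = is_crawled(URL, INCLUDES, ["WordIndex"])

# A pattern that spans the whole URL does exclude the page.
anchored = is_crawled(URL, INCLUDES, [r".*/WordIndex"])

print(bare, anchored)
```

If whole-URL matching is indeed what the crawler does, the bare `WordIndex` entry would have no effect, and a pattern written in the same style as the other exclusions would be needed.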
Thanks.
Philip