How to skip a specific page?

(from github.com/pcolmer)
We have a page on our wiki that Fess cannot index because the page is too long. I’d like to configure the crawler to ignore that page.

The URL of the page is https://wiki.linaro.org/WordIndex

In the web crawling configuration, I’ve got the following:

URLs:
https://wiki.linaro.org/

Included URLs for crawling:
https://wiki.linaro.org/.*

Excluded URLs for crawling:
.*/.*\?.*
.*/.*\.png
.*/.*\.jpg
.*/.*\.gif
.*/.*\.ico
.*/.*\.css
.*/.*\.js
WordIndex

but the crawler is still trying to access that page. Do I need to add it to “Excluded URLs for indexing” as well?

Or have I got the syntax wrong?

Thanks.

Philip

(from github.com/marevol)

Try removing the “Included URLs for crawling” setting.
When both included and excluded URLs are specified, the included URLs win.

(from github.com/pcolmer)
If I remove that setting, does fess default to crawling the base URL anyway?

(from github.com/marevol)
Oops, I was wrong: for crawling, excluded URLs win.
I think your settings should be:

URLs:
https://wiki.linaro.org/

Included URLs for crawling:
https://wiki.linaro.org/.*

Excluded URLs for crawling:
.*\?.*
.*\.png
.*\.jpg
.*\.gif
.*\.ico
.*\.css
.*\.js

Excluded URLs for indexing: (if you want to crawl https://wiki.linaro.org/WordIndex)
.*/WordIndex.*
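
To see how the settings above would sort a few URLs, here is a rough Python sketch. It is not Fess code. It assumes that each pattern has to match the whole URL rather than a substring, and that an excluded pattern overrides an included one for crawling, as stated above. The function names (`matches_any`, `should_crawl`, `should_index`) and the sample URLs are made up for illustration.

```python
import re

# Patterns copied from the suggested configuration above.
INCLUDE_CRAWL = [r"https://wiki.linaro.org/.*"]
EXCLUDE_CRAWL = [
    r".*\?.*", r".*\.png", r".*\.jpg", r".*\.gif",
    r".*\.ico", r".*\.css", r".*\.js",
]
EXCLUDE_INDEX = [r".*/WordIndex.*"]


def matches_any(patterns, url):
    # Assumption: a pattern must match the entire URL, not just part of it.
    return any(re.fullmatch(p, url) is not None for p in patterns)


def should_crawl(url):
    # Excluded URLs win over included URLs for crawling.
    if matches_any(EXCLUDE_CRAWL, url):
        return False
    return matches_any(INCLUDE_CRAWL, url)


def should_index(url):
    # A page is indexed only if it is crawled and not excluded from indexing.
    return should_crawl(url) and not matches_any(EXCLUDE_INDEX, url)


if __name__ == "__main__":
    for url in [
        "https://wiki.linaro.org/FrontPage",
        "https://wiki.linaro.org/FrontPage?action=edit",
        "https://wiki.linaro.org/logo.png",
        "https://wiki.linaro.org/WordIndex",
    ]:
        print(url, should_crawl(url), should_index(url))
```

Under the whole-URL assumption, a bare `WordIndex` pattern never matches `https://wiki.linaro.org/WordIndex`, which may be why the original exclusion had no effect. With the settings above the page is still fetched but not indexed. To stop the crawler from fetching it at all, it would presumably be enough to put `.*/WordIndex.*` under “Excluded URLs for crawling” instead.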