Crawling Wikipedia?

(from github.com/ArthurBV)
Hello,

Is it possible to crawl Wikipedia using the Fess crawler? I have reduced the boost and interval time, since Wikipedia has some restrictions, but I haven’t been able to crawl the site: only the main page is crawled and indexed.

Thanks!

(from github.com/marevol)
I need more info. For example, what are your crawling configs?

(from github.com/ArthurBV)
Here is the configuration:


I tried setting “Included URLs For Crawling” to https://es.wikipedia.org/wiki/.* but it didn’t work either.

Also I created this job to schedule the crawling:

(from github.com/marevol)
The interval time is too long.
I tried it and Wikipedia pages were indexed.

(from github.com/ArthurBV)
What interval time are you using?

(from github.com/marevol)
To check it in my environment, I used these settings:

URL: https://es.wikipedia.org/wiki/
Include URL: https://es.wikipedia.org/wiki/.*
Interval time: 1000
Max Access Count: 10
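
For anyone who wants to sanity-check an include pattern before starting a crawl, here is a minimal Python sketch. It is not Fess code: Fess is a Java application, and this only approximates how an include pattern and an access limit might filter URLs. The helper names (is_included, select_urls) are hypothetical, whole-URL matching is an assumption, and the interval is assumed to be in milliseconds.

```python
import re

# Values copied from the settings above.
START_URL = "https://es.wikipedia.org/wiki/"
INCLUDE_PATTERN = r"https://es.wikipedia.org/wiki/.*"
INTERVAL_MS = 1000       # assumption: the interval time is in milliseconds
MAX_ACCESS_COUNT = 10


def is_included(url: str, pattern: str = INCLUDE_PATTERN) -> bool:
    """Return True if the whole URL matches the include pattern.

    Assumption: the pattern must match the entire URL.
    """
    return re.fullmatch(pattern, url) is not None


def select_urls(candidates, limit=MAX_ACCESS_COUNT):
    """Keep matching URLs in order, without duplicates, up to the limit."""
    selected = []
    for url in candidates:
        if is_included(url) and url not in selected:
            selected.append(url)
        if len(selected) >= limit:
            break
    return selected
```

With this sketch, is_included("https://es.wikipedia.org/wiki/Madrid") is true, while a URL on en.wikipedia.org or under /w/ is rejected, so links outside the /wiki/ path would not be followed.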

(from github.com/ArthurBV)
Thanks a lot, everything appears to be working correctly.