(from github.com/ArthurBV )
Hello,
Is it possible to crawl Wikipedia using the Fess crawler? I have reduced the boost and interval time since Wikipedia has some restrictions, but I haven’t been able to crawl the site (only the main page is crawled and indexed).
Thanks!
(from github.com/marevol )
Need more info, e.g., what are your crawling configs?
(from github.com/ArthurBV )
Here is the configuration:
I tried setting “Included URLs For Crawling” to:
https://es.wikipedia.org/wiki/.* but it didn’t work either.
Also I created this job to schedule the crawling:
February 10, 2018, 1:02am
(from github.com/marevol )
The interval time is too long.
I tried it and the Wikipedia pages were indexed.
February 10, 2018, 1:34am
(from github.com/ArthurBV )
What interval time are you using?
February 10, 2018, 1:49am
(from github.com/marevol )
The settings I used to check it in my environment are:
URL: https://es.wikipedia.org/wiki/
Include URL: https://es.wikipedia.org/wiki/.*
Interval time: 1000
Max Access Count: 10
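
For anyone who wants to sanity-check the include pattern before starting a crawl, here is a minimal Python sketch. It is not Fess code: Fess is a Java application, and whether it applies the pattern as a full match is an assumption here. The names `select_urls`, `INCLUDE_PATTERN` and `MAX_ACCESS_COUNT` are made up for illustration; only the values come from the settings above.

```python
import re

# Values copied from the settings above; the names are illustrative,
# not Fess identifiers.
START_URL = "https://es.wikipedia.org/wiki/"
INCLUDE_PATTERN = re.compile(r"https://es.wikipedia.org/wiki/.*")
MAX_ACCESS_COUNT = 10


def select_urls(candidates, include=INCLUDE_PATTERN, limit=MAX_ACCESS_COUNT):
    """Return at most `limit` candidate URLs that fully match `include`."""
    selected = []
    for url in candidates:
        if len(selected) >= limit:
            break  # assumed behaviour: stop once the access cap is reached
        if include.fullmatch(url):
            selected.append(url)
    return selected


if __name__ == "__main__":
    # Hypothetical links, as they might be discovered on the start page.
    links = [
        "https://es.wikipedia.org/wiki/",
        "https://es.wikipedia.org/wiki/Madrid",
        "https://en.wikipedia.org/wiki/Madrid",
        "https://es.wikipedia.org/w/index.php?title=Madrid",
    ]
    for url in select_urls(links):
        print(url)
```

With these sample links, only the two URLs under https://es.wikipedia.org/wiki/ pass the filter; the English-language page and the /w/index.php URL are rejected.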
February 13, 2018, 12:35am
(from github.com/ArthurBV )
Thanks a lot, everything appears to be working correctly.