(from github.com/qmaxquique)
Thank you again for developing such a great tool!
I’m using Fess 12.4.0.
I can crawl and index several sites without any issues, but for this particular site Fess crawls only a few URLs (47 in total) and then ends the job as if it had finished crawling the whole site.
I’ve tried several crawling configurations, but none of them works.
This is the crawler configuration:
ID: whiH6mkBbO5aJ2YtZEsC
Name: store.repligen.com
URLs: https://store.repligen.com/
Included URLs For Crawling: https://store.repligen.com/.*
Excluded URLs For Crawling: (empty)
Included URLs For Indexing: (empty)
Excluded URLs For Indexing:
  .*oembed
  .*css
Config Parameters: (empty)
Depth: (empty)
Max Access Count: (empty)
User Agent: Mozilla/5.0 (compatible; Fess/12.4; +http://fess.codelibs.org/bot.html)
The number of Thread: 3
Interval time: 1200 ms
Boost: 1.0
Permissions: {role}www.repligen.com
Virtual Hosts: (empty)
Status: Enabled
Description: (empty)
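Since the job stops after 47 URLs without any error, one quick sanity check is whether the include/exclude patterns above actually admit the sitemap and product URLs. The sketch below mirrors those patterns in Python. It assumes each pattern must match the entire URL (a full match), which is my assumption about how Fess applies them rather than something confirmed here, and the product URL used in the check is a hypothetical placeholder.

```python
import re

# Patterns copied from the crawler configuration above.
INCLUDED_FOR_CRAWLING = [r"https://store.repligen.com/.*"]
EXCLUDED_FOR_CRAWLING: list[str] = []  # field is empty in the config
EXCLUDED_FOR_INDEXING = [r".*oembed", r".*css"]


def matches_any(url: str, patterns: list[str]) -> bool:
    # Assumption: a pattern must match the whole URL, not just a substring.
    return any(re.fullmatch(p, url) for p in patterns)


def should_crawl(url: str) -> bool:
    # Followed only if some include pattern matches and no exclude pattern does.
    return matches_any(url, INCLUDED_FOR_CRAWLING) and not matches_any(
        url, EXCLUDED_FOR_CRAWLING
    )


def should_index(url: str) -> bool:
    # "Included URLs For Indexing" is empty, so assume every crawled URL is
    # eligible unless an indexing exclude pattern matches it.
    return not matches_any(url, EXCLUDED_FOR_INDEXING)


if __name__ == "__main__":
    sitemap = (
        "https://store.repligen.com/sitemap_products_1.xml"
        "?from=1675528208441&to=2143411404857"
    )
    # Hypothetical product URL, for illustration only.
    product = "https://store.repligen.com/products/example"
    for url in (sitemap, product):
        print(url, should_crawl(url), should_index(url))
```

Under that full-match assumption, both URLs pass the crawl filter and neither is hit by `.*oembed` or `.*css`, so the patterns by themselves would not explain the early stop.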
In the logs, I can see that the site has several sitemaps.
One of them contains the product URLs (the pages I want to index) and seems to be generated dynamically for crawlers:
2019-04-05 01:10:31,470 [Crawler-whiH6mkBbO5aJ2YtZEsC-1-1] INFO Crawling URL: https://store.repligen.com/sitemap_products_1.xml?from=1675528208441&to=2143411404857
This sitemap XML file appears valid and well populated, as expected, but Fess neither processes it nor reports any errors.
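To rule out a malformed sitemap on my side, a saved copy of it can be parsed locally to list its `<loc>` entries and report whether it is a `urlset` (page URLs) or a `sitemapindex` (links to further sitemaps). This is a minimal sketch using Python's standard library, assuming the file follows the sitemaps.org 0.9 schema. It does not reproduce how Fess itself handles sitemaps, and the product URLs in the sample are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Standard namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_bytes: bytes) -> tuple[str, list[str]]:
    """Return the root type ("urlset" or "sitemapindex") and all <loc> values."""
    root = ET.fromstring(xml_bytes)
    # The root tag looks like "{namespace}urlset"; keep only the local name.
    kind = root.tag.split("}", 1)[-1]
    # <loc> sits under <url> in a urlset and under <sitemap> in an index.
    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return kind, locs


# Hypothetical sample shaped like a product sitemap.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://store.repligen.com/products/example-a</loc></url>
  <url><loc>https://store.repligen.com/products/example-b</loc></url>
</urlset>"""

if __name__ == "__main__":
    kind, locs = sitemap_locs(SAMPLE)
    print(kind, len(locs))
    for loc in locs:
        print(loc)
```

Running it against the real downloaded file (read with `open(path, "rb").read()`) would show how many product URLs Fess should have discovered, which can be compared with the 47 it actually crawled.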
For testing purposes, I changed the URLs field in the crawler configuration to point directly at the sitemap above, but the results are the same.
Is there anything obviously wrong here?
Any clue or help is more than welcome!
Thank you.