Best practice for limiting crawl time or number of URLs


I have set up a crawler that should index about 3500 pages. The pages are listed, and the list pages are set to no-index. It takes the crawler 14 hours before it finishes with a fail status.

The last lines are:

2023-04-23 03:54:07,567 [IndexUpdater] INFO  Processing no docs (Doc:{access 5ms, cleanup 34ms}, Mem:{used 236MB, heap 512MB, max 512MB})
2023-04-23 03:54:07,570 [IndexUpdater] INFO  Terminating indexUpdater. emptyListCount is over 3600.
2023-04-23 03:54:07,575 [WebFsCrawler] INFO  [EXEC TIME] crawling time: 50040386ms

I can see that the same pages are being crawled about 30 times during this period. Is there a way to limit the crawler's running time or to set a maximum number of indexed pages?

I have set the Max Access Count to 5000.

Terminating indexUpdater. emptyListCount is over 3600.

It means that the IndexUpdater timed out. emptyListCount counts consecutive checks in which the crawler handed no documents to the IndexUpdater; if you have a lot of list pages, which produce nothing to index, the counter keeps growing until it exceeds the limit and the IndexUpdater terminates.

If “3500 pages” is the correct number, it’s better to check fess-crawler.log, and you should update the crawler settings to remove the unexpected pages.

If the crawled pages are correct, you can change the timeout in fess_config.properties.
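As a sketch (assuming the default file location app/WEB-INF/classes/fess_config.properties; the value below is only an illustration, tune it to your environment):

# Number of consecutive empty checks before the IndexUpdater gives up.
# Raise it if the crawl scope is correct but needs more time;
# lower it to make the crawler terminate sooner.
indexer.webfs.max.empty.list.count=7200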


My URLs are being crawled about 30 times each. The 3500 pages that I want to have in the index are done in about 2 hours; for the remaining 12 hours the crawler just repeats them.

Is there a way to get the crawler to access each URL only once?

What does the 3600 in indexer.webfs.max.empty.list.count=3600 mean? Is it a time or a number of URLs?

If I put the list pages in Excluded URLs For Crawling, then the crawler will not access the pages that are being listed, right? Right now I have the list pages in Excluded URLs For Indexing.

It’s better to set Included/Excluded URLs For Crawling/Indexing; which to use depends on your requirements.
Excluded URLs For Crawling means the crawler ignores those URLs entirely, so it will not fetch them or follow the links they contain. See the sketch below.
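A minimal sketch of one way to meet your requirement (the example.com host and /list/ path are hypothetical; substitute your real URL patterns):

Included URLs For Crawling:
https://example.com/.*

Excluded URLs For Indexing:
https://example.com/list/.*

With this setup the crawler still fetches the list pages, so it can follow the links to the 3500 detail pages, but the list pages themselves are not stored in the index. Moving them to Excluded URLs For Crawling instead would stop the crawler from visiting them at all, so the listed pages would never be discovered unless they are linked from somewhere else.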