No results for crawling a page of links

(from github.com/DuncanFranklin)
I am a novice with Fess; the developer who set it up as our web search has left and hasn't been replaced.

One of our crawls is for events: pages are crawled and labelled as events if they appear on https://www.soas.ac.uk/util/eventscrawler, but none of the events listed on that page are showing up. Any help would be gratefully received.

In the main website crawl we exclude .*events.* from crawling and indexing, which covers all event items on the site. The settings for the events crawl are:

URLs | https://www.soas.ac.uk/util/eventscrawler

Included URLs For Crawling | https://www.soas.ac.uk/.*

Excluded URLs For Crawling |
.*\.mp3.*
.*\.min\.css.*
.*\.css.*
.*\.js.*
.*\.jpg.*
.*\.jpeg.*
.*\.JPEG.*
.*\.JPG.*
.*\.png.*
.*\.xml.*
.*\.rss.*
.*-rss-.*
.*\.json.*
.*\.ico.*
.*\.zip.*

Included URLs For Indexing |

Excluded URLs For Indexing | https://www.soas.ac.uk/util/eventscrawler
.*\.mp3.*
.*\.min\.css.*
.*\.css.*
.*\.js.*
.*\.jpg.*
.*\.jpeg.*
.*\.JPEG.*
.*\.JPG.*
.*\.png.*
.*\.xml.*
.*\.rss.*
.*-rss-.*
.*\.json.*
.*\.ico.*
.*\.zip.*

Config Parameters |

Depth | 1

Max Access Count |

User Agent | Mozilla/5.0 (compatible; Fess +SOAS Events)

The number of Thread | 8

Interval time | 1000 ms

Boost | 1.0

Permissions | {role}guest

Virtual Hosts |

Label | Events SOAS Website

Status | Enabled

(from github.com/marevol)
Check fess-crawler.log.

(from github.com/DuncanFranklin)
Searching for "event" in fess-crawler.log, I found these lines:

2019-01-22 04:00:07,008 [WebFsCrawler] INFO Excluded URL: .*events.*
2019-01-22 04:00:07,456 [WebFsCrawler] INFO Target URL: https://www.soas.ac.uk/util/eventscrawler
2019-01-22 04:00:07,458 [WebFsCrawler] INFO Excluded URL from failures: \Qhttps://www.soas.ac.uk/util/eventscrawler\E

Do you know what might be wrong?

(from github.com/marevol)
Remove the entries for that URL in Failure URL. The "Excluded URL from failures" line means the crawler is skipping the page because of previous failures.

(from github.com/DuncanFranklin)
I deleted the logs in Failure URL.

After the next crawl, the failure entry for this page says:

ID | mCQ_fmgBl8hq3_2dl26o

URL | https://www.soas.ac.uk/util/eventscrawler

Thread name | Crawler-20190124040000-19-7

Type | org.codelibs.fess.crawler.exception.MaxLengthExceededException

Log | org.codelibs.fess.crawler.exception.MaxLengthExceededException: The content length (2643693 byte) is over 2621440 byte. The url is https://www.soas.ac.uk/util/eventscrawler
at org.codelibs.fess.crawler.client.http.HcHttpClient.processHttpMethod(HcHttpClient.java:765)
at org.codelibs.fess.crawler.client.http.HcHttpClient.doHttpMethod(HcHttpClient.java:623)
at org.codelibs.fess.crawler.client.http.HcHttpClient.doGet(HcHttpClient.java:582)
at org.codelibs.fess.crawler.client.AbstractCrawlerClient.execute(AbstractCrawlerClient.java:142)
at org.codelibs.fess.crawler.client.FaultTolerantClient.execute(FaultTolerantClient.java:67)
at org.codelibs.fess.crawler.CrawlerThread.run(CrawlerThread.java:164)
at java.lang.Thread.run(Thread.java:745)

Error Count | 1

Last Access | 1548306323368

What should I do next?

(from github.com/marevol)

The content length (2643693 byte) is over 2621440 byte.

The size of the crawled file is too large.
To change the upper bound, you can modify it in contentlength.xml.
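
For reference, the 2621440-byte limit in the error corresponds to the default maximum for text/html pages. Below is a minimal sketch of the relevant entry, based on the stock contentlength.xml shipped with Fess (usually found under app/WEB-INF/classes/crawler/ in the Fess installation; 10485760 below is just an illustrative new limit, and Fess needs a restart for the change to take effect):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE components PUBLIC "-//DBFLUTE//DTD LastaDi 1.0//EN"
    "http://dbflute.org/meta/lastadi10.dtd">
<components>
    <component name="contentLengthHelper" class="org.codelibs.fess.crawler.helper.ContentLengthHelper" instance="singleton">
        <!-- default cap for content types not listed below -->
        <property name="defaultMaxLength">10485760</property>
        <!-- per-type cap for text/html; the stock value of 2621440 (2.5 MB) produced the error above -->
        <postConstruct name="addMaxLength">
            <arg>"text/html"</arg>
            <arg>10485760</arg>
        </postConstruct>
    </component>
</components>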

(from github.com/DuncanFranklin)
Hi there,
Modifying the limit in contentlength.xml worked, but now there is another error stopping things from being crawled. It looks like there are too many results to index.
What can I do to fix this? This crawl labels all of the events we have at the university, and the number of events will increase every day.
Would it be possible to create multiple versions of https://www.soas.ac.uk/util/eventscrawler, one for each year, so that far fewer events are listed on each page, while still labelling them all as events? (See the sketch below.)
I am also stopping anything with /events/ in the URL from being crawled by the main web crawler, which blocks most events from being indexed. If I removed that exclusion, I presume most events would appear in the main search, though without the events label. If I did that and fixed the events labelling, would that work?
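
For example, since the URLs field of a Fess web crawl config accepts one start URL per line, a single events crawl could cover per-year pages under the same label (a hypothetical sketch; these per-year URLs do not exist yet):

URLs | https://www.soas.ac.uk/util/eventscrawler/2017
       https://www.soas.ac.uk/util/eventscrawler/2018
       https://www.soas.ac.uk/util/eventscrawler/2019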

ID | gCXXkmgBl8hq3_2dB3u3

URL | https://www.soas.ac.uk/util/eventscrawler

Thread name | Crawler-20190130040000-19-8
Type | org.codelibs.fess.exception.InvalidQueryException
Log | org.codelibs.fess.exception.InvalidQueryException: Invalid query: {"size":10681,"timeout":"10000ms","query":{"term":{"parent_id":{"value":"https:%2F%2Fwww.soas.ac.uk%2Futil%2Feventscrawler;role=Rguest","boost":1.0}}},"_source":{"includes":["url"],"excludes":[]}}
at org.codelibs.fess.es.client.FessEsClient.search(FessEsClient.java:703)
at org.codelibs.fess.es.client.FessEsClient.getDocumentList(FessEsClient.java:784)
at org.codelibs.fess.es.client.FessEsClient.getDocumentList(FessEsClient.java:758)
at org.codelibs.fess.helper.IndexingHelper.getDocumentListByQuery(IndexingHelper.java:192)
at org.codelibs.fess.helper.IndexingHelper.getChildDocumentList(IndexingHelper.java:179)
at org.codelibs.fess.crawler.FessCrawlerThread.getChildUrlSet(FessCrawlerThread.java:235)
at org.codelibs.fess.crawler.FessCrawlerThread.isContentUpdated(FessCrawlerThread.java:124)
at org.codelibs.fess.crawler.CrawlerThread.run(CrawlerThread.java:155)
at java.lang.Thread.run(Thread.java:745)
Caused by: Failed to execute phase [query], all shards failed; shardFailures
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][0]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][1]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][2]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][3]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][4]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243)
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107)
at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:49)
at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:217)
at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:171…
Error Count | 2
Last Access | 1548824644523

(from github.com/marevol)
See https://stackoverflow.com/questions/41677198/result-window-is-too-large-from-size-must-be-less-than-or-equal-to-10000-b
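
For reference, the setting from that thread can be raised on the index named in the stack trace. A sketch, assuming Elasticsearch is reachable on localhost:9200 and choosing 20000 as an arbitrary value above the 10681 documents reported:

curl -XPUT "http://localhost:9200/fess.20180105/_settings" -H "Content-Type: application/json" -d '{"index":{"max_result_window":20000}}'

Note that the limit exists to cap the memory a single query can use, so splitting the events page per year, as proposed above, is the more scalable fix: it keeps the number of child documents per crawled page below the window instead of growing it every day.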