(from github.com/DuncanFranklin)
I am new to Fess. The developer who set it up as our web search has left and hasn't been replaced.
One of our crawls is for events: an event is crawled and labelled as an event if it appears on https://www.soas.ac.uk/util/eventscrawler, but none of the events listed on that page are showing up. Any help would be gratefully received.
In the main website crawl we exclude .*events.* from both indexing and crawling, which should cover every event item on the site. The settings for the events crawl are:
Type | org.codelibs.fess.crawler.exception.MaxLengthExceededException
Log | org.codelibs.fess.crawler.exception.MaxLengthExceededException: The content length (2643693 byte) is over 2621440 byte. The url is https://www.soas.ac.uk/util/eventscrawler
at org.codelibs.fess.crawler.client.http.HcHttpClient.processHttpMethod(HcHttpClient.java:765)
at org.codelibs.fess.crawler.client.http.HcHttpClient.doHttpMethod(HcHttpClient.java:623)
at org.codelibs.fess.crawler.client.http.HcHttpClient.doGet(HcHttpClient.java:582)
at org.codelibs.fess.crawler.client.AbstractCrawlerClient.execute(AbstractCrawlerClient.java:142)
at org.codelibs.fess.crawler.client.FaultTolerantClient.execute(FaultTolerantClient.java:67)
at org.codelibs.fess.crawler.CrawlerThread.run(CrawlerThread.java:164)
at java.lang.Thread.run(Thread.java:745)
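The figures in this message show that the listing page only just failed. 2621440 bytes is exactly 2.5 MiB, and the page was 22,253 bytes over it. Because the page lists every event and grows daily, any fixed limit will eventually be hit again. Here is a minimal Python sketch of that arithmetic, using only the two numbers from the log:

```python
# Figures copied from the MaxLengthExceededException message above.
CONTENT_LENGTH = 2_643_693  # bytes returned by /util/eventscrawler
LIMIT = 2_621_440           # bytes the crawler allowed at the time

MIB = 1024 * 1024  # bytes per mebibyte


def bytes_over(content_length: int, limit: int) -> int:
    """How many bytes a response exceeds the limit by (0 if it fits)."""
    return max(0, content_length - limit)


print(f"limit:     {LIMIT / MIB:.2f} MiB")
print(f"page size: {CONTENT_LENGTH / MIB:.2f} MiB")
print(f"over by:   {bytes_over(CONTENT_LENGTH, LIMIT)} bytes")
```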
(from github.com/DuncanFranklin)
Hi there,
Raising the limit in contentlength.xml worked, but now a different error is stopping things from being crawled: it looks like there are too many results to index.
What can I do to fix this? The purpose of this crawl is to label all of the events we have at the university, and the number of events will increase every day.
Would it be possible to create multiple versions of https://www.soas.ac.uk/util/eventscrawler, one for each year, so that far fewer events are listed on each page, and still have them all labelled as events?
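The error log below suggests why splitting could help. Before re-crawling a page, Fess appears to fetch every indexed document whose parent_id is that page in a single request. That request's size equals the number of child documents (10681 here), and Elasticsearch rejects any request above index.max_result_window (10000 by default). The following is a minimal Python sketch of the per-year idea. It assumes each year gets its own listing page, registered as a separate start URL of the events crawl. The event URLs and dates are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Elasticsearch's default index.max_result_window, as reported in the log.
MAX_RESULT_WINDOW = 10_000


def group_by_year(events):
    """Split (url, event_date) pairs into one listing per calendar year."""
    pages = defaultdict(list)
    for url, event_date in events:
        pages[event_date.year].append(url)
    return dict(pages)


def oversized_years(pages, limit=MAX_RESULT_WINDOW):
    """Years whose listing would still link to more than `limit` events."""
    return sorted(year for year, urls in pages.items() if len(urls) > limit)


# Hypothetical events, for illustration only.
events = [
    ("https://www.soas.ac.uk/events/example-1", date(2017, 11, 3)),
    ("https://www.soas.ac.uk/events/example-2", date(2018, 2, 14)),
    ("https://www.soas.ac.uk/events/example-3", date(2018, 6, 9)),
    ("https://www.soas.ac.uk/events/example-4", date(2019, 1, 30)),
]

pages = group_by_year(events)
for year, urls in sorted(pages.items()):
    print(year, len(urls))     # one listing page per year
print(oversized_years(pages))  # [] while every year stays under the limit
```

A per-year split only helps while no single year's page links to more than the limit, so this is worth re-checking as the number of events grows.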
I am also excluding anything with /events/ from the main web crawl. This blocks most events from being indexed. If I removed the exclusion, I presume most events would appear in the main search, but without the events label. If I did this and also fixed the events labelling, would that work?
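Whether an exclusion rule blocks a given URL depends on the exact pattern. Fess's exclusion fields take regular expressions. The sketch below assumes each pattern has to match the whole URL (as Java's Matcher.matches() requires). It compares two plausible readings of the main-crawl exclusion described above: .*events.* and .*/events/.*. The event page URL is hypothetical.

```python
import re

# Two plausible forms of the main-crawl exclusion (assumption: the exact
# configured pattern is not shown in the post).
BROAD = re.compile(r".*events.*")
PATH_ONLY = re.compile(r".*/events/.*")


def is_excluded(url: str, pattern: re.Pattern) -> bool:
    # Assumption: the pattern must match the entire URL, as Java's
    # Matcher.matches() requires; hence the leading and trailing ".*".
    return pattern.fullmatch(url) is not None


urls = [
    "https://www.soas.ac.uk/events/example-event",  # hypothetical event page
    "https://www.soas.ac.uk/util/eventscrawler",    # the events listing page
]
for url in urls:
    print(url, "broad:", is_excluded(url, BROAD), "path-only:", is_excluded(url, PATH_ONLY))
```

Note that the broad form also matches the /util/eventscrawler listing page itself, while the path-only form does not.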
ID | gCXXkmgBl8hq3_2dB3u3
Thread name | Crawler-20190130040000-19-8
URL |
Type | org.codelibs.fess.exception.InvalidQueryException
Log | org.codelibs.fess.exception.InvalidQueryException: Invalid query: {"size":10681,"timeout":"10000ms","query":{"term":{"parent_id":{"value":"https:%2F%2Fwww.soas.ac.uk%2Futil%2Feventscrawler;role=Rguest","boost":1.0}}},"_source":{"includes":["url"],"excludes":[]}}
at org.codelibs.fess.es.client.FessEsClient.search(FessEsClient.java:703)
at org.codelibs.fess.es.client.FessEsClient.getDocumentList(FessEsClient.java:784)
at org.codelibs.fess.es.client.FessEsClient.getDocumentList(FessEsClient.java:758)
at org.codelibs.fess.helper.IndexingHelper.getDocumentListByQuery(IndexingHelper.java:192)
at org.codelibs.fess.helper.IndexingHelper.getChildDocumentList(IndexingHelper.java:179)
at org.codelibs.fess.crawler.FessCrawlerThread.getChildUrlSet(FessCrawlerThread.java:235)
at org.codelibs.fess.crawler.FessCrawlerThread.isContentUpdated(FessCrawlerThread.java:124)
at org.codelibs.fess.crawler.CrawlerThread.run(CrawlerThread.java:155)
at java.lang.Thread.run(Thread.java:745)
Caused by: Failed to execute phase [query], all shards failed; shardFailures
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][0]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][1]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][2]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][3]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
{[TRpYmpykSxSnrDpM3qsPrQ][fess.20180105][4]: RemoteTransportException[[master-data-node-1][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10681]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243)
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107)
at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:49)
at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:217)
at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:171…
Error Count | 2
Last Access | 1548824644523
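The Elasticsearch message itself names one way out: raise index.max_result_window on the index that failed (fess.20180105 in the shard failures). Below is a minimal Python sketch that only builds the standard PUT /<index>/_settings request with the standard library. It assumes Fess's bundled Elasticsearch listens for HTTP on localhost:9201, so adjust ES_URL to your setup. The new limit of 20000 is a hypothetical value, and nothing is sent until you call urlopen.

```python
import json
import urllib.request

# Assumption: Fess's bundled Elasticsearch answers HTTP on localhost:9201.
ES_URL = "http://localhost:9201"
# Index name taken from the shard failures in the log above.
INDEX = "fess.20180105"


def build_max_result_window_request(es_url: str, index: str, new_limit: int) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"max_result_window": new_limit}}).encode("utf-8")
    return urllib.request.Request(
        f"{es_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical headroom above the 10681 results Fess requested.
req = build_max_result_window_request(ES_URL, INDEX, 20_000)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it for real: urllib.request.urlopen(req)
```

A larger window makes each deep request more memory-hungry. Because the events list grows daily, a higher limit may only postpone the error unless the listing is also split.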