■File crawl configuration
・File system
Path: a separate file server, specified via smb
Paths excluded from crawling: exe and various other extensions
Depth: not specified
→ the configured directory is quite high up in the hierarchy,
and the directory tree below it is quite deep
Number of files: several hundred thousand
Max access count: 6000
Interval: 1000 ms
Crawler schedule: runs every 30 minutes
With the settings above, the first crawler run correctly crawls 6,000 files and indexes them, but from the second run onward only about 20–200 documents seem to be indexed.
Sometimes it was even 0.
I would like each crawler run to crawl 6,000 files anew — is something wrong with the settings above?
Incidentally, the following warning appeared in the log.
If this is the cause and you know of a solution, I would appreciate your advice.
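(For reference: the warning below says the Elasticsearch parent circuit breaker is rejecting requests because real heap usage (~1016 MB) exceeds the ~972 MB limit, i.e. 95% of a roughly 1 GB heap. As a possible workaround I am considering raising the Elasticsearch heap in jvm.options — the file path and the 2 GB value below are assumptions based on a default install, not confirmed settings for my environment:)

```
# jvm.options (assumed default location, e.g. /etc/elasticsearch/jvm.options)
# The log implies a ~1 GB heap; raising both min and max to the same
# larger value, e.g. 2 GB, should lift the parent breaker limit accordingly:
-Xms2g
-Xmx2g
```

If it helps diagnose this, current breaker usage can also be checked with `curl -s localhost:9200/_nodes/stats/breaker` while the node is running.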
■fess.log
2020-05-13 08:10:15,599 [pool-6-thread-1] WARN Failed to process a task.
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [1065807384/1016.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1065806824/1016.4mb], new bytes reserved: [560/560b], usages [request=0/0b, fielddata=506084/494.2kb, in_flight_requests=560/560b, accounting=1281113/1.2mb]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-7.6.2.jar:7.6.2]
at org.codelibs.elasticsearch.client.action.HttpAction.toElasticsearchException(HttpAction.java:138) ~[elasticsearch-httpclient-7.6.2.jar:?]
at org.codelibs.elasticsearch.client.action.HttpSearchAction.lambda$execute$0(HttpSearchAction.java:48) ~[elasticsearch-httpclient-7.6.2.jar:?]
at org.codelibs.curl.CurlRequest.lambda$execute$4(CurlRequest.java:220) ~[curl4j-1.2.4.jar:?]
at org.codelibs.curl.CurlRequest.lambda$connect$3(CurlRequest.java:199) ~[curl4j-1.2.4.jar:?]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1429) ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016) ~[?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665) ~[?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598) ~[?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) ~[?:?]
Suppressed: org.elasticsearch.ElasticsearchException: hits is null.
at org.codelibs.elasticsearch.client.action.HttpSearchAction.lambda$execute$0(HttpSearchAction.java:48) ~[elasticsearch-httpclient-7.6.2.jar:?]
at org.codelibs.curl.CurlRequest.lambda$execute$4(CurlRequest.java:220) ~[curl4j-1.2.4.jar:?]
at org.codelibs.curl.CurlRequest.lambda$connect$3(CurlRequest.java:199) ~[curl4j-1.2.4.jar:?]
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1429) ~[?:?]
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) ~[?:?]
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016) ~[?:?]
at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665) ~[?:?]
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598) ~[?:?]
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177) ~[?:?]
Suppressed: org.codelibs.elasticsearch.client.action.HttpAction$CurlResponseException: {"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [1065807384/1016.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1065806824/1016.4mb], new bytes reserved: [560/560b], usages [request=0/0b, fielddata=506084/494.2kb, in_flight_requests=560/560b, accounting=1281113/1.2mb]","bytes_wanted":1065807384,"bytes_limit":1020054732,"durability":"PERMANENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [1065807384/1016.4mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1065806824/1016.4mb], new bytes reserved: [560/560b], usages [request=0/0b, fielddata=506084/494.2kb, in_flight_requests=560/560b, accounting=1281113/1.2mb]","bytes_wanted":1065807384,"bytes_limit":1020054732,"durability":"PERMANENT"},"status":429}
Thank you in advance.