Configure DataStore Crawler for S3 Bucket

I’m trying to set up a crawler to index documents from an S3 bucket, but I keep getting this error:

software.amazon.awssdk.services.s3.model.S3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (Service: S3, Status Code: 301, …)

The crawler configuration is shown in the image below:

Can you please help me understand what I’m doing wrong?
Thanks

This error occurs when the specified region does not match the region of the S3 bucket, so you need to set the correct region.

Starting with fess-ds-s3 14.2.1, a “buckets” parameter is available, so you can filter target files by both region and bucket.
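
If the crawler walks every bucket the credentials can see (an assumption on my side), a single bucket in another region is enough to trigger the 301. A minimal Python sketch for spotting such buckets follows. The helper names `normalize_region` and `buckets_outside_region` are made up for illustration; `list_buckets` and `get_bucket_location` are the standard boto3 S3 client calls. Depending on the endpoint the client uses, `get_bucket_location` may itself be refused for buckets in other regions, so treat this as a starting point rather than a definitive check.

```python
from typing import Any, Dict, Optional


def normalize_region(location_constraint: Optional[str]) -> str:
    """Map the LocationConstraint returned by GetBucketLocation to a region name."""
    if not location_constraint:
        # S3 reports no constraint for buckets in us-east-1.
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy alias that S3 may still return for eu-west-1.
        return "eu-west-1"
    return location_constraint


def buckets_outside_region(s3_client: Any, configured_region: str) -> Dict[str, str]:
    """Return {bucket_name: actual_region} for buckets not in configured_region."""
    mismatched: Dict[str, str] = {}
    for bucket in s3_client.list_buckets().get("Buckets", []):
        name = bucket["Name"]
        location = s3_client.get_bucket_location(Bucket=name)
        region = normalize_region(location.get("LocationConstraint"))
        if region != configured_region:
            mismatched[name] = region
    return mismatched


# Usage (requires boto3 and valid credentials; the region is only an example):
#   import boto3
#   client = boto3.client("s3", region_name="eu-central-1")
#   print(buckets_outside_region(client, "eu-central-1"))
```

Any bucket this reports is a candidate to exclude by listing only the wanted buckets in the “buckets” parameter.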

But I believe the region is correct. To confirm it, I downloaded some files from the bucket using the Python library “boto3” with the same configuration (region, access_key and secret_key), and that worked.
I also added buckets=fess-test-documents to the parameters, but it still doesn’t work.

It works in my environment, so please check fess-crawler.log.

Okay, these are the logs:

2022-06-20 09:05:21,766 [main] INFO  Starting Crawler..
2022-06-20 09:05:29,068 [pool-6-thread-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/%23current-cloud-backend.zip
2022-06-20 09:05:29,083 [__GXbIEBq_BLXIuYmM8d-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/amplify-cfn-templates/hosting/template.json
2022-06-20 09:05:33,079 [pool-6-thread-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/amplify-cfn-templates/auth/jarikoamplify1a21c48c-cloudformation-template.json
2022-06-20 09:05:33,136 [__GXbIEBq_BLXIuYmM8d-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/amplify-meta.json
2022-06-20 09:05:33,319 [__GXbIEBq_BLXIuYmM8d-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/backend-config.json
2022-06-20 09:05:33,378 [pool-6-thread-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/amplify-cfn-templates/storage/cloudformation-template.json
2022-06-20 09:05:33,535 [pool-6-thread-1] INFO  Crawling URL: https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/root-cloudformation-stack.json
2022-06-20 09:05:33,608 [__GXbIEBq_BLXIuYmM8d-1] ERROR Failed to process a data crawling: DataStore - AWS S3
software.amazon.awssdk.services.s3.model.S3Exception: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. (Service: S3, Status Code: 301, Request ID: 1MB1G4269GEBY4BK, Extended Request ID: qSMakedsS1aBMqlifb7lj9FmovXYzc6wdAfSIjAw+f+YTGZxWBUPHhIOSZWc/GMdYbaRZ5vmuC8=)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:106) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:84) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:42) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:94) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$4(BaseClientHandler.java:214) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:77) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:39) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:189) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:121) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:147) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:101) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:4867) ~[fess-ds-s3-14.0.0.jar:?]
	at software.amazon.awssdk.services.s3.S3Client.listObjectsV2(S3Client.java:10500) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.s3.AmazonS3Client.getObjects(AmazonS3Client.java:126) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.s3.AmazonS3DataStore.lambda$crawlBuckets$2(AmazonS3DataStore.java:171) ~[fess-ds-s3-14.0.0.jar:?]
	at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
	at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1092) ~[?:?]
	at org.codelibs.fess.ds.s3.AmazonS3Client.getBuckets(AmazonS3Client.java:118) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.s3.AmazonS3DataStore.crawlBuckets(AmazonS3DataStore.java:167) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.s3.AmazonS3DataStore.storeData(AmazonS3DataStore.java:148) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.AbstractDataStore.store(AbstractDataStore.java:121) ~[classes/:?]
	at org.codelibs.fess.helper.DataIndexHelper$DataCrawlingThread.process(DataIndexHelper.java:216) [classes/:?]
	at org.codelibs.fess.helper.DataIndexHelper$DataCrawlingThread.run(DataIndexHelper.java:202) [classes/:?]
2022-06-20 09:05:33,612 [pool-6-thread-1] WARN  Crawling Access Exception at : {expires=2022-06-23T09:05:20.970+0200, role=[Rguest], config_id=D--GXbIEBq-BLXIuYmM8d, created=2022-06-20T09:05:23.770+0200, segment=__GXbIEBq_BLXIuYmM8d, boost=9.9999998E10, mimetype=application/datastore, _id=46ff70126f976033e3f08d5a06c2cf03dd940149124ddcb1f64061c40401442b97a66e3ce4cfe03f1dafc159f638b41a4141393fc6793d451ac2297125aefbd7, title=root-cloudformation-stack.json, url=https://amplify-jarikoamplify-dev-161329-deployment.s3-eu-central-1.amazonaws.com/root-cloudformation-stack.json, virtual_host=[]}
java.lang.IllegalStateException: Future got interrupted
	at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:94) ~[opensearch-1.2.4.jar:1.2.4]
	at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:74) ~[opensearch-1.2.4.jar:1.2.4]
	at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68) ~[opensearch-1.2.4.jar:1.2.4]
	at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:58) ~[opensearch-1.2.4.jar:1.2.4]
	at org.codelibs.fess.es.log.allcommon.EsAbstractBehavior.delegateSelectCountUniquely(EsAbstractBehavior.java:109) ~[classes/:?]
	at org.dbflute.bhv.AbstractBehaviorReadable.doSelectCountUniquely(AbstractBehaviorReadable.java:159) ~[dbflute-runtime-1.2.5.jar:?]
	at org.dbflute.bhv.AbstractBehaviorReadable.facadeSelectCount(AbstractBehaviorReadable.java:154) ~[dbflute-runtime-1.2.5.jar:?]
	at org.codelibs.fess.es.log.bsbhv.BsClickLogBhv.selectCount(BsClickLogBhv.java:99) ~[classes/:?]
	at org.codelibs.fess.helper.SearchLogHelper.getClickCount(SearchLogHelper.java:222) ~[classes/:?]
	at org.codelibs.fess.ds.callback.IndexUpdateCallbackImpl.addClickCountField(IndexUpdateCallbackImpl.java:173) ~[classes/:?]
	at org.codelibs.fess.ds.callback.IndexUpdateCallbackImpl.store(IndexUpdateCallbackImpl.java:98) ~[classes/:?]
	at org.codelibs.fess.ds.s3.AmazonS3DataStore.storeObject(AmazonS3DataStore.java:228) ~[fess-ds-s3-14.0.0.jar:?]
	at org.codelibs.fess.ds.s3.AmazonS3DataStore.lambda$crawlBuckets$0(AmazonS3DataStore.java:172) ~[fess-ds-s3-14.0.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1081) ~[?:?]
	at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:251) ~[opensearch-1.2.4.jar:1.2.4]
	at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:77) ~[opensearch-1.2.4.jar:1.2.4]
	at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:89) ~[opensearch-1.2.4.jar:1.2.4]
	... 15 more
2022-06-20 09:05:35,327 [__GXbIEBq_BLXIuYmM8d-1] INFO  Sent 6 docs (Doc:{process 650ms, send 490ms, size 12KB}, Mem:{used 151MB, heap 392MB, max 512MB})
2022-06-20 09:05:36,283 [__GXbIEBq_BLXIuYmM8d-1] INFO  Deleted 0 old docs.
2022-06-20 09:05:36,285 [DataStoreCrawler] INFO  [EXEC TIME] crawling time: 12864ms
2022-06-20 09:05:36,293 [main] INFO  Finished Crawler
2022-06-20 09:05:36,428 [main] INFO  [CRAWL INFO] DataCrawlExecTime=12864,DataCrawlEndTime=2022-06-20T09:05:36.293+0200,CrawlerEndTime=2022-06-20T09:05:36.294+0200,DataIndexExecTime=650,CrawlerStatus=true,CrawlerStartTime=2022-06-20T09:05:21.772+0200,DataIndexSize=6,CrawlerExecTime=14522,DataCrawlStartTime=2022-06-20T09:05:23.098+0200
2022-06-20 09:05:36,431 [main] INFO  Disconnected to http://localhost:9200
2022-06-20 09:05:36,434 [main] INFO  Destroyed LaContainer.

at org.codelibs.fess.ds.s3.AmazonS3Client.getObjects(AmazonS3Client.java:126) ~[fess-ds-s3-14.0.0.jar:?]

You are using Fess 14.0 (with fess-ds-s3 14.0.0, as the jar name in this stack frame shows), not fess-ds-s3 14.2.1.
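
The plugin version appears in the jar name at the end of each stack frame. A minimal Python sketch for pulling it out of fess-crawler.log is below. The function name `plugin_versions` and the regular expression are my own and not part of Fess, and the sketch assumes the log keeps the `[name-version.jar:?]` suffix seen in the trace above.

```python
import re
from typing import Set


def plugin_versions(log_text: str, plugin: str = "fess-ds-s3") -> Set[str]:
    """Collect the distinct versions of `plugin` named in stack-frame jar suffixes."""
    # Matches e.g. "[fess-ds-s3-14.0.0.jar:?]" and captures "14.0.0".
    pattern = re.compile(r"\[" + re.escape(plugin) + r"-(\d+(?:\.\d+)*)\.jar[:\]]")
    return set(pattern.findall(log_text))


# Usage (the path is an example; adjust it to your installation):
#   with open("fess-crawler.log", encoding="utf-8") as f:
#       print(plugin_versions(f.read()))
```

If the result is anything other than the version you installed, the older jar is still the one being loaded.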

So, as I understand it, the problem is with my version of FESS.
I am using FESS 14.0, but to use the S3 crawler I need fess-ds-s3 14.2.1 and a matching FESS version.
Can you please confirm?

Thank you

Please install Fess 14.2.0 and fess-ds-s3 14.2.1.
As part of OSS product support, we provide fixes only for the latest version, apart from security fixes.
If you need support for older versions, please contact Commercial Support.

I installed FESS 14.2 with fess-ds-s3 14.2.1, and it works :star_struck:
Thank you