How to re-index failure URLs only (SMB FS crawler)


I crawled 6 TB of PDF files (which took a long time) and encountered some errors.
The Failure URL section shows about 10K files with Samba errors.
Is there a way to re-index the files listed in the Failure URL log without having to re-crawl the entire SMB share?


I think you need to create a script that retrieves the URLs from the Failure URL logs via the Admin API and then crawls them using CSV list crawling.
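As a rough sketch of that script, the step of turning the Admin API response into a one-column CSV for list crawling could look like the following. Note the endpoint path, the response shape, and the field names (`response`, `logs`, `url`) are assumptions for illustration; check the Admin API documentation for your Fess version before relying on them.

```python
import csv
import io
import json

# Hypothetical sample of what a failure-URL Admin API response might look
# like. In a real script you would fetch this with an authenticated HTTP GET,
# e.g. something like GET /api/admin/failureurl/logs (endpoint is an
# assumption -- verify against your version's Admin API docs).
sample_response = json.dumps({
    "response": {
        "logs": [
            {"url": "smb://fileserver/share/report1.pdf", "errorCount": 3},
            {"url": "smb://fileserver/share/report2.pdf", "errorCount": 1},
        ]
    }
})

def failure_urls_to_csv(response_text: str) -> str:
    """Extract the URLs from a failure-URL API response and render them
    as a one-column CSV suitable for list crawling."""
    logs = json.loads(response_text)["response"]["logs"]
    buf = io.StringIO()
    writer = csv.writer(buf)
    for entry in logs:
        writer.writerow([entry["url"]])
    return buf.getvalue()

print(failure_urls_to_csv(sample_response))
```

You would then save the output to a file and point a list-crawling config at it, so only the failed files get re-crawled instead of the whole share.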

I was able to export a CSV with elastic-query-export (GitHub - pteich/elastic-query-export: 🚚 Export data from Elasticsearch to CSV/JSON using a Lucene query, e.g. from Kibana, or a raw JSON query string).
Still, there should be a better way for an admin to handle failed URLs.
It would be nice if I could create a re-crawl job directly from the Failure URL page.