Hello,
I’m currently testing the differential crawl and I think I’m missing something.
First of all, let me give you some background information:
- I’m running Fess 14.16
- The configuration is as below:
- I’m testing a very limited number of files (5)
- I found this thread: Deleted file index
My problem is that I want documents to be removed from the index only when the crawler is no longer able to find the document.
For example, the file C:\Test\test.txt has been deleted, so it should be removed from the index at the next crawl.
However, it doesn’t seem to work like that, even though that’s what this post seems to indicate (maybe I’m wrong): Deleted file index - #8 by discuss
I’ve checked the “expires” and it’s updated for new and updated documents, but not for deleted documents, which seems fine to me.
My question is: is what I want to achieve possible? Am I missing something?
Thank you for your help and your amazing work.
Best regards
In the latest version, the Scheduler’s Doc Purger deletes expired documents from the index based on the ‘Remove Documents Before’ value.
Hello Shinsuke,
Thank you for your reply, but that’s not really what I’m asking. I’ve deactivated the Doc Purger task because I don’t want the documents to “Expire”.
What I’m asking is whether the File crawler task I’ve created should remove deleted files from the index.
For example: during the first crawl, 5 documents are indexed, which is great. Then I delete 1 document and restart the crawler. The crawler finds only 4 documents, which is correct, but there are still 5 documents in the index: the crawler doesn’t remove the deleted document from the index.
Is there a way to do that?
Thank you.
Best regards.
The crawler does not delete indexed documents because it needs to verify their existence. If you do not want to use the Document Purger, you must use the CsvListDataStore, create a file list containing the updated or deleted file paths, and pass it to the datastore crawler.
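As a rough illustration of how such a file list could be produced, here is a minimal Python sketch that compares the paths seen on a previous run with the paths found now and emits one `action,path` line per change. The `delete` and `create` actions follow the format quoted later in this thread; the function names, the example paths, and the idea of keeping a snapshot of the previous run are assumptions for illustration, not part of Fess. Files whose content changed but whose path stayed the same are not detected by this comparison.

```python
import csv
import io


def build_change_list(previous_paths, current_paths):
    """Return (action, path) rows for files that disappeared or appeared.

    Hypothetical helper: the two arguments are the path lists from the
    previous run and from the current run.
    """
    previous, current = set(previous_paths), set(current_paths)
    rows = [("delete", path) for path in sorted(previous - current)]
    rows += [("create", path) for path in sorted(current - previous)]
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text, one 'action,path' line per file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Example paths are made up for this sketch.
    before = ["smb://servername/docs/a.txt", "smb://servername/docs/b.txt"]
    after = ["smb://servername/docs/b.txt", "smb://servername/docs/c.txt"]
    print(to_csv(build_change_list(before, after)), end="")
```

The resulting text would then be saved as the CSV file that the datastore crawler reads.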
OK, got it. I will test that very soon.
I’ve had a look at the documentation and have a question. In the case of a folder rename, can I put the lines below into the CSV file, and will Fess understand them as ‘remove all files with the old folder name from the index’ and ‘add all files with the new folder name to the index’?
delete,smb://servername/oldfoldername/*
create,smb://servername/newfoldername/*
Thank you.
Best regards.
Fess supports only file paths, not folders. Wildcards are not supported.
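Since wildcards are not supported, a folder rename has to be written out as one line per file. The sketch below shows one possible way to do that in Python, assuming you already have a list of the file paths that were indexed under the old folder name. The function name and the example paths are hypothetical; only the `delete,<path>` and `create,<path>` line format comes from this thread.

```python
def expand_folder_rename(indexed_paths, old_prefix, new_prefix):
    """Expand a folder rename into per-file (action, path) rows.

    Both prefixes should end with '/' so that a folder such as
    'oldfoldername2' is not matched by accident.
    """
    deletes, creates = [], []
    for path in sorted(indexed_paths):
        if path.startswith(old_prefix):
            relative = path[len(old_prefix):]
            deletes.append(("delete", path))
            creates.append(("create", new_prefix + relative))
    # All deletions first, then all creations.
    return deletes + creates


if __name__ == "__main__":
    # Example paths are made up for this sketch.
    indexed = [
        "smb://servername/oldfoldername/a.txt",
        "smb://servername/oldfoldername/sub/b.txt",
        "smb://servername/other/c.txt",
    ]
    for action, path in expand_folder_rename(
        indexed,
        "smb://servername/oldfoldername/",
        "smb://servername/newfoldername/",
    ):
        print(f"{action},{path}")
```

Files outside the renamed folder are left untouched, so the output contains only the lines that need to go into the CSV file.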