Deleted file index

(from github.com/jungo-tera)
Hi,

I deleted a file and assumed it would no longer show up in search results after the next crawl, but the file was still found. It seems the index is not updated, so Fess keeps returning it as if it still existed. When I deleted the index and re-ran the crawl, the file was no longer searchable.

Is there any way to force the index to be updated during crawling after files are deleted?

Thanks!

(from github.com/marevol)
If incremental crawling is enabled and deleted files are detected, they will be removed from the index.
To remove deleted files that are not detected, you need to set a proper TTL.
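
For reference, one way to check for leftovers is to query for documents whose TTL has already passed; if incremental crawling and the TTL are working, this should return no hits. A minimal sketch, assuming direct access to the Elasticsearch instance behind Fess (the host and port are placeholders):

# find documents whose expires date has already passed (should normally return nothing)
curl -XPOST "http://<eshost:port>/fess.search/_search" \
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "range": {
      "expires": {
        "lt": "now"
      }
    }
  },
  "size": 100,
  "_source": [
    "url",
    "expires"
  ]
}'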

(from github.com/Satoshi-Japan)
Hi,
I’m struggling with exactly the same situation as jungo-tera.
It is a “hard to understand” situation on FESS 10.3, because on a Windows Server platform on AWS:

  1. When only one PDF file is deleted, the related index entry is removed immediately after the first crawl.
  2. When three PDF files were deleted in one day, the related index entries are not removed even after repeated crawls
    (this is reproducible on another physical Windows machine).

I set “General configuration -> Crawler -> Check Last Modified” to CHECKED, which, yes, means “Enable incremental crawling”.

Does anyone know of any other conditions or additional settings that need to be configured?
I’m responsible for the production system above, running FESS 10.3.

Thanks and best regards.

(from github.com/marevol)
Does the incremental crawling run before the expiration date (the default is 3 days)?

(from github.com/Satoshi-Japan)
Yes, exactly.
FESS crawling is scheduled every 60 minutes, all day, on the FESS Job Scheduler configuration page.
And an expiration of 1 day is specified at “Remove Documents Before”.

(from github.com/marevol)
Could you check whether expires is updated after each crawling job?

curl -XPOST "http://<eshost:port>/fess.search/_search" -H 'Content-Type: application/json' -d '
{
  "query": {
    "match_all": {}
  },
  "size": 100,
  "_source": [
    "doc_id",
    "expires",
    "timestamp",
    "last_modified",
    "url"
  ],
  "sort": [
    {
      "doc_id": {
        "order": "asc"
      }
    }
  ]
}'
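
If incremental crawling is working as intended, the expires value of every crawled document should move forward after each crawl; a document whose expires stops advancing will be removed once that date passes.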

(from github.com/jmike72)
I have the same problem understanding the delete policy correctly, but I can confirm that
in my case the expires date is updated for each document after crawling.
For performance reasons I have “Check last modified” turned on.

Do I understand the workflow right?

  • Fess deletes documents from the search index only when the expires date is reached?
  • After each crawl, Fess updates the expires (TTL) field of every crawled document to “now” plus “Remove Documents Before X Days”?
  • Expired and overdue documents are deleted when the crawler runs?

Some background information:
I’m using the file crawler to crawl and index Windows shared documents (smb://*), and I’m currently using Fess 11.3.0.

(from github.com/marevol)

After each crawl, Fess updates the expires (TTL) field of every crawled document to “now” plus “Remove Documents Before X Days”?

The above applies if “Check Last Modified” is enabled.

Expired and overdue documents are deleted when the crawler runs?

Yes, and deleted documents are also removed at that time.
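
Since the crawler is what performs the cleanup, expired documents that linger can also be purged by hand. A minimal sketch, assuming direct access to the Elasticsearch instance behind Fess (host and port are placeholders), using the standard _delete_by_query API:

# manually remove all documents whose expires date has passed
curl -XPOST "http://<eshost:port>/fess.search/_delete_by_query" \
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "range": {
      "expires": {
        "lt": "now"
      }
    }
  }
}'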

(from github.com/Satoshi-Japan)
Thanks, everybody.
For that reason, we don’t have enough time for further examination.
We decided to force deletion of the stale index entries whenever a corresponding document is deleted, by calling the Fess index delete API from Umbraco right after the document’s delete button is pressed.
Thank you for your kind advice.

(from github.com/atbay)
Hi,

We are suffering from the same situation.
In the end, what is the solution to this problem?

regards,

(from github.com/Satoshi-Japan)
We are stuck circumventing the current situation using the FESS index delete API. We have two redundant FESS configurations. For the case where the API returns an error when a user selects the delete button, we cannot decide what message should pop up, or what internal action the system should take so that an administrator can recover afterwards.
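
For what it’s worth, if the delete API is unavailable, the same effect can be achieved at the Elasticsearch level. A sketch, assuming direct access to the instance behind Fess; the host, port, and url value are illustrative placeholders:

# remove the index entry for one deleted document, matched by its url field
curl -XPOST "http://<eshost:port>/fess.search/_delete_by_query" \
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "term": {
      "url": "smb://fileserver/share/deleted-file.pdf"
    }
  }
}'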

(from github.com/atbay)
Thanks for replying.

Can’t the index be updated with “incremental crawling”?
If possible, I would like to keep it up to date with “incremental crawling” instead of manual maintenance.

regards,

(from github.com/Satoshi-Japan)
No, we cannot update it with “incremental crawling”. It sometimes misses deleting index entries, for example when 3 files are deleted between crawls.
We already tried this in the production environment with “incremental crawling” and the FESS general setting “Check Last Modified: enabled”.