What is the correct configuration to keep all the data the web crawler collects? Data seems to be getting purged from the main index at seemingly random intervals. I’ve tried setting the crawler to -1 days, 0 days, and then 365 days, but the documents all seem to be purged at the same time. I want to keep everything, even pages that have since been modified. How should I handle this?
Thank you! That fixed it…
Will this setting remove documents that no longer exist from the index? If not, won’t it leave many broken links in the index?
“Remove Documents Before” is the TTL (time-to-live) setting.
It depends on your requirements.
With TTL=-1, documents are never removed from the index, even if they have been deleted from the file system.
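To illustrate the behavior, here is a minimal sketch of how TTL-based purging typically works. This is illustrative only, not the crawler's actual code: the function and field names (`select_expired`, `last_crawled`) are assumptions for the example.

```python
from datetime import datetime, timedelta

def select_expired(documents, ttl_days, now=None):
    """Return documents whose last crawl is older than the TTL.

    ttl_days = -1 disables purging entirely: nothing is ever selected
    for removal, even documents that no longer exist at the source.
    (Hypothetical sketch; names are not the crawler's real API.)
    """
    if ttl_days < 0:  # TTL = -1: keep everything forever
        return []
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=ttl_days)
    return [doc for doc in documents if doc["last_crawled"] < cutoff]

docs = [
    {"url": "https://example.com/old", "last_crawled": datetime(2020, 1, 1)},
    {"url": "https://example.com/new", "last_crawled": datetime.utcnow()},
]
print(len(select_expired(docs, ttl_days=-1)))   # -1: nothing is purged
print(len(select_expired(docs, ttl_days=365)))  # only the stale doc expires
```

Note that this kind of purge is driven purely by document age in the index, which is why old-but-still-existing documents disappear unless TTL is disabled.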
What would be the correct setting to keep all documents that still exist (for either the web crawler or the file system crawler), regardless of how old they are, while removing documents that have disappeared in the meantime?