Keep data forever

(from github.com/geodawg)
What is the correct configuration to keep all the data the web crawler collects? Data seems to be purged from the main index after seemingly random amounts of time. I’ve tried setting the crawler to -1 days, 0 days, and then 365 days, but the documents all seem to purge at the same time. I want to keep everything, even pages that have been modified. How should I handle this?

(from github.com/marevol)

  1. Disable Check Last Modified
  2. Set Remove Documents Before to -1

That’s it.
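
For reference, here is a minimal sketch of the age-based TTL rule that “Remove Documents Before” applies, assuming it purges purely by last-crawl age; the function and parameter names are hypothetical, not Fess’s actual code:

    from datetime import datetime, timedelta, timezone

    # Hypothetical model of an age-based TTL purge rule.
    # should_purge and remove_before_days are illustrative names only.
    def should_purge(last_crawled: datetime, remove_before_days: int,
                     now: datetime) -> bool:
        if remove_before_days < 0:   # -1 disables the TTL: never purge
            return False
        # Otherwise, purge anything last crawled more than N days ago.
        return last_crawled < now - timedelta(days=remove_before_days)

    now = datetime.now(timezone.utc)
    old_doc = now - timedelta(days=400)
    print(should_purge(old_doc, -1, now))   # False: kept forever
    print(should_purge(old_doc, 365, now))  # True: older than 365 days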

(from github.com/geodawg)
Thank you! That fixed it…

(from github.com/micakovic)
Is this setting going to remove documents that no longer exist from the index? Won’t this leave many broken links in the index?

(from github.com/marevol)
“Remove Documents Before” is a TTL.
It depends on your requirements.
TTL = -1 does not remove documents from the index even if they have been removed from the file system.
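
Concretely, with TTL = -1 even a document whose source was deleted long ago survives every purge pass, because the rule only looks at age, never at whether the source still exists. A small illustration, again with made-up names:

    from datetime import datetime, timedelta, timezone

    # Illustrative only: the TTL checks age, not whether the source exists.
    def survives_purge(last_crawled: datetime, ttl_days: int,
                       now: datetime) -> bool:
        # ttl_days = -1 means "never purge"; otherwise purge when too old.
        return ttl_days < 0 or last_crawled >= now - timedelta(days=ttl_days)

    now = datetime.now(timezone.utc)
    gone_for_a_year = now - timedelta(days=365)
    print(survives_purge(gone_for_a_year, -1, now))  # True: stays indexed
    print(survives_purge(gone_for_a_year, 30, now))  # False: purged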

(from github.com/micakovic)
What would be the correct setting to keep all documents that still exist (whether from the web crawler or the file system crawler), regardless of how old they are, but remove documents that have disappeared in the meantime?