Hi @all,
is it possible to define multiple crawlers with different includes/excludes on the same path?
The reason we would like to split the crawl is the sheer number of files to crawl & index.
We previously had a single job covering the entire path, but it kept running into OOM errors and never completed cleanly.
So the question is whether a construct like the following works, e.g.:
Job a:
Path: smb://mysrv/data/
Includes Crawling: ^(.*?(?:(A|a)_|(A|a)).*)$
Excludes Crawling: .*\.(?i)(db|tmp|lnk|inf)
Includes Indexing: .*\.(?i)(pdf|xlsx|xlsm|xls|pptx|pptm|ppt|docx|docm|doc|rtf|vsd|odt|csv|txt)
Job b:
Path: smb://mysrv/data/
Includes Crawling: ^(.*?(?:(B|b)_|(B|b)).*)$
Excludes Crawling: .*\.(?i)(db|tmp|lnk|inf)
Includes Indexing: .*\.(?i)(pdf|xlsx|xlsm|xls|pptx|pptm|ppt|docx|docm|doc|rtf|vsd|odt|csv|txt)
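Before blaming the multi-job setup, it can help to check locally what the expressions actually match. The patterns as posted look mangled (the forum likely swallowed `*` and `\` characters), so the reconstructions in this sketch are assumptions, not the verified originals; the sample URLs are hypothetical:

```python
import re

# Reconstructed patterns (assumptions, see note above):
include_crawl_a = re.compile(r"^(.*?(?:(A|a)_|(A|a)).*)$")           # Job a crawl include
exclude_crawl   = re.compile(r".*\.(?i:db|tmp|lnk|inf)")             # shared crawl exclude
include_index   = re.compile(r".*\.(?i:pdf|xlsx|xlsm|xls|pptx|pptm|ppt|"
                             r"docx|docm|doc|rtf|vsd|odt|csv|txt)")  # indexing include

# Hypothetical sample URLs from the share:
for url in [
    "smb://mysrv/data/A_reports/summary.pdf",
    "smb://mysrv/data/A_reports/Thumbs.db",
    "smb://mysrv/data/B_budget/plan.xlsx",
]:
    crawled = bool(include_crawl_a.match(url)) and not exclude_crawl.match(url)
    indexed = crawled and bool(include_index.match(url))
    print(f"{url}  crawl={crawled}  index={indexed}")
```

One pitfall this check surfaces: with an unanchored `(A|a)` alternative, job a's include already matches the `a` in `data/`, so both jobs would end up crawling everything. Anchoring the pattern to the folder level (e.g. matching only the first path segment below the base path) may be needed for the split to work as intended.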
Currently it looks like only one definition is working: the jobs run, but neither shows any crawled or indexed documents.
Alternatively, does anyone have other suggestions for crawling large file servers?
Thanks!