Crawl only whitelisted filetypes

Hello,

Is there a way to whitelist file types in a filesystem crawler? In other words:

FESS works great for the time invested into the initial setup. But it seems the crawler walks through everything, including files no extractor is given for. For example, it tries to load an executable and fails to do so only because of the file's size.

We have set up a file system crawler with one path to crawl, plus two paths to avoid, entered as excluded paths for crawling and indexing. The included paths are empty for both crawling and indexing. So I would add *.pdf$, for example, to get only PDF files. Right?

In general, the docs would benefit from more config examples in plain English.

Kind regards,
Udo

The file type is determined only after a file has been downloaded, so Fess cannot use it as a filter for crawling.
Therefore, you need to use the included/excluded path settings, which take regular expressions.
If you want to crawl only PDF files, I think the configuration is as below:
Crawling included path:

.*/$
.*\.pdf$

Indexing included path:

.*\.pdf$
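
To see what these patterns do, here is a minimal Python sketch. It is not Fess code: it assumes each include pattern must match the whole path, and the sample paths and the helper names `allowed` and `is_valid_regex` are made up for illustration. Fess uses Java regular expressions, so treat the result as an approximation.

```python
import re

# Patterns copied from the configuration above.
CRAWL_INCLUDES = [r".*/$", r".*\.pdf$"]
INDEX_INCLUDES = [r".*\.pdf$"]


def allowed(url, patterns):
    """Return True if the whole URL matches at least one include pattern.

    Assumption: whole-string matching, as a stand-in for how Fess
    applies its included paths.
    """
    return any(re.fullmatch(p, url) for p in patterns)


def is_valid_regex(pattern):
    """Return True if Python's regex engine can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Hypothetical paths, for illustration only.
samples = [
    "file:/srv/share/",
    "file:/srv/share/docs/",
    "file:/srv/share/docs/manual.pdf",
    "file:/srv/share/tools/setup.exe",
]

for url in samples:
    print(url, "crawl:", allowed(url, CRAWL_INCLUDES),
          "index:", allowed(url, INDEX_INCLUDES))
```

In this sketch, directories (paths ending in `/`) pass the crawling filter but not the indexing filter, so the crawler can still descend into folders to reach the PDFs without indexing the folders themselves. The executable passes neither filter. Note also that `*.pdf$` from the question is a glob-style pattern rather than a regular expression; `is_valid_regex("*.pdf$")` returns False here, which is why the form `.*\.pdf$` is used.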

Added this to my file crawling config.

Since I think this will only take effect in upcoming crawl sessions, I will wait and see how it goes.

Thanks for your fast response.