Labels - included paths not working


I am having trouble getting labels attached to specific paths when using the label “Include Paths” option.

I have one file crawler that crawls a Windows share with the following settings:

File Crawler
Name Root-Crawler
Path smb://root/

the rest are the default values.

I then created several labels for folders under ROOT:

Name Label-A
Value labela
Include Paths smb://root/a/*

Name Label-B
Value labelb
Include Paths smb://root/b/*

Name Label-C
Value labelc
Include Paths smb://root/c/*

I’d assume that documents in the subfolders /root/a, /root/b and /root/c would get labeled with their corresponding labels, but nothing happens. It seems that the label “Included Paths” option is ignored for every crawled document.

Following issue #990, I also tried a different regex for “Included Paths”, smb://root/a/.*, with no luck.

Any ideas on how to use labels with just a single crawler?

The regexp uses Java’s format, so smb://root/c/.* is correct.
The actual crawled paths are printed in fess-crawler.log.
Fess checks whether each path matches Include Paths.
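
A minimal Java sketch of why the glob-style value smb://root/a/* does not behave like smb://root/a/.*. It assumes the whole path has to match the expression (Matcher.matches()); the thread does not confirm that detail, and the class name and sample path are made up for illustration.

```java
import java.util.regex.Pattern;

public class IncludePathCheck {
    // Assumption: the path must match the entire expression, not just a prefix.
    public static boolean fullyMatches(String regex, String path) {
        return Pattern.compile(regex).matcher(path).matches();
    }

    public static void main(String[] args) {
        String path = "smb://root/a/docs/report.txt"; // hypothetical crawled path

        // "/*" means "zero or more slashes", so nothing after "a/" is allowed.
        System.out.println(fullyMatches("smb://root/a/*", path));  // false

        // ".*" means "any characters", which covers everything below a/.
        System.out.println(fullyMatches("smb://root/a/.*", path)); // true
    }
}
```

Comparing the expression against a path copied from fess-crawler.log in this way is a quick check before changing the label settings.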

Hi @marevol ,
thanks for the quick reply.
I’ve changed the regexp to the correct form, smb://root/c/.*, and checked the crawled URLs in fess-crawler.log. The URLs match, but no labels are assigned to the crawled documents.

My URLs contain a special character, the $ sign. Could this be what prevents the URLs from matching?
For example:

Included Paths regexp smb://root/d$/.*
Crawled URL smb://root/d$/test.txt

Did you try escaping the $?
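
As a sketch of the effect in plain Java: an unescaped $ is an end-of-input anchor in Java regular expressions, so it never matches the literal $ in the path. The class and method names below are hypothetical, and full-string matching is assumed.

```java
import java.util.regex.Pattern;

public class DollarEscapeCheck {
    // Assumption: the path must match the entire expression.
    public static boolean fullyMatches(String regex, String path) {
        return Pattern.compile(regex).matcher(path).matches();
    }

    // Builds a regex that treats the given prefix literally and allows anything after it.
    public static String literalPrefix(String prefix) {
        return Pattern.quote(prefix) + ".*";
    }

    public static void main(String[] args) {
        String path = "smb://root/d$/test.txt";

        // Unescaped: "$" asserts end of input right after "d", so the match fails.
        System.out.println(fullyMatches("smb://root/d$/.*", path));             // false

        // Escaped: "\$" matches a literal dollar sign.
        System.out.println(fullyMatches("smb://root/d\\$/.*", path));           // true

        // Pattern.quote escapes every special character in the prefix at once.
        System.out.println(fullyMatches(literalPrefix("smb://root/d$/"), path)); // true
    }
}
```

The doubled backslash is only Java string-literal syntax; the regular expression itself is smb://root/d\$/.* with a single backslash.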

Hi @marevol ,

thanks for helping. After escaping the special characters in the Java regexp, the labels are matched and assigned correctly. This is really powerful, great job.

Just for the future: where can I find the URL/label matching logic in the source code?

See LabelTypeHelper.
If you need further development support, please contact the support team.
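
For readers who only want the general idea, here is a rough sketch of path-to-label matching with one crawler and several labels. It is not the code in LabelTypeHelper; the class, its methods, and the use of full-string matching are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only, not Fess source code.
public class LabelMatchSketch {
    // Label value -> compiled include-path expression, in insertion order.
    private final Map<String, Pattern> includePatterns = new LinkedHashMap<>();

    public void addLabel(String value, String includeRegex) {
        includePatterns.put(value, Pattern.compile(includeRegex));
    }

    // Returns every label whose expression matches the whole path.
    public List<String> labelsFor(String path) {
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : includePatterns.entrySet()) {
            if (entry.getValue().matcher(path).matches()) {
                labels.add(entry.getKey());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        LabelMatchSketch sketch = new LabelMatchSketch();
        sketch.addLabel("labela", "smb://root/a/.*");
        sketch.addLabel("labelb", "smb://root/b/.*");
        sketch.addLabel("labelc", "smb://root/c/.*");

        System.out.println(sketch.labelsFor("smb://root/b/notes.txt")); // [labelb]
        System.out.println(sketch.labelsFor("smb://root/x/notes.txt")); // []
    }
}
```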