I'm having trouble getting labels assigned to specific paths with the label “Include Paths” option.
I have one file crawler that crawls a Windows share, with the following settings:
File Crawler
Name: Root-Crawler
Path: smb://root/
All other settings are left at their default values.
I then created several labels to tag some of the folders under root:
Label-A
Name: Label-A
Value: labela
Include Paths: smb://root/a/*
Label-B
Name: Label-B
Value: labelb
Include Paths: smb://root/b/*
Label-C
Name: Label-C
Value: labelc
Include Paths: smb://root/c/*
I'd expect the subfolders /root/a, /root/b and /root/c to be labeled with their corresponding labels, but nothing happens. It seems that the label “Include Paths” option is ignored for every crawled document.
Following issue #990, I also tried a different regex for “Include Paths”, such as smb://root/a/.*, with no luck.
Any ideas on how to use labels with just a single crawler?
(from github.com/marevol)
Include Paths uses Java's regular expression syntax, so smb://root/c/.* is correct.
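To illustrate why the glob-style `smb://root/c/*` does not work as a Java regex, here is a small sketch. It assumes Include Paths is applied as a full match of the crawled URL (as `Matcher.matches()` does); that assumption should be checked against the Fess source. The file name `report.docx` is hypothetical.

```java
import java.util.regex.Pattern;

public class IncludePathCheck {
    // Returns true only if the WHOLE url matches the pattern (full match).
    // Assumption: Fess evaluates Include Paths as a full match like this.
    static boolean included(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "smb://root/c/report.docx"; // hypothetical file under C

        // In a regex, "/*" means "zero or more slashes", not a glob wildcard,
        // so the file name after the slash is left unmatched.
        System.out.println(included("smb://root/c/*", url));  // false

        // ".*" means "any sequence of characters", covering everything under /c/.
        System.out.println(included("smb://root/c/.*", url)); // true
    }
}
```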
The actual crawled paths are printed in fess-crawler.log.
Fess checks if the path matches Include Paths.
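The check described above can be modeled roughly as follows: each label holds a value and a list of Include Paths patterns, and a document receives every label with at least one matching pattern. This is a simplified, hypothetical model (the `LabelMatcher` class and `LabelRule` record are invented for illustration), not Fess's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LabelMatcher {
    // Hypothetical model of a label: its value plus compiled Include Paths.
    record LabelRule(String value, List<Pattern> includePaths) {
        static LabelRule of(String value, String... regexes) {
            List<Pattern> patterns = new ArrayList<>();
            for (String regex : regexes) {
                patterns.add(Pattern.compile(regex));
            }
            return new LabelRule(value, patterns);
        }

        // A label applies if any Include Paths pattern matches the whole URL.
        boolean appliesTo(String url) {
            return includePaths.stream().anyMatch(p -> p.matcher(url).matches());
        }
    }

    // Collects the values of every label whose Include Paths match the URL.
    static List<String> labelsFor(String url, List<LabelRule> rules) {
        List<String> labels = new ArrayList<>();
        for (LabelRule rule : rules) {
            if (rule.appliesTo(url)) {
                labels.add(rule.value());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<LabelRule> rules = List.of(
                LabelRule.of("labela", "smb://root/a/.*"),
                LabelRule.of("labelb", "smb://root/b/.*"),
                LabelRule.of("labelc", "smb://root/c/.*"));
        System.out.println(labelsFor("smb://root/b/notes.txt", rules)); // [labelb]
    }
}
```

With this model a single crawler over smb://root/ is enough: the labels differ only in their patterns, so each document under /a/, /b/ or /c/ picks up the matching label.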
(from github.com/jmike72)
Hi @marevol ,
thanks for the quick reply.
I've changed the regexp to the correct form, e.g. smb://root/c/.*, and checked the crawled URLs in fess-crawler.log. The URLs match, but no labels are assigned to the crawled documents.
There is a special character in the URL, the $ sign. Could it be preventing the URLs from matching?
For example:
Thanks for helping. After escaping the special characters in the Java regexp, the labels are matched and assigned correctly. This is really powerful, great job.
Just for future reference: where can I find the URL/label matching logic in the source code?
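Why escaping fixes the `$` case: in a Java regex an unescaped `$` is an end-of-input anchor, so a pattern with `$` in the middle can never match a longer URL. The sketch below uses a hypothetical share name `share$` and again assumes full-match semantics. Note that `"\\$"` is only doubled because it sits in a Java string literal; typed into the Include Paths field it would presumably be a single backslash, e.g. `smb://root/share\$/.*`.

```java
import java.util.regex.Pattern;

public class EscapeDollar {
    // Full-match check, assumed to mirror how Include Paths is applied.
    static boolean included(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).matches();
    }

    public static void main(String[] args) {
        // Hypothetical administrative share whose name ends in "$".
        String url = "smb://root/share$/doc.txt";

        // Unescaped: "$" asserts end of input, so the following "/" can never match.
        System.out.println(included("smb://root/share$/.*", url));   // false

        // Escaped with a backslash, "$" is treated as a literal character.
        System.out.println(included("smb://root/share\\$/.*", url)); // true

        // Alternative: quote the literal prefix, then append the wildcard.
        String quoted = Pattern.quote("smb://root/share$/") + ".*";
        System.out.println(included(quoted, url));                    // true
    }
}
```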