Failed to start job File Crawler

discuss · December 25, 2017, 8:24pm

(from github.com/PakanAngel)
I did configure a Fess file crawler on Centos 7 that didn’t work at all. All files are in PDF format and are in a folder at root: /elasticsearch/repo but the crawler cannot find the folder. After changing some setups on the crawler now when I want to start the job, get the error: Failed to start job File Crawler.
1- Why do I get this message and cannot start the job?
2- What is wrong in my crawler setting that it refuses to index those files?

File Crawler Settings:
Name: Docs
Paths: file:/elasticsearch/repo
Depth: 3
Max Access Count: 1000
The Number of Thread: 1
Interval Time: 1000ms
Boost: 1.0
Permissions: {role}guest
Label: Docs
Status: Enabled

discuss · December 25, 2017, 9:12pm

(from github.com/marevol)
Did you check log files?
Is the directory permission correct?

discuss · December 25, 2017, 10:49pm

(from github.com/PakanAngel)
Here are 2 pictures showing the crawler.log for file crawling and the permission of the folder: /elasticsearch
Pictures:

discuss · December 25, 2017, 11:19pm

(from github.com/PakanAngel)
And this shows the error of the file crawler in URL Failure page.

discuss · December 26, 2017, 11:33am

(from github.com/PakanAngel)
I’ve also checked different file path syntax as below:

discuss · December 26, 2017, 1:32pm

(from github.com/marevol)
Does Fess process run by fess user? If yes, /elasticsearch/repo and /elasticsearch should be accessible by fess user.

discuss · December 26, 2017, 1:44pm

(from github.com/PakanAngel)
I think fess is running under a user called “fess”.

discuss · December 26, 2017, 1:54pm

(from github.com/PakanAngel)
I’ve changed the owner and permission of that folder to “fess” user, but still the same problem.
Can you check my crawler.log file to see what could cause this issue?
fess-crawler.log

discuss · December 26, 2017, 2:02pm

(from github.com/marevol)
2017-12-26 14:48:17,225 [Crawler-AWCO2Fvrr5lvujcK90kV-1-1] INFO Could not access file://elasticsearch/repo/

I do not think the permission is correct…

discuss · December 26, 2017, 2:05pm

(from github.com/PakanAngel)
This is the screenshot of my “repo” folder permission
screen shot 1396-10-05 at 15 04 28

discuss · December 26, 2017, 2:07pm

(from github.com/PakanAngel)
The parent folder which is called “elasticsearch” has also the same permissions
screen shot 1396-10-05 at 15 05 59

discuss · December 26, 2017, 2:11pm

(from github.com/PakanAngel)
I can also change my repo folder to another folder which Fess can access by default but which folder would be better for this case?

discuss · December 27, 2017, 11:25am

(from github.com/PakanAngel)
Finally solved. But don’t really know what caused the problem. I changed my file crawler folder to /usr/share/fess/bin which is the main Fess folder to see if I can perform any crawling or not. I realized that the crawler works. After a while I changed the crawl path to my initial repo. folder at /elasticsearch/repo/ and suddenly it crawled all files residing there. It is definite that any folder you want to use as a repository for file crawling should be set to fess user.