(from goldenivan · GitHub)
Hi,
I am trying to crawl some files with Fess 12.1.0. Here is the configuration I used:
In Crawler => File System
- Name: MyFileCrawl
- Paths: file:///tmp/crawl/
- Included Paths For Crawling: file:///tmp/crawl/
The crawl folder has these permissions:
drwxr-xr-x 7 root root 66 Mar 30 15:40 crawl
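Beyond the permissions on the top-level folder, every subfolder and file underneath must be readable by the account that runs the crawler. A minimal sketch (the `/tmp/crawl` root comes from the config above; run it as the same user that launches Fess) to spot any entry that user cannot read:

```python
import os

# Walk the crawl root and collect anything the current user cannot
# read -- run this as the account that launches the Fess crawler.
CRAWL_ROOT = "/tmp/crawl"
unreadable = []
for root, dirs, files in os.walk(CRAWL_ROOT):
    for name in dirs + files:
        path = os.path.join(root, name)
        if not os.access(path, os.R_OK):
            unreadable.append(path)
print("unreadable entries:", unreadable)
```

If this prints any paths, the crawler would silently skip them even though the top-level `drwxr-xr-x` looks fine.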
I also configured a connection in File Authentication:
- Scheme: Samba
- Username: root
- Password: aBeautifulPassword
I linked this authentication to the file system config created previously. Then I created a job in the scheduler and launched the crawl job. This is the output in fess-crawler.log:
2018-03-30 16:06:08,706 [WebFsCrawler] INFO Connected to localhost:9300
2018-03-30 16:06:08,863 [WebFsCrawler] INFO Target Path: file:///tmp/crawl/
2018-03-30 16:06:08,864 [WebFsCrawler] INFO Included Path: file:///tmp/crawl/
2018-03-30 16:06:09,094 [Crawler-aJ_9dmIB4EJ67hIPdwJb-1-3] INFO Crawling URL: file:///tmp/crawl/
2018-03-30 16:06:18,909 [IndexUpdater] INFO Processing no docs (Doc:{access 6ms}, Mem:{used 99MB, heap 153MB, max 495MB})
2018-03-30 16:06:28,893 [IndexUpdater] INFO Processing no docs (Doc:{access 4ms}, Mem:{used 101MB, heap 153MB, max 495MB})
2018-03-30 16:06:38,892 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms}, Mem:{used 102MB, heap 153MB, max 495MB})
2018-03-30 16:06:40,229 [WebFsCrawler] INFO [EXEC TIME] crawling time: 31653ms
2018-03-30 16:06:48,892 [IndexUpdater] INFO Processing no docs (Doc:{access 4ms}, Mem:{used 102MB, heap 153MB, max 495MB})
2018-03-30 16:06:48,892 [IndexUpdater] INFO [EXEC TIME] index update time: 34ms
2018-03-30 16:06:48,942 [main] INFO Finished Crawler
The folder /tmp/crawl contains several subfolders with files, to test file crawling. But, as you can see in the logs, nothing was crawled. I tried adapting the path value with one, two, and three slashes; only the triple-slash form makes Fess start crawling the URL.
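The slash count matters because of how file URIs are parsed: with two slashes the first path segment is taken as a host name, so only `file:///tmp/crawl/` actually names the local absolute path. A small sketch illustrating this with Python's standard URL parser (the behavior shown is generic URI parsing, not Fess-specific):

```python
from urllib.parse import urlparse

# "file:///tmp/crawl/" has an empty netloc and path "/tmp/crawl/";
# "file://tmp/crawl/" treats "tmp" as a HOST, leaving path "/crawl/".
for uri in ("file:/tmp/crawl/", "file://tmp/crawl/", "file:///tmp/crawl/"):
    p = urlparse(uri)
    print(uri, "-> netloc:", repr(p.netloc), "path:", repr(p.path))
```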
Am I missing something?