Unable to index PDF, Word or Excel files

discuss · July 5, 2019, 4:19am

(from github.com/erbouchard)
I’m trying to figure out why my HTML pages are indexed properly but none of my PDF, Word or Excel files are.

Using version 12.6.

Crawling parameters

URLs
http://host/NPG/

Included URLs For Crawling
http://host/NPG/.*

Excluded URLs For Crawling
.(?i).*(css|js|jpeg|jpg|gif|png|bmp|wmv|exe|mp4)

Included URLs For Indexing
http://host/NPG/.*

Excluded URLs For Indexing
(empty)
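
As a quick sanity check of the filters above, here is a minimal Python sketch that tests a few URLs against them. It assumes the crawler requires the whole URL to match an include pattern and no exclude pattern; that rule is an assumption, not something confirmed in this thread. The file names are hypothetical. Java accepts `(?i)` in the middle of a pattern, but Python does not, so the exclude pattern is rewritten with the scoped form `(?i:...)`.

```python
import re

# Patterns copied from the crawl settings in the post.
INCLUDE_PATTERNS = [r"http://host/NPG/.*"]
# Java allows "(?i)" mid-pattern; Python rejects it, so the flag is written
# as the scoped group "(?i:...)" covering the same part of the pattern.
EXCLUDE_PATTERNS = [r".(?i:.*(css|js|jpeg|jpg|gif|png|bmp|wmv|exe|mp4))"]


def would_crawl(url: str) -> bool:
    """Assumed rule: the whole URL must match an include and no exclude pattern."""
    included = any(re.fullmatch(p, url) for p in INCLUDE_PATTERNS)
    excluded = any(re.fullmatch(p, url) for p in EXCLUDE_PATTERNS)
    return included and not excluded


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["report.pdf", "memo.docx", "sheet.xlsx", "style.css", "photo.JPG"]:
        url = "http://host/NPG/" + name
        print(url, "->", "crawl" if would_crawl(url) else "skip")
```

Under these assumptions the PDF, Word and Excel URLs pass the filters and only the CSS and image URLs are skipped, so the filters as posted would not by themselves explain the missing documents.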

Questions

Is this supposed to index those documents (referenced in <a href="...">...</a>) by default?
Or do I have to configure something?

Thanks

discuss · July 5, 2019, 1:35pm

(from marevol (Shinsuke Sugaya) · GitHub)

Is this supposed to index those documents (references in …) by default?

Yes.

Or do I have to configure something?

See fess-crawler.log.

discuss · July 5, 2019, 3:20pm

(from github.com/erbouchard)
Here’s my fess-crawler.log for today. There is no trace of any of those files.

fess-crawler.log

discuss · July 6, 2019, 1:13am

(from github.com/marevol)
Which page links to the PDF file?
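
One way to answer that is to check which document links actually appear as plain <a href="..."> elements in a page's HTML source. The following is a minimal sketch using only the Python standard library; the extension list, the function name `find_doc_links` and the sample HTML are assumptions for illustration, and it is not part of Fess.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Extensions treated as "documents" here; adjust as needed (assumption).
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


class DocLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at document files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(DOC_EXTENSIONS):
            self.links.append(urljoin(self.base_url, href))


def find_doc_links(html, base_url):
    """Return document links found in the given HTML, resolved against base_url."""
    parser = DocLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical page content, for illustration only.
    sample = '<a href="files/report.pdf">Report</a> <a href="page.html">Page</a>'
    print(find_doc_links(sample, "http://host/NPG/"))
```

If a link only shows up after JavaScript runs, it will not be in the raw HTML and this check will not find it.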