missing files

(from github.com/rafael844)
Hi, I'm trying to index 10850 files, but only 10618 are being indexed. What could be wrong? The crawler.log doesn't show any errors. For example, here is one file that is not being indexed, as it appears in crawler.log:

2018-10-24 00:40:32,647 [Crawler-20181024000000-1-2] INFO Crawling URL: file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf

In the admin page, Failure URL doesn't show any error, and the Crawling Info for session 20181024000000 doesn't show this file either.

(from github.com/marevol)
You can check debug logs.
See https://github.com/codelibs/fess/issues/1073#issuecomment-304397187

(from github.com/rafael844)
Here is a piece of the debug log:

2018-10-24 12:03:11,711 [IndexUpdater] DEBUG Indexing file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf
2018-10-24 12:03:11,713 [IndexUpdater] DEBUG Click Count: 0, url: file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf
2018-10-24 12:03:11,714 [IndexUpdater] DEBUG Favorite Count: 0, url: file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf
2018-10-24 12:03:11,714 [IndexUpdater] DEBUG Added the document(4MB, 3ms). The number of a document cache is 8.
2018-10-24 12:03:11,714 [IndexUpdater] DEBUG The number of an added document is 4596.
.
.
2018-10-24 12:03:12,340 [IndexUpdater] DEBUG Removing file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf from file:/G:/arquivos/10549/id_8852_Informacao_2017-346_Terceiros.pdf.p7s.pdf
2018-10-24 12:03:12,340 [IndexUpdater] DEBUG Thumbnail generator is not found: file:/G:/arquivos/10190/id_8446_Of%C3%ADcio.pdf

I saw in the debug log that it is extracting the whole text, but I don't know why it's not indexing.
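
To narrow down which files were crawled but never handed to the index updater, the two kinds of log lines can be compared mechanically. The following is only a minimal sketch, assuming the line formats shown in the excerpts above ("Crawling URL: ..." from a Crawler thread and "DEBUG Indexing ..." from IndexUpdater); the function names are made up for illustration and the patterns may need adjusting for other versions or log layouts.

```python
import re

# Patterns are assumptions based on the log excerpts in this thread.
CRAWLED = re.compile(r"\[Crawler-[^\]]+\] INFO\s+Crawling URL: (\S+)")
INDEXED = re.compile(r"\[IndexUpdater\] DEBUG\s+Indexing (\S+)")


def urls_matching(lines, pattern):
    """Collect the URL captured by `pattern` from every matching line."""
    found = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            found.add(match.group(1))
    return found


def crawled_but_not_indexed(lines):
    """Return URLs that have a 'Crawling URL' line but no 'Indexing' line."""
    lines = list(lines)  # allow iterating twice, even over a file object
    return urls_matching(lines, CRAWLED) - urls_matching(lines, INDEXED)
```

It could be run with something like `crawled_but_not_indexed(open("crawler.log", encoding="utf-8"))`, with debug logging enabled so that the IndexUpdater lines are present. An empty result would suggest that every crawled file reached the index updater and that the problem lies elsewhere.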

(from github.com/rafael844)
Actually, it's indexing all the files, but I don't know why I can't search by file name. For example, I can only find the file id_9142_Informacao_2018-027_Terceiros.pdf.p7s.pdf if I search for a word inside its text.
When I try id_9142_*, id_9142_Informacao_2018-027_Terceiros.pdf.p7s.pdf, or 9142, I don't get anything. Is there a way to fix it?

(from github.com/marevol)
You can check the indexed documents on the Admin Search page.

(from github.com/rafael844)
Looking at the Admin Search page, I could see that if I put “filename:id_4192*” in the search box, it works. But shouldn't it work without me having to specify filename?

Even when I select System Info > Search, if I don't specify filename, it doesn't return the indexed file.

(from github.com/marevol)
Did you check the field values in the indexed document?

(from github.com/rafael844)
Yes, the title is blank. It seems that it only searches in title and content by default.
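
As a stopgap, the search term can be expanded so that it also targets the filename field explicitly. This is a rough sketch only: the `filename:` prefix is the one that worked in Admin Search above, but combining it with `OR` is an assumption about the query syntax, and the helper names and the base URL in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def filename_query(term):
    """Match the term in the default fields or in the filename field.

    Assumes the search syntax accepts `OR` between clauses.
    """
    return f"{term} OR filename:{term}"


def search_url(base_url, term):
    """Build a search URL with the expanded query, URL-encoded as `q`."""
    return f"{base_url}?{urlencode({'q': filename_query(term)})}"
```

For example, `search_url("http://localhost:8080/search", "id_9142*")` builds a URL whose `q` parameter is `id_9142* OR filename:id_9142*`; the host, port, and path here are placeholders and should be replaced with the actual search page address.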

(from github.com/marevol)
I think PDFBox failed to parse it. The log messages for that are printed by the Crawler threads, not by IndexUpdater.
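
To look for such messages, the log can be filtered down to Crawler-thread entries that mention the file in question. A minimal sketch, assuming the "date time [thread] LEVEL message" layout seen in the excerpts in this thread, and treating any line without that prefix (such as a stack trace line) as a continuation of the previous entry; the function name is hypothetical.

```python
import re

# Assumed entry prefix, e.g.
# "2018-10-24 00:40:32,647 [Crawler-20181024000000-1-2] INFO ..."
ENTRY = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[([^\]]+)\] (\w+)\s+(.*)$"
)


def crawler_entries_for(lines, needle):
    """Return Crawler-thread log entries that mention `needle`.

    Continuation lines (for example stack traces) stay attached to the
    entry they follow.
    """
    entries = []
    current = None  # lines of the Crawler entry being collected, if any
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            if match.group(1).startswith("Crawler-"):
                current = [line]
                entries.append(current)
            else:
                current = None  # entry from another thread: skip it
        elif current is not None:
            current.append(line)
    joined = ["\n".join(entry) for entry in entries]
    return [entry for entry in joined if needle in entry]
```

Calling it with the log file and a distinctive part of the file name, for instance `crawler_entries_for(open("crawler.log", encoding="utf-8"), "id_9142_")`, would list only what the Crawler threads logged about that file, which is where a parsing failure would be expected to show up.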