I’ve been looking for a solution to this problem, but I can’t find one in the “Issues” section.
I use Fess to search PDFs in a file system. In this file system there are folders and files with German umlauts like
In the Failure URL I found the following details:
org.codelibs.fess.exception.ContentNotFoundException: Not Found: file:/var/tmp/fess/PDF/H%EF%BF%BD%EF%BF%BDuser Parent: file:/var/tmp/fess/PDF
How can I crawl these folders and files?
Is this a problem with the Fess crawler or with Elasticsearch?
What kind of logs and config data do you need to analyse this issue?
Thanks a lot
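For readers hitting the same error: the `%EF%BF%BD` sequences in the failure URL above are the UTF-8 bytes of U+FFFD, the Unicode replacement character. Two of them stand where the two UTF-8 bytes of "ä" should be, which suggests the file name was decoded with a charset that cannot represent those bytes before the URL was built. The sketch below decodes the fragment and reproduces it, assuming ASCII as the wrong charset purely for illustration; the charset actually in effect in the crawler is not known from the post.

```python
from urllib.parse import quote, unquote

# Path fragment copied from the failure URL in the post.
failed = "H%EF%BF%BD%EF%BF%BDuser"

# Percent-decode (UTF-8 is the default) and list the code points.
decoded = unquote(failed)
print([f"U+{ord(c):04X}" for c in decoded])

# Reproduce the damage: the folder name as UTF-8 bytes on disk ...
raw = "H\u00e4user".encode("utf-8")  # "Häuser", the umlaut takes two bytes
# ... decoded with a charset that cannot handle them (ASCII is an assumption) ...
mangled = raw.decode("ascii", errors="replace")
# ... and percent-encoded again as UTF-8.
reencoded = quote(mangled)
print(reencoded)
```

If the reproduced string matches the failure URL, the bytes on disk are most likely fine and the decoding step is the place to look.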
Is “Häuser” encoded in UTF-8 in your environment?
Yes, I think so. I use LANG=“de_DE.UTF-8” in my environment (Debian 9.7).
I can’t change the charset of the folder or of the file:
root@debian:/var/tmp/fess/PDF# file -bi Häuser
root@debian:/var/tmp/fess/PDF/Häuser# file -bi 40002889_4200_Leistungssätze.pdf
The program convmv can’t convert from binary to UTF-8.
How can I change this to UTF-8? Or do I have to set an encoding in the configuration of Fess?
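A note on the check above: `file -bi` inspects a file's contents, not its name, so the charset it reports says nothing about how the folder or file name is encoded. A more direct test is to read the names as raw bytes and try to decode them as UTF-8. The following is a minimal sketch of that idea; the helper name `non_utf8_names` is made up for this example, and the path is the one from the post.

```python
import os

def non_utf8_names(root):
    """Return the raw byte paths under root whose names are not valid UTF-8."""
    bad = []
    # Passing a bytes path makes os.walk return names as undecoded bytes.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad

if __name__ == "__main__":
    # Directory taken from the post; adjust as needed.
    for path in non_utf8_names("/var/tmp/fess/PDF"):
        print(path)
```

If this prints nothing, the names are already valid UTF-8 and there is nothing for convmv to convert; the problem then lies in how the names are decoded by the application reading them.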
Now I found a solution:
In /etc/fess/fess_config.properties I had to set the following parameters:
Every parameter was empty by default.
Fess now crawls the folders and files with German umlauts, and I can search the documents.
Now I can test whether Fess is able to crawl all 4.8 TB of PDFs.
Thanks for your help!