Problem with German umlauts in folders and files

I’ve been looking for a solution to this problem, but I can’t find one in the “Issues” section.
I use Fess to search PDFs in a file system. This file system contains folders and files with German umlauts in their names, such as:
folder: /var/tmp/fess/PDF/Häuser
files: /var/tmp/fess/PDF/Wohnungen/Leistungssätze.pdf
In the Failure URL I found the following details:
org.codelibs.fess.exception.ContentNotFoundException: Not Found: file:/var/tmp/fess/PDF/H%EF%BF%BD%EF%BF%BDuser Parent: file:/var/tmp/fess/PDF
How can I crawl these folders and files?
Is this a problem with the Fess crawler or with Elasticsearch?
What kind of logs and config data do you need to analyse this issue?
Thanks a lot
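
A note on the failure URL above: the sequence %EF%BF%BD is the UTF-8 percent-encoding of U+FFFD, the Unicode replacement character. Two of them in place of "ä" suggest that the two UTF-8 bytes of "ä" were each read as an invalid character before the URL was built. The following is a minimal Python sketch that reproduces the same string, assuming the name was decoded with an ASCII-like charset somewhere along the way. That assumption is not confirmed for Fess; the sketch only shows that it would produce exactly this URL fragment.

```python
from urllib.parse import quote, unquote

# Path segment copied from the ContentNotFoundException message.
FAILING = "H%EF%BF%BD%EF%BF%BDuser"

# Decoding the percent-escapes shows two U+FFFD replacement characters.
decoded = unquote(FAILING)
print([hex(ord(c)) for c in decoded])

# Reproduce it: UTF-8 bytes of "Häuser" read with an ASCII decoder
# (an assumed failure mode, not a confirmed Fess code path).
raw = "H\u00e4user".encode("utf-8")           # b'H\xc3\xa4user'
misread = raw.decode("ascii", errors="replace")
reencoded = quote(misread)

print(reencoded)  # H%EF%BF%BD%EF%BF%BDuser
```

If the reproduced string matches the failure URL, the file name on disk is probably fine and the problem lies in how the name is interpreted when it is read.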

Is Häuser encoded in UTF-8 in your environment?

Yes, I think so. I use LANG="de_DE.UTF-8" in my environment (Debian 9.7).
I can’t change the charset of the folder or of the file:
root@debian:/var/tmp/fess/PDF# file -bi Häuser
inode/directory; charset=binary
root@debian:/var/tmp/fess/PDF/Häuser# file -bi 40002889_4200_Leistungssätze.pdf
application/pdf; charset=binary
The program convmv can’t convert from binary to UTF-8.
How can I change this to UTF-8? Or do I have to set an encoding in the Fess configuration?
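
Note that `file -bi` reports the charset of a file's contents, not of its name, so "charset=binary" for a directory or a PDF says nothing about how the name is encoded. To check the names themselves, one option is to read them as raw bytes and try to decode them as UTF-8. The sketch below does that in Python; the helper names `is_valid_utf8` and `non_utf8_names` are made up for illustration, and the root path is the one from this thread.

```python
import os
from typing import List


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_names(root: bytes) -> List[bytes]:
    """Walk root (given as bytes) and collect paths whose name is not UTF-8."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                bad.append(os.path.join(dirpath, name))
    return bad


if __name__ == "__main__":
    # Passing bytes makes os.walk return undecoded names.
    for path in non_utf8_names(b"/var/tmp/fess/PDF"):
        print(path)
```

An empty result would mean all names are already valid UTF-8, in which case renaming with convmv is not needed and the cause is more likely on the reading side.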

I have now found a solution:
In /etc/fess/ I had to set the following parameters:

  • crawler.document.html.default.lang=de
  • crawler.document.file.default.lang=de

Both parameters were empty by default.
Fess now crawls the folders and files with German umlauts, and I can search the documents.
Now I can test whether Fess is able to crawl all 4.8 TB of PDFs.
Thanks for your help!