Filename with ä, ö or ü

Hello there,

Fess and OpenSearch have worked perfectly so far. However, I have noticed that files whose names contain ä, ö or ü cannot be found. Umlauts inside the files are not a problem; those are found without any issues. UTF-8 is set in the Fess configuration.

Is there anything else I need to consider, perhaps something in OpenSearch? I would also like files with ä, ö and ü in their names to be indexed.

Thanks in advance

To search for a file by its name, please use the filename prefix, such as filename:... .
If you want the file name to be included in the content field, please make sure that crawler.document.append.filename is set to true in fess_config.properties.
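As an illustration of the filename prefix, here is a minimal Python sketch that builds a search URL for a name containing an umlaut. The base URL http://localhost:8080/search and the helper filename_query_url are assumptions for illustration and do not come from this thread. The point is that the whole query, including the “ö”, is percent-encoded as UTF-8 before it reaches the server.

```python
from urllib.parse import urlencode

# Hypothetical Fess search URL; adjust host and port to your installation.
FESS_SEARCH_URL = "http://localhost:8080/search"


def filename_query_url(name: str) -> str:
    """Build a search URL that restricts the query to the filename field."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default,
    # so "ö" becomes %C3%B6 and ":" becomes %3A.
    return FESS_SEARCH_URL + "?" + urlencode({"q": f"filename:{name}"})


print(filename_query_url("hellö.txt"))
# http://localhost:8080/search?q=filename%3Ahell%C3%B6.txt
```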

Hello and thank you for the quick reply.
Unfortunately, the problem persists.

crawler.document.append.filename is set to true. As a test, I created a folder with 4 files, 2 of which contain an ö and an ä in the file name. Searching for “title: test” succeeds, but as soon as I search for “title: hellö”, nothing is found. “filename:” does not work at all.

Could this possibly be caused by OpenSearch?
Unfortunately, I have not found anything more detailed about this in my research.

Could you provide an example file containing the character?

With pleasure. Attached is a zip file containing 3 test files.

  • testö.jpg
  • hellö.txt
  • test.txt

Searching for “test” or “test.txt” returns “test.txt”.
Searching for testö.txt or hellö.txt returns no results.
The same happens with the “title:” prefix.

I tried it, and it works for me; I could not reproduce the problem.
To investigate further, it might be better to check the indexed content on the Admin Search page.

It’s good that it works for you; that gives me hope that it can be solved.

After several attempts and tests, I realized that in my test files, content containing an “ö”, for example, is no longer read correctly.

I have attached various screenshots, my config and the logs. Among other things, the logs show where the crawler seems to run into problems:

2023-12-11 10:47:30,320 [Crawler-20231211104727-1-2] INFO Crawling URL: file:/srv/dfs/xxx_test
2023-12-11 10:47:31,453 [Crawler-20231211104727-1-2] INFO Crawling URL: file:/srv/dfs/xxx_test/hell%EF%BF%BD%EF%BF%BD.txt
2023-12-11 10:47:31,811 [Crawler-20231211104727-1-5] INFO Crawling URL: file:/srv/dfs/xxx_test/test.txt
2023-12-11 10:47:31,813 [Crawler-20231211104727-1-1] INFO Crawling URL: file:/srv/dfs/xxx_test/test%EF%BF%BD%EF%BF%BD.jpg

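One plausible reading of these log lines, assuming the file names are stored as UTF-8 on disk: “ö” is the two bytes C3 B6 in UTF-8. If the crawler decodes file names with a non-UTF-8 charset such as ASCII (for example because LANG/LC_ALL do not select a UTF-8 locale), each of those bytes becomes the replacement character U+FFFD, which is then percent-encoded as %EF%BF%BD. That gives exactly two %EF%BF%BD per umlaut, as in hell%EF%BF%BD%EF%BF%BD.txt above. The Python sketch below only simulates this effect; crawl_url_for is a hypothetical helper, not part of Fess, and the JVM's own decoding path may differ in detail.

```python
from urllib.parse import quote


def crawl_url_for(name: str, fs_encoding: str) -> str:
    """Simulate how a file name's bytes end up in a crawl URL.

    Assumption: the name is stored on disk as UTF-8 bytes, and fs_encoding
    is the charset the process uses to decode them (derived from the locale).
    """
    raw = name.encode("utf-8")                           # "ö" -> b"\xc3\xb6"
    decoded = raw.decode(fs_encoding, errors="replace")  # bad bytes -> U+FFFD each
    return quote(decoded)                                # percent-encode as UTF-8


print(crawl_url_for("hellö.txt", "ascii"))  # hell%EF%BF%BD%EF%BF%BD.txt
print(crawl_url_for("hellö.txt", "utf-8"))  # hell%C3%B6.txt
```

With a UTF-8 locale the same name would appear as hell%C3%B6.txt, so the original character survives and can be indexed and searched.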
I didn’t find anything relevant on this in the Fess documentation, apart from setting the “encoding” to UTF-8.

I have just come one step further.

In the file /usr/share/fess/bin/fess, I changed the LANG and LC_ALL variables from US to DE. Now I get search results for the files that contain “ö”, “ä” or “ü” in their names. Unfortunately, this did not help with the content of the files.

The text shows “br lle”, but it should actually be “brülle”.
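A possible explanation for “br lle”, assuming the test file was saved as ISO-8859-1/Windows-1252 rather than UTF-8 (the thread does not confirm the file's encoding): in Windows-1252, “ü” is the single byte 0xFC, which is not valid UTF-8, so a reader that assumes UTF-8 replaces it with U+FFFD, which may render as a blank or a box. The sketch below shows that effect, plus a hypothetical decode_text helper that tries UTF-8 first and falls back to Windows-1252. It illustrates the mechanism only and is not how Fess detects encodings.

```python
def decode_text(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, fall back to Windows-1252.

    Trying UTF-8 first is a common heuristic, because non-ASCII text in a
    legacy 8-bit encoding is rarely valid UTF-8 by accident.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # A few bytes are undefined in cp1252, so replace rather than raise.
        return data.decode("cp1252", errors="replace")


legacy = "brülle".encode("cp1252")  # b"br\xfclle": "ü" is the single byte 0xFC

# Reading it as UTF-8 puts U+FFFD where the "ü" was ("br\ufffdlle").
print(repr(legacy.decode("utf-8", errors="replace")))

print(decode_text(legacy))                    # brülle
print(decode_text("brülle".encode("utf-8")))  # brülle
```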

In my environment, it works: brülle is displayed in the results.