I have some files hosted on a Windows IIS server, and they have Japanese names. The names are garbled on the Fess pages.
It seems they are URL-encoded as Shift-JIS at crawl time and decoded as UTF-8 at display time.
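To illustrate the mismatch I suspect, here is a minimal Python sketch. It is not Fess code; it only assumes that the link is percent-encoded from Shift-JIS bytes and later decoded as UTF-8. The variable names are mine.

```python
from urllib.parse import quote, unquote

# File name taken from one of the affected links.
name = "健康診断申請.pdf"

# Assumption: the crawler percent-encodes the name from its Shift-JIS bytes.
encoded = quote(name, encoding="shift_jis")

# Decoding those bytes as UTF-8 cannot recover the name; invalid byte
# sequences become U+FFFD replacement characters.
garbled = unquote(encoded, encoding="utf-8", errors="replace")

# Decoding with the charset that was used for encoding restores the name.
correct = unquote(encoded, encoding="shift_jis")

print("encoded:", encoded)
print("decoded as UTF-8:", garbled)
print("decoded as Shift-JIS:", correct)
```

If this is what happens, the stored URL itself is probably fine and only the decoding step for display uses the wrong charset.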
Is there a solution, such as a configuration setting?
Could you provide steps to reproduce it in my environment?
Thank you for your reply.
I'm using Fess 12.1 on CentOS 7 with OpenJDK.
Each problem file has a Japanese name and is linked from a Shift-JIS HTML or ASPX page on IIS 5.
e.g. <a href="./files/健康診断申請.pdf">
I use the web crawler for the site, configured with basic authentication and a label.
After crawling, garbled file names appear in the Fess search results.
I will add more detail next Wednesday, when I can access the environment.
I now understand that you need steps to reproduce this in your environment.
I tried to set up a web server that reproduces the situation for you, but I couldn't. AWS S3 refuses files with Japanese names, and apache2 did not return the proper file for Shift-JIS file names even though "AddDefaultCharset" is Off.
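One alternative I could try is a small stand-in server written with Python's standard library. The sketch below is only an approximation and rests on assumptions I have not verified against IIS 5: it serves an index page encoded as Shift-JIS that links to the Japanese-named file, and it interprets request paths as Shift-JIS whether they arrive percent-encoded or as raw bytes. The port, the placeholder file body, and the names `decode_path`, `Handler`, and `serve` are made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote_to_bytes

FILE_NAME = "健康診断申請.pdf"
INDEX = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=Shift_JIS"></head>'
    f'<body><a href="./files/{FILE_NAME}">{FILE_NAME}</a></body></html>'
)


def decode_path(raw_path: str) -> str:
    """Interpret a request path as Shift-JIS (assumed IIS-like behaviour).

    http.server decodes the request line as ISO-8859-1, so encoding the
    path back with that codec recovers the bytes the client actually sent.
    """
    raw_bytes = raw_path.encode("iso-8859-1")
    return unquote_to_bytes(raw_bytes).decode("shift_jis", errors="replace")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = decode_path(self.path)
        if path == "/":
            body = INDEX.encode("shift_jis")
            content_type = "text/html; charset=Shift_JIS"
        elif path == "/files/" + FILE_NAME:
            body = b"%PDF-1.4 placeholder"  # dummy content, not a real PDF
            content_type = "application/pdf"
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the stand-in server; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` and pointing a web crawl configuration at `http://<host>:8000/` might be enough to see whether the garbled name appears without IIS, although basic authentication and the ASPX pages are not covered by this sketch.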
Could you give me some other advice? Would logging any particular information while crawling be useful?