Fess failed to crawl file with spaces

INFO Failed to access to https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-/_jcr_content/eventdetails/attachment/file.res/CBP Seminar Flyer - Dr. Mohammed AlQuirashi .pdf; The url may not be valid: https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-/_jcr_content/eventdetails/attachment/file.res/CBP Seminar Flyer - Dr. Mohammed AlQuirashi .pdf;

Full disclaimer: if you paste those URLs into your browser’s address bar, you probably won’t be able to open them, since those pages are only accessible from within my org’s VPN. The Fess server is also inside my org’s VPN, so it can access those pages just fine.

BUT, as you can see, the HTML page is:
https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-.html.

On this page there’s an attachment file URL containing spaces. My server’s backend actually DID encode the spaces in those URLs as “%20”:

https://centernet.fredhutch.org/content/centernet/en/e/contrib/2022/11/oncologic-emergencies--part-2/_jcr_content/eventdetails/attachment/file.res/Oncologic%20Emergencies%20Part%202-%20Drechsler%20flyer.pdf

I don’t know why it got decoded when the crawler crawled those files. Any idea why?

Edit to my original post:

  • My bad, my server’s backend actually DID encode the spaces in the URLs as “%20”:
    https://centernet.fredhutch.org/content/centernet/en/e/contrib/2022/11/oncologic-emergencies--part-2/_jcr_content/eventdetails/attachment/file.res/Oncologic%20Emergencies%20Part%202-%20Drechsler%20flyer.pdf
    I don’t know why it got decoded when the crawler crawled those files.

Could you provide an example to reproduce it?

My apologies. After some deeper investigation, it turns out that my encoded URL, when clicked, redirects the request to another, un-encoded URL, which is why Fess failed. It’s not Fess’s fault; I’ll fix it by encoding the URL of the second (redirected) request.
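
For anyone hitting the same thing, here is a minimal sketch of the kind of fix described above: percent-encoding the path of a redirect target before sending it back. It is in Python purely for illustration (the actual backend isn't shown in this thread), and the function name `encode_url_path` and the example.org URL are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the URL path.

    Hypothetical helper, not part of Fess or of the original backend.
    """
    parts = urlsplit(url)
    # safe="/%" keeps path separators and existing "%XX" sequences intact,
    # so a URL that is already encoded is not double-encoded.
    # Caveat: a literal "%" in a file name would also be left untouched.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


# Example with a made-up redirect target containing spaces:
redirect_target = "https://example.org/attachment/file.res/CBP Seminar Flyer.pdf"
print(encode_url_path(redirect_target))
```

With the path encoded this way, the redirected URL carries “%20” instead of raw spaces, like the first URL does.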