INFO Failed to access to https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-/_jcr_content/eventdetails/attachment/file.res/CBP Seminar Flyer - Dr. Mohammed AlQuirashi .pdf; The url may not be valid: https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-/_jcr_content/eventdetails/attachment/file.res/CBP Seminar Flyer - Dr. Mohammed AlQuirashi .pdf;
Full disclaimer: if you paste those URLs into your browser’s URL bar, you probably won’t be able to access them, since those pages are only reachable from within my org’s VPN. The Fess server is also inside my org’s VPN, so it can access those pages just fine.
BUT, as you can see, the HTML page that links to that attachment is:
https://centernet.fredhutch.org/cn/e/contrib/2022/10/computational-biology-seminar-series---dr--alquiraishi-mohammed-.html
On this page there is an attachment file URL that contains spaces. My server’s backend DOES actually encode the spaces in such URLs as “%20”, for example:
https://centernet.fredhutch.org/content/centernet/en/e/contrib/2022/11/oncologic-emergencies--part-2/_jcr_content/eventdetails/attachment/file.res/Oncologic%20Emergencies%20Part%202-%20Drechsler%20flyer.pdf
I don’t know why the URLs get decoded when the crawler crawls those files. Any idea why?
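To make clear what I mean by "encoded" vs. "decoded", here is a minimal Python sketch using the standard library's `urllib.parse`. It is only an illustration of the round trip on the example URL above, not a claim about what Fess does internally; the variable names are my own.

```python
from urllib.parse import quote, unquote

# The encoded attachment URL, as my backend emits it in the page's HTML.
encoded = (
    "https://centernet.fredhutch.org/content/centernet/en/e/contrib/2022/11/"
    "oncologic-emergencies--part-2/_jcr_content/eventdetails/attachment/file.res/"
    "Oncologic%20Emergencies%20Part%202-%20Drechsler%20flyer.pdf"
)

# Decoding turns every %20 back into a literal space. This is the form
# that shows up in the "Failed to access" log line.
decoded = unquote(encoded)

# Re-encoding while keeping ":" and "/" untouched restores the original URL.
reencoded = quote(decoded, safe=":/")

print(decoded)
print(reencoded == encoded)
```

So the URL in the log looks like the output of the decoding step, without the re-encoding step being applied before the request is made.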