(from github.com/ozgengures)
Hi,
Fess handles UTF-8 characters well, but some of my pages use windows-1254 characters. When I crawl these pages, I get “�” characters in the results. I tried Apache Tika and the result is the same. How can I crawl these pages?
(from github.com/ozgengures)
Is it in an HTML file? I need more info…
I'm running the web crawler on PHP pages. I have already tried changing crawler.document.site.encoding=UTF-8 and crawler.crawling.data.encoding=UTF-8 in fess_config.properties to Cp1254 and windows-1254, but that is not enough.
(from github.com/marevol)
Fess uses an encoding from a response header and a meta tag.
I think your php pages may return UTF-8 in the response header.
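To illustrate why a wrong declared charset produces “�”, here is a minimal Python sketch. It is not Fess code; the sample word and variable names are made up for the example. It encodes Turkish text as windows-1254 and then decodes the bytes as UTF-8, which is what happens when a page is served in one encoding but declared as another.

```python
# Minimal sketch (not Fess code): bytes written as windows-1254 but read as UTF-8.
text = "çalışma"  # hypothetical sample containing Turkish letters

# What the PHP page actually sends on the wire.
raw = text.encode("cp1254")

# What a client sees if it trusts a "charset=utf-8" declaration.
# errors="replace" substitutes U+FFFD for every invalid byte sequence.
garbled = raw.decode("utf-8", errors="replace")

# What a client sees if it uses the real encoding.
correct = raw.decode("cp1254")

print(garbled)
print(correct)
```

The bytes for ç, ı and ş are not valid UTF-8 on their own, so each one turns into the replacement character, while decoding with cp1254 restores the original text.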
(from github.com/ozgengures)
Thank you. I now realize that my page's meta tag and response header have different values.
Meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Response header:
Content-Type: text/html; charset=iso-8859-9
Browsers can handle this mismatch. I think the Fess crawler is using the meta tag (or the meta tag overrides the response header value).
One more question: can I configure this encoding priority (e.g. use only the charset from the response header's Content-Type)?
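A quick way to check a page for this kind of mismatch outside of Fess is sketched below in Python. This is only an illustration under stated assumptions: `charset_from_header`, `charset_from_meta` and `pick_charset` are hypothetical helper names, the `prefer` parameter is invented for the example, and none of this reflects how Fess actually resolves the priority.

```python
import re
from email.message import Message

# Rough pattern for a charset inside a <meta> tag; an assumption, not a full HTML parser.
META_RE = re.compile(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the charset declared in the first matching <meta> tag, or None."""
    match = META_RE.search(html)
    return match.group(1).lower() if match else None


def pick_charset(header, meta, prefer="header"):
    """Hypothetical priority rule: take the preferred source, fall back to the other."""
    first, second = (header, meta) if prefer == "header" else (meta, header)
    return first or second or "utf-8"


header = charset_from_header("text/html; charset=iso-8859-9")
meta = charset_from_meta('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">')

if header and meta and header != meta:
    print(f"mismatch: header says {header}, meta tag says {meta}")

print(pick_charset(header, meta, prefer="header"))
```

With the values from this page, the header reports iso-8859-9 and the meta tag reports utf-8, so the check flags the mismatch. Making the two declarations agree on the PHP side avoids depending on any crawler's priority rule.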
© 2020. All Rights Reserved - CodeLibs, Inc.