Encoding problem

(from github.com/ozgengures)
Hi,

Fess works well with UTF-8 characters, but some of my pages use windows-1254 characters. When I crawl these pages, I get “�” characters in the results. I tried Apache Tika and the result is the same. How can I crawl these pages?
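
The symptom described above can be reproduced outside Fess. The following is a minimal Python sketch, not Fess code: it assumes the page bytes are windows-1254 (Python codec name `cp1254`) and that something in the pipeline decodes them as UTF-8. The sample word is only an illustration.

```python
# Text as it would be stored on a windows-1254 (cp1254) page.
original = "Türkçe"
raw = original.encode("cp1254")

# Wrong: decoding the bytes as UTF-8. Byte sequences that are not valid
# UTF-8 are replaced with U+FFFD, which is displayed as "�".
garbled = raw.decode("utf-8", errors="replace")

# Right: decoding with the encoding the bytes were actually written in.
restored = raw.decode("cp1254")

print(repr(garbled))
print(restored)
```

If the replacement character shows up in the first result but not the second, the crawler is most likely being told (or is guessing) the wrong charset rather than losing data at fetch time.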

(from github.com/marevol)
Is it in an HTML file? I need more info…

(from github.com/ozgengures)
I'm running the web crawler on PHP pages. I already tried changing crawler.document.site.encoding=UTF-8 and crawler.crawling.data.encoding=UTF-8 in fess_config.properties to Cp1254 and to windows-1254, but that is not enough.

(from github.com/marevol)
Fess takes the encoding from the response header and the meta tag.
I think your PHP pages may be returning UTF-8 in the response header.

(from github.com/ozgengures)
Thank you. I now realize that my page's meta tag and response header declare different values.
Meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Response header:
Content-Type: text/html; charset=iso-8859-9

Browsers can handle this mismatch. I think the Fess crawler uses the meta tag (or the meta tag overrides the response header value).
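
A mismatch like this one can be checked with a short script. The sketch below is an assumption-laden illustration in Python, not part of Fess: the helper names `charset_from_header` and `charset_from_meta` are hypothetical, and the regular expression is a rough approximation rather than a full HTML parser.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the first charset declared in a <meta> tag, or None."""
    match = re.search(
        r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', html, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


# The two declarations quoted in this thread.
header = charset_from_header("text/html; charset=iso-8859-9")
meta = charset_from_meta(
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
)

if header != meta:
    print(f"Mismatch: header says {header}, meta tag says {meta}")
```

Running a check like this against the pages before crawling would show which of them declare one charset in the header and another in the markup.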

One more question: can I configure this encoding priority (e.g. use only the charset from the response header's Content-Type)?

(from github.com/marevol)
The priority is meta tag > response header.
See HtmlTransformer.
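
The effect of that priority on a page like the one in this thread can be sketched as follows. This is a Python illustration of the ordering only, not the code in HtmlTransformer; the function `resolve_encoding` and its default are hypothetical.

```python
def resolve_encoding(meta_charset, header_charset, default="utf-8"):
    """Pick the first declared charset, with the meta tag ahead of the header."""
    return meta_charset or header_charset or default


# Bytes as served: the header (iso-8859-9) matches how they were written.
body = "İstanbul".encode("iso-8859-9")

# Meta tag wins, so the body is decoded as UTF-8 and comes out garbled.
chosen = resolve_encoding("utf-8", "iso-8859-9")
text = body.decode(chosen, errors="replace")

# Without a conflicting meta tag, the header value is used instead.
fallback = resolve_encoding(None, "iso-8859-9")
fixed = body.decode(fallback)

print(chosen, repr(text))
print(fallback, fixed)
```

Under this ordering, the most direct fix on the site side is to make the meta tag agree with the bytes actually served, or to drop the incorrect meta declaration.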

(from github.com/ozgengures)
Thanks for your support.