Prefer indexing https when http is also available

(from github.com/rhayun)
Hi there,

My index shows duplicate results when a URL is accessible through both the http and https protocols. How can I tell Fess to index only the https version when both exist?

Can I control how Fess creates the “_id” field? I believe the duplicates happen because the id contains the protocol (http or https), so http://example.com and https://example.com get different ids and different doc_ids, which causes the duplication.

thanks

(from github.com/marevol)
Fess supports the “canonical” tag, so I think it should be used.
If you cannot change the target HTML pages, you can add http:.* to Excluded URLs For Crawling.
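
To see which URLs a pattern like http:.* would drop, here is a small check in plain Python (not Fess code). It assumes the pattern has to match the whole URL; `is_excluded` is a hypothetical helper and the URLs are only examples.

```python
import re

# The exclusion pattern suggested above.
EXCLUDE = re.compile(r"http:.*")

def is_excluded(url: str) -> bool:
    """Hypothetical helper: True if the whole URL matches the exclusion pattern."""
    # Assumption: the crawler requires a full match, not a partial one.
    return EXCLUDE.fullmatch(url) is not None

for url in ("http://example.com/page", "https://example.com/page"):
    print(url, "excluded" if is_excluded(url) else "kept")
```

Note that such a pattern excludes every http URL, including pages that are only reachable over http.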

(from github.com/rhayun)
Thanks, but I am not sure canonical is the right choice here. Canonical is meant more for cases like http://www.example.com/home being the same as http://www.example.com/frontpage, not for protocols.

In Google you can say whether you prefer results to be shown with https or http when the same link exists under both. I wish Fess could resolve this somehow too.

(from github.com/marevol)
Another solution might be Path Mapping with the Crawling process type.

Regexp.: http://example.com
Replacement: https://example.com
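
As a rough illustration of what such a mapping does to a crawled URL, here is a sketch in plain Python (not Fess code). It assumes the mapping behaves like a regular-expression replace; `apply_path_mapping` is a hypothetical helper, and the pattern is anchored and escaped a little more strictly than the one above.

```python
import re

def apply_path_mapping(url: str, regex: str, replacement: str) -> str:
    """Hypothetical helper: rewrite a URL with a regex replace."""
    return re.sub(regex, replacement, url)

REGEX = r"^http://example\.com"      # anchored, with the dot escaped
REPLACEMENT = "https://example.com"

# An http URL is rewritten to https; an https URL does not match and is left alone.
print(apply_path_mapping("http://example.com/site3/index.html", REGEX, REPLACEMENT))
print(apply_path_mapping("https://example.com/site3/index.html", REGEX, REPLACEMENT))
```

Because both variants end up with the same URL after mapping, they would no longer be treated as two different documents.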

(from github.com/rhayun)
Yes, that can work, but it means I need to map each website.

Let's say I have many websites under one domain:

http://example.com/site1
http://example.com/site2
http://example.com/site3
etc…

and http://example.com/site3 can be accessed through both http and https,
but http://example.com/site2 and http://example.com/site1 cannot.

So I have to path-map only http://example.com/site3,
and if there is an http://example.com/site4, I also need to map it, and so on.

When you have more than 600 subsites, it is hard to path-map them all.

So I believe a better solution is to tell Fess (somehow) that when two URLs are the same except for the protocol, it should show only the preferred one in the results. So if I prefer https over http
and there are two results
http://example.com/site1
https://example.com/site1
it should show only the https one.
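
To make the idea concrete, here is a minimal sketch in plain Python of the rule I have in mind. It is not Fess code and not a Fess setting; `prefer_https` is a hypothetical helper that could run over a list of result URLs.

```python
from urllib.parse import urlsplit

def prefer_https(urls):
    """Hypothetical helper: keep one URL per host/path/query, choosing https over http."""
    best = {}  # key without the scheme -> chosen URL (dicts keep insertion order)
    for url in urls:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path, parts.query)
        current = best.get(key)
        # Take the first URL seen for a key, and upgrade it if an https variant appears.
        if current is None or (parts.scheme == "https" and urlsplit(current).scheme != "https"):
            best[key] = url
    return list(best.values())

results = [
    "http://example.com/site1",
    "https://example.com/site1",
    "http://example.com/site2",
]
print(prefer_https(results))
```

A URL that exists only under http is kept as-is, so no per-site mapping would be needed.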

I will try to figure out how to do that, but any help is welcome.

thanks