(from github.com/goldenivan)
Hi,
I would like to crawl many websites, but I have multiple contexts, which I describe below.
I want to manage several pools of websites. For example, I have four websites: A, B, C and D. I want Fess to show search results from websites A, B and C. I also want to crawl website D, but I don't want its results mixed with those of A, B and C.
Can I separate results by website? Maybe something is possible with nodes?
Can I have multiple search GUIs on one Fess server, each connected to a specific ES node?
Do I need a separate instance to split the data?
(from github.com/marevol)
Use VirtualHosts.
Example: www1.example.com displays websites A, B and C, and www2.example.com displays website D.
(You need to assign two hostnames to the Fess server.)
- Under General, set the Virtual Hosts setting:
Host:www1.example.com=www1
Host:www2.example.com=www2
- Under Crawling Config, create four Web Crawling configs, each with a Virtual Hosts value:
Site A: www1
Site B: www1
Site C: www1
Site D: www2
- Start the Default Crawler.
- Access www1.example.com and www2.example.com.
www1.example.com displays only websites A, B and C.
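The steps above can be pictured with a small toy model. This is a hypothetical sketch, not Fess's implementation: the names `VIRTUAL_HOSTS`, `CRAWL_CONFIGS`, `index_documents` and `search` are made up for illustration. It assumes each crawl config's virtual host key is attached to the documents it produces, and that a search only returns documents whose key matches the key resolved from the request's Host header. When no key matches, the sketch applies no filter, which mirrors the "all results" behavior reported later in this thread.

```python
# Toy model (hypothetical, not Fess internals) of virtual-host filtering.

# General > Virtual Hosts: Host header value -> virtual host key
VIRTUAL_HOSTS = {
    "www1.example.com": "www1",
    "www2.example.com": "www2",
}

# Crawling Config: each Web Crawling config -> its virtual host key
CRAWL_CONFIGS = {
    "Site A": "www1",
    "Site B": "www1",
    "Site C": "www1",
    "Site D": "www2",
}


def index_documents(configs):
    """Tag each crawled site's document with its crawl config's virtual host key."""
    return [{"site": site, "virtual_host": key} for site, key in configs.items()]


def search(docs, host_header):
    """Return the sites visible to a request whose Host header is host_header."""
    key = VIRTUAL_HOSTS.get(host_header)
    if key is None:
        # Assumed: no matching virtual host means no filter (all results).
        return [d["site"] for d in docs]
    return [d["site"] for d in docs if d["virtual_host"] == key]


docs = index_documents(CRAWL_CONFIGS)
print(search(docs, "www1.example.com"))  # ['Site A', 'Site B', 'Site C']
print(search(docs, "www2.example.com"))  # ['Site D']
```

Note that in this model a request for `www1.example.com:8080` matches no key and therefore sees every site.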
(from github.com/goldenivan)
Hi Marevol,
I applied this configuration:
- Virtual Hosts settings
Host:www1.fessurl.com=www1
Host:www2.fessurl.com=www2
- I added a virtual host value to each site's crawl config.
I can access both www1.fessurl.com and www2.fessurl.com, but both URLs show all results. Should I use a specific port for each search GUI? Do I need an Apache reverse proxy configuration or something similar?
(from github.com/marevol)
The matching depends on a request header.
Does your request include a port number? You can check this in the request headers your browser sends.
If you access Fess with a port number, the setting is:
Host:www1.fessurl.com:8080=www1
If Apache is used as a reverse proxy, the request header name is X-Forwarded-Host:
X-Forwarded-Host:www1.fessurl.com=www1
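Why the port matters can be shown with a hypothetical sketch of exact header matching. `parse_virtual_hosts` and `resolve_virtual_host` are illustrative names, not Fess APIs, and the sketch assumes each setting line has the form `Header-Name:value=key` and that the value must equal the request header exactly. Under that assumption, `Host:www1.fessurl.com` cannot match a request whose Host header is `www1.fessurl.com:8080`.

```python
# Hypothetical sketch of exact request-header matching (not Fess internals).


def parse_virtual_hosts(text):
    """Parse 'Header-Name:value=key' lines into (header, value, key) tuples."""
    rules = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        header, rest = line.split(":", 1)   # header name ends at the first colon
        value, key = rest.rsplit("=", 1)    # key follows the last '='
        rules.append((header.strip().lower(), value.strip(), key.strip()))
    return rules


def resolve_virtual_host(rules, request_headers):
    """Return the key of the first rule whose header value matches exactly, else None."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}
    for header, value, key in rules:
        if headers.get(header) == value:
            return key
    return None


plain = parse_virtual_hosts("Host:www1.fessurl.com=www1")
with_port = parse_virtual_hosts("Host:www1.fessurl.com:8080=www1")
proxied = parse_virtual_hosts("X-Forwarded-Host:www1.fessurl.com=www1")

request = {"Host": "www1.fessurl.com:8080"}
print(resolve_virtual_host(plain, request))      # None: port not in the rule
print(resolve_virtual_host(with_port, request))  # www1
print(resolve_virtual_host(proxied, {"Host": "localhost:8080",
                                     "X-Forwarded-Host": "www1.fessurl.com"}))  # www1
```

Behind a reverse proxy, the Host header that reaches the backend may be the proxy's upstream address rather than the public name, which is why the sketch's proxied rule keys on X-Forwarded-Host instead.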
(from github.com/goldenivan)
It worked perfectly. I had forgotten to include the port in each Host entry. Thank you for the answers.