We have configured Fess to index only our company website, but it is also crawling external domains such as microsoft.com.
I have posted a sample config below. Please let me know what the issue is.
Name Test
URLs https://www.abc.com/en-us/index.html
Included URLs For Crawling https://www.abc.com/.*
Excluded URLs For Crawling
Included URLs For Indexing https://www.abc.com/.*
Excluded URLs For Indexing
(.*)/assets/.*
(.*)/www/.*
Config Parameters config.html.canonical.xpath=
field.xpath.lastModified=//META[@name="lastmodified"]/@content
field.xpath.releaseDate=//META[@name="releaseDate"]/@content
Depth 10
Max Access Count 50000
User Agent Mozilla/5.0 (compatible; Fess/13.10; +http://fess.codelibs.org/bot.html)
The number of Thread 1
Interval time 10000 ms
Boost 1.0
Permissions {role}guest
Status Enabled
Description
Included URLs For Crawling https://www.abc.com/.*
Excluded URLs For Crawling
(.*)/assets/.*
(.*)/www/.*
Included URLs For Indexing
Excluded URLs For Indexing
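To sanity-check the patterns outside Fess, a small script along these lines might help. It is only a sketch: the `is_allowed` helper is hypothetical, and it assumes a URL must match an include pattern in full and must not match any exclude pattern in full. Fess's actual filtering logic may differ, so treat the result as a check on the regular expressions themselves, not on the crawler.

```python
import re

# Patterns copied from the crawl settings above.
INCLUDE = [r"https://www.abc.com/.*"]
EXCLUDE = [r"(.*)/assets/.*", r"(.*)/www/.*"]


def is_allowed(url, include=INCLUDE, exclude=EXCLUDE):
    """Hypothetical helper, not Fess code.

    Assumed semantics: the whole URL must match at least one include
    pattern (when any are given) and must not match any exclude pattern.
    """
    if include and not any(re.fullmatch(p, url) for p in include):
        return False
    return not any(re.fullmatch(p, url) for p in exclude)


if __name__ == "__main__":
    samples = [
        "https://www.abc.com/en-us/index.html",
        "https://www.abc.com/assets/logo.png",
        "https://www.microsoft.com/",
        "https://wwwxabc.com/page",  # the unescaped "." matches any character
    ]
    for url in samples:
        print(url, is_allowed(url))
```

Under these assumptions the microsoft.com URL is rejected by the include pattern, so the patterns alone would not explain the external crawling. The last sample does pass, because the dots in `www.abc.com` are not escaped; writing the pattern as `https://www\.abc\.com/.*` would make it stricter.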
Something similar is happening to me as well. I included only a specific URL for crawling, but the crawler is going to other (external) URLs. What are we doing wrong?
I am on Fess 14.5.0.