discuss
September 11, 2019, 8:00pm
1
(from github.com/markcamos )
I need to exclude any search results that contain http://ocvreinforcements.com
I’ve added http://ocvreinforcements.com/* to Excluded URLs For Crawling and Excluded URLs For Indexing, but they still show up in search results.
Is there a way to keep http://ocvreinforcements.com from showing up in search results?
discuss
September 11, 2019, 9:30pm
2
(from github.com/marevol )
http://ocvreinforcements.com/.*
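The difference from the original entry is that the exclusion fields take regular expressions, not shell-style globs. A minimal sketch of why that matters, assuming each pattern has to match the whole URL; it uses Python's `re` module purely for illustration (Fess itself is a Java application), and the `is_excluded` helper, the sample URL path, and the combined pattern are hypothetical:

```python
import re

def is_excluded(url, patterns):
    """Return True if any pattern matches the entire URL (assumed matching rule)."""
    return any(re.fullmatch(p, url) is not None for p in patterns)

# Hypothetical page on the site from this thread.
url = "http://ocvreinforcements.com/products/index.html"

# Glob-style entry: as a regex, "/*" means "zero or more slashes",
# so nothing after the host is allowed and the page does not match.
glob_result = is_excluded(url, ["http://ocvreinforcements.com/*"])

# Regex entry: ".*" means "any characters", so every path under the host matches.
regex_result = is_excluded(url, ["http://ocvreinforcements.com/.*"])

# Hypothetical single pattern covering http/https, with or without "www.".
combined = r"https?://(www\.)?ocvreinforcements\.com/.*"
variants = [
    "http://ocvreinforcements.com/a",
    "http://www.ocvreinforcements.com/a",
    "https://ocvreinforcements.com/a",
    "https://www.ocvreinforcements.com/a",
]
combined_results = [is_excluded(v, [combined]) for v in variants]

print(glob_result, regex_result, combined_results)
```

Under these assumptions the glob-style entry excludes nothing below the site root, while the `.*` form excludes every page on the host.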
discuss
September 13, 2019, 11:55am
3
(from github.com/markcamos )
@marevol - I have these in Excluded URLs for Crawling and Excluded URLs for Indexing:
http://ocvreinforcements.com/.*
http://www.ocvreinforcements.com/.*
https://ocvreinforcements.com/.*
https://www.ocvreinforcements.com/.*
… and they still show up in the results. Do I need to add them somewhere else? Does Fess need to be restarted after making these changes?
discuss
September 13, 2019, 11:57am
4
discuss
September 13, 2019, 2:26pm
5
(from github.com/marevol )
Did you check fess-crawler.log?
You can check if they are crawled.
discuss
September 13, 2019, 7:01pm
6
(from github.com/markcamos )
The “Excludes” are the only thing that is in the fess-crawler.log. Is there a way to exclude these from the results, or not?
discuss
September 15, 2019, 6:17am
7
(from github.com/marevol )
“Excluded URLs For Crawling” means the crawler ignores matching URLs, so they are not added to the index and do not show up in search results.
Did you remove all documents from the Fess index before crawling?
Documents that are already indexed are not removed until their ttl is reached.
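The behavior described here can be pictured with a toy model. This is only a sketch, not Fess code: it treats the ttl as a per-document expiry time (an assumption about its meaning here), and the `IndexedDoc`, `crawl`, and `purge_expired` names, the day-based clock, and the `example.com` URL are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class IndexedDoc:
    url: str
    expires_at: int  # hypothetical expiry, in days since an arbitrary start

def crawl(index, discovered_urls, exclude_patterns, now, ttl_days):
    """Add or refresh non-excluded URLs. Excluded URLs are skipped, not deleted."""
    for url in discovered_urls:
        if any(re.fullmatch(p, url) for p in exclude_patterns):
            continue  # ignored by the crawler; any existing index entry is untouched
        index[url] = IndexedDoc(url, now + ttl_days)

def purge_expired(index, now):
    """Drop documents whose expiry time has been reached."""
    for url in [u for u, doc in index.items() if doc.expires_at <= now]:
        del index[url]

index = {}
urls = ["http://ocvreinforcements.com/a", "http://example.com/b"]

# Day 0: first crawl, no exclusions yet, so both pages are indexed.
crawl(index, urls, [], now=0, ttl_days=3)

# Day 1: the exclusion is added and the crawler runs again.
crawl(index, urls, ["http://ocvreinforcements.com/.*"], now=1, ttl_days=3)
still_indexed_after_exclude = "http://ocvreinforcements.com/a" in index

# Day 3: the old entry was never refreshed, so it expires and is removed.
purge_expired(index, now=3)
indexed_after_expiry = "http://ocvreinforcements.com/a" in index

print(still_indexed_after_exclude, indexed_after_expiry)
```

In this model the excluded page stays searchable after the exclusion is added and only disappears once its expiry passes, which is why clearing the existing documents before re-crawling gives an immediate result.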
discuss
September 15, 2019, 10:13pm
8
(from github.com/markcamos )
Thank you for this explanation. I assumed that putting them in the exclude list would ALSO remove them from the index.
discuss
September 15, 2019, 10:13pm
9
(from github.com/markcamos )
What is “ttl” in this context?
discuss
September 16, 2019, 12:08am
10