Include rules taking precedence over Exclude rules.

Steps to reproduce:

  1. Set up a Web Crawling Configuration with the following rules:
  • URLs
  • Included URLs For Crawling*
  • Excluded URLs For Crawling
  • Included URLs For Indexing:*
  • Excluded URLs For Indexing:**
  2. Start the crawl.

  3. After results are generated, search for api.

  4. Results include content from

Expected outcome:

  1. The “Included URLs For Indexing” rule should limit the indexed results to content hosted on

  2. The “Excluded URLs For Indexing” rule should prevent the /apidocs and /dev content from being indexed.

Actual outcome:

Outcome 1 holds: results are limited to the expected host domain. However, the include rule appears to override point 2, and results for /apidocs and /dev are still included in the index.

Please advise whether this is a bug or an issue with the regular expression rules in use.

The processing order is “Excluded URLs For Indexing” -> “Included URLs For Indexing”.
So try leaving “Included URLs For Indexing” empty.
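
To illustrate why that order produces the reported result, here is a minimal toy model in Python. It is not Fess's actual code: the function name `is_indexed`, the full-match semantics, and the example.com URLs and patterns are all assumptions made for illustration. It only models the interaction described in this thread, where an include match evaluated after the exclude rules gets the final say.

```python
import re


def is_indexed(url, include_patterns, exclude_patterns):
    """Toy model (hypothetical, not Fess code) of exclude -> include ordering."""
    indexed = True
    # Step 1: exclude rules are evaluated first.
    if any(re.fullmatch(p, url) for p in exclude_patterns):
        indexed = False
    # Step 2: include rules are evaluated afterwards, so a match here
    # overrides an earlier exclude match.
    if any(re.fullmatch(p, url) for p in include_patterns):
        indexed = True
    return indexed


# Hypothetical rules standing in for the ones in the report.
include = [r"https://example\.com/.*"]
exclude = [r".*/apidocs/.*", r".*/dev/.*"]
url = "https://example.com/apidocs/index.html"

with_include = is_indexed(url, include, exclude)   # include rule wins
without_include = is_indexed(url, [], exclude)     # exclude rule applies
```

Under these assumptions, a broad include pattern that matches every URL on the host cancels out the exclude rules, while an empty include list lets the exclude rules take effect.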

@marevol Thanks for confirming that. I have used that method and confirm the exclude rules are processed as expected after removing the include rule.