Failure URL needs to show referring page

(from github.com/pcolmer)
We’re getting some errors in the Failure URL log which we could fix if we could find the errant page.

For example, we’ve got some bad URLs resulting in a java.lang.IllegalArgumentException from org.codelibs.fess.crawler.exception.CrawlingAccessException.

If the details page showed which page had been crawled when it encountered that URL, that would help us fix the problem.

(from github.com/marevol)
Check fess-crawler.log.

(from github.com/pcolmer)
All I can find is this:

2017-05-17 03:25:07,866 [Crawler-20170517000000-9-1] INFO  Crawling URL: http://www.96boards.org/register/ http://www.96boards.org/register/
2017-05-17 03:25:10,324 [Crawler-20170517000000-11-2] INFO  Crawling URL: http://connect.linaro.org/tag/narayana-prasad-athreya/
2017-05-17 03:25:10,395 [Crawler-20170517000000-9-1] INFO  Failed to access to http://www.96boards.org/register/ http://www.96boards.org/register/; The url may not be valid: http://www.96boards.org/register/ http://www.96boards.org/register/; The url may not be valid: http://www.96boards.org/register/ http://www.96boards.org/register/; The url may not be valid: http://www.96boards.org/register/ http://www.96boards.org/register/; The url may not be valid: http://www.96boards.org/register/ http://www.96boards.org/register/; The url may not be valid: http://www.96boards.org/register/ http://www.96boards.org/register/

But that isn’t telling me where it got that original bad URL from …
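
For anyone who wants to collect these failures in bulk, here is a minimal Python sketch that pulls the timestamp, crawler thread, and URL out of the "Failed to access to" lines. It assumes the line layout shown in the excerpt above, and the function name failed_urls is made up for illustration. Like the log itself, it only lists the failing URLs; it does not reveal the referring page.

```python
import re

# Assumed line layout, taken from the log excerpt above:
#   <date> <time> [<thread>] INFO  Failed to access to <url>; <message>
# The URL part may itself contain spaces, so match lazily up to the first "; ".
FAILED = re.compile(
    r"^(\S+ \S+) \[([^\]]+)\] \w+\s+Failed to access to (.*?); "
)


def failed_urls(lines):
    """Yield (timestamp, thread, url) for each failed-access log line."""
    for line in lines:
        match = FAILED.match(line)
        if match:
            yield match.group(1), match.group(2), match.group(3)
```

To use it, open fess-crawler.log and pass the file object to failed_urls, for example `for ts, thread, url in failed_urls(open("fess-crawler.log", encoding="utf-8")): print(ts, url)`.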

(from github.com/marevol)
To check child links:

  1. Change to query.additional.search.fields=anchor in fess_config.properties
  2. Restart Fess
  3. Search by anchor:"http://www.96boards.org/register/ http://www.96boards.org/register/" or anchor:"http://www.96boards.org/register/ http://www.96boards.org/register/*"
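
The search in step 3 can also be scripted. This is a minimal Python sketch, not an official Fess client: it assumes the search page is at http://localhost:8080/search and takes the query in a q parameter (adjust both to your installation), and the helper names anchor_query and search_url are hypothetical.

```python
from urllib.parse import urlencode


def anchor_query(url):
    """Wrap a URL in an anchor:"..." phrase query, escaping backslashes and quotes."""
    escaped = url.replace("\\", "\\\\").replace('"', '\\"')
    return f'anchor:"{escaped}"'


def search_url(base, url):
    """Return a search URL whose q parameter holds the URL-encoded anchor query.

    `base` is an assumption about where the Fess search page lives.
    """
    return f"{base}?{urlencode({'q': anchor_query(url)})}"


bad = "http://www.96boards.org/register/ http://www.96boards.org/register/"
print(anchor_query(bad))
print(search_url("http://localhost:8080/search", bad))
```

Quoting the value matters here because the bad URL contains a space; without the quotes it would be read as two separate search terms.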

(from github.com/pcolmer)
Thank you for that.

Reading https://github.com/codelibs/fess/issues/1074, does this mean I need to wait for version 11.2.0? I’m presuming that issue 1074 adds support for “anchor” as a search property?

(from github.com/marevol)
In Fess 11.2, the step

Change to query.additional.search.fields=anchor in fess_config.properties

is not needed (broken-links reporting will be added in Fess 11.2).
So I think it works in current releases if you modify query.additional.search.fields.