Unable to crawl a secured website

(from github.com/shwetazilpe)
Hi,

I’m trying to crawl the Redmine website as described in the documentation. I have created an account on www.redmine.org, and my configuration is:

Web Crawling Configuration :
Name = Redmine
URL = https://www.redmine.org/my/page

Web Authentication :
Hostname = www.redmine.org
Port =
Realm =
Scheme = Form
Username = my-username
Password = ******
Parameters = encoding=UTF-8
token_method=GET
token_url=https://www.redmine.org/login
token_pattern=name="authenticity_token" +value="([^"]+)"
token_name=authenticity_token
login_method=POST
login_url=https://www.redmine.org/login
login_parameters=username=${username}&password=${password}
Web Config = Redmine
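One way to read the Form scheme parameters above is as a two-step login: fetch `token_url` with `token_method`, extract the token from that page with capture group 1 of `token_pattern`, then send `login_method` to `login_url` with `login_parameters` plus `token_name=<token>`. The Java sketch below models that reading under stated assumptions. The class and method names (`FormLoginSketch`, `parseParams`, `extractToken`, `buildLoginBody`) are hypothetical, it is not Fess's `FormScheme` implementation, it does no networking, and the sample HTML and credentials are made up. It only shows how a `token_pattern` that finds nothing leaves the login request without a token.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a token-based form login; NOT Fess's FormScheme code.
public class FormLoginSketch {

    // Assumption: each line of "Parameters" is key=value, split at the FIRST '='.
    static Map<String, String> parseParams(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                params.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return params;
    }

    // Step 1: apply token_pattern to the page from token_url; group 1 is the token.
    static Optional<String> extractToken(String html, String tokenPattern) {
        Matcher m = Pattern.compile(tokenPattern).matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Percent-encodes a form value (a choice made in this sketch).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Step 2: fill ${username}/${password} into login_parameters, then append token_name=token.
    static String buildLoginBody(Map<String, String> p, String user, String pass, String token) {
        String body = p.get("login_parameters")
                .replace("${username}", enc(user))
                .replace("${password}", enc(pass));
        return body + "&" + enc(p.get("token_name")) + "=" + enc(token);
    }

    public static void main(String[] args) {
        String params = String.join("\n",
                "token_pattern=name=\"authenticity_token\"[^>]+value=\"([^\"]+)\"",
                "token_name=authenticity_token",
                "login_parameters=username=${username}&password=${password}");
        Map<String, String> p = parseParams(params);
        // Hypothetical HTML; the real Redmine login page markup may differ.
        String html = "<input type=\"hidden\" name=\"authenticity_token\" value=\"abc+/=\" />";
        Optional<String> token = extractToken(html, p.get("token_pattern"));
        if (!token.isPresent()) {
            // No match: the POST would be sent without authenticity_token.
            System.out.println("Token is not found.");
            return;
        }
        System.out.println(buildLoginBody(p, "alice", "s3cret", token.get()));
    }
}
```

Running `main` prints `username=alice&password=s3cret&authenticity_token=abc%2B%2F%3D`. Percent-encoding the substituted values is this sketch's choice (it needs Java 10+ for `URLEncoder.encode(String, Charset)`); the thread does not say how Fess encodes them.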

But I’m getting the following error in the fess_crawler log:
```
2019-03-19 17:46:54,918 [WebFsCrawler] INFO org.codelibs.fess.helper.WebFsIndexHelper - Target URL: https://www.redmine.org/my/page
2019-03-19 17:46:54,918 [WebFsCrawler] DEBUG org.codelibs.fess.helper.WebFsIndexHelper - Crawling https://www.redmine.org/my/page
2019-03-19 17:46:54,932 [IndexUpdater] DEBUG org.codelibs.fess.indexer.IndexUpdater - Starting indexUpdater.
2019-03-19 17:46:54,970 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.crawler.service.impl.EsUrlQueueService - Queued URL: [UrlQueueImpl [id=20190319174625-1.aHR0cHM6Ly93d3cucmVkbWluZS5vcmcvbXkvcGFnZQ, sessionId=20190319174625-1, method=GET, url=https://www.redmine.org/my/page, encoding=null, parentUrl=null, depth=0, lastModified=0, createTime=1552997814730]]
2019-03-19 17:46:55,232 [Crawler-20190319174625-1-1] INFO org.codelibs.fess.crawler.helper.impl.LogHelperImpl - Crawling URL: https://www.redmine.org/my/page
2019-03-19 17:46:55,234 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.crawler.FessCrawlerThread - Searching indexed document: https:%2F%2Fwww.redmine.org%2Fmy%2Fpage;role=Rguest
2019-03-19 17:46:55,238 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.es.client.FessEsClient - Query DSL:
{"timeout":"10000ms","query":{"ids":{"type":["doc"],"values":["https:%2F%2Fwww.redmine.org%2Fmy%2Fpage;role=Rguest"],"boost":1.0}},"version":true,"_source":{"includes":["_id","last_modified","anchor","segment","expires","click_count","favorite_count"],"excludes":[]}}
2019-03-19 17:46:55,262 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.es.client.FessEsClient - Query DSL:
{"size":0,"timeout":"10000ms","query":{"term":{"parent_id":{"value":"https:%2F%2Fwww.redmine.org%2Fmy%2Fpage;role=Rguest","boost":1.0}}},"_source":{"includes":["url"],"excludes":[]}}
2019-03-19 17:46:55,266 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.crawler.helper.impl.LogHelperImpl - Getting the content from URL: https://www.redmine.org/my/page
2019-03-19 17:46:55,279 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.crawler.client.http.HcHttpClient - Initializing org.codelibs.fess.crawler.client.http.HcHttpClient
2019-03-19 17:46:56,668 [Crawler-20190319174625-1-1] DEBUG org.codelibs.fess.crawler.client.http.form.FormScheme - Token is not found.
...
2019-03-19 17:46:56,832 [Crawler-20190319174625-1-1] WARN org.codelibs.fess.crawler.client.http.form.FormScheme - Failed to login on https://www.redmine.org/login. The http status is 422.
```

Where am I going wrong? Did I miss any other configuration?
Thanks in advance!

(from github.com/berti92)
I have the same problem. The issue is that the authenticity_token is not submitted with the login form to Redmine. Why? I don’t know. A fix? I don’t know either. Hopefully the maintainer can help us!

(from github.com/marevol)
I checked it, and it works.
For www.redmine.org, the web authentication configuration is as follows:

  • Hostname: www.redmine.org
  • Port: 443
  • Realm:
  • Scheme: Form
  • Username: [your username]
  • Password: [your password]
  • Parameters:
encoding=UTF-8
token_method=GET
token_url=https://www.redmine.org/login
token_pattern=name="authenticity_token"[^>]+value="([^"]+)"
token_name=authenticity_token
login_method=POST
login_url=https://www.redmine.org/login
login_parameters=username=${username}&password=${password}
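The functional difference between the two `token_pattern` values is how much text they allow between `name="authenticity_token"` and `value="..."`: ` +` accepts only spaces, while `[^>]+` accepts any characters up to the end of the same tag. The Java check below illustrates this with `java.util.regex`. The input tag is hypothetical (an extra `type` attribute placed between `name` and `value`), not copied from the live Redmine login page, and `TokenPatternCheck` is an illustrative name, not Fess code. It also tests the question's pattern with the curly quotes exactly as posted.

```java
import java.util.regex.Pattern;

// Illustrative check of the token_pattern values from this thread (not Fess code).
public class TokenPatternCheck {

    // Question's pattern with straight quotes: only spaces may separate name and value.
    static final String SPACES_ONLY = "name=\"authenticity_token\" +value=\"([^\"]+)\"";
    // Answer's pattern: any characters except '>' may separate them, staying within one tag.
    static final String SAME_TAG = "name=\"authenticity_token\"[^>]+value=\"([^\"]+)\"";
    // Question's pattern as posted, with curly closing quotes (U+201D).
    static final String CURLY = "name=\u201Dauthenticity_token\u201D +value=\u201D([^\u201D]+)\u201D";

    // True if the regex finds a match anywhere in the HTML.
    static boolean matches(String regex, String html) {
        return Pattern.compile(regex).matcher(html).find();
    }

    public static void main(String[] args) {
        // Hypothetical markup: a type attribute sits between name and value.
        String html = "<input name=\"authenticity_token\" type=\"hidden\" value=\"abc123\" />";
        System.out.println("spaces only: " + matches(SPACES_ONLY, html));
        System.out.println("same tag: " + matches(SAME_TAG, html));
        System.out.println("curly quotes: " + matches(CURLY, html));
    }
}
```

This prints `spaces only: false`, `same tag: true`, and `curly quotes: false`. For a tag where `value` directly follows `name` (for example `<input type="hidden" name="authenticity_token" value="abc123" />`), both straight-quoted patterns match, so which one succeeds depends on the markup actually served by www.redmine.org; `[^>]+` is simply the more tolerant choice.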

(from github.com/berti92)
@marevol Thank you, it works now. You can close this issue.