Connection to Drupal

(from github.com/goldenivan)
Hi everyone,

I am trying to crawl a Drupal website that requires authentication. I tried the method described in the documentation: https://fess.codelibs.org/12.1/admin/webconfig-guide.html

I used this configuration, but it is not working:

Web Crawling Configuration

ID = g55y9mEB4EJ67hIPqNja
URLs = http://example.com/login
Included URLs For Crawling = http://example.com/.*
Config Parameters = client.robotsTxtEnabled=false
User Agent = Mozilla/5.0 (compatible; Fess/12.1; +http://fess.codelibs.org/bot.html)
The number of Thread = 1
Interval time = 10000 ms
Boost = 1.0
Permissions = {role}guest

Crawler in the Scheduler

Target = all
Schedule = 0 0 * * *
Executor = groovy
Script = return container.getComponent("crawlJob").logLevel("info").sessionId("g55y9mEB4EJ67hIPqNja").webConfigIds(["g55y9mEB4EJ67hIPqNja"] as String[]).fileConfigIds([] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();

Web Authentication

Hostname = http://example.com
Scheme = Form
Username = user
Password = pwd
Parameters =

encoding=UTF-8
token_method=GET
token_url=http://example.com/login
token_pattern=name="authenticity_token" +value="([^"]+)"
token_name=authenticity_token
login_method=POST
login_url=http://example.com/login
login_parameters=name=${username}&pass=${password}
form_build_id="MyToken"
form_id="connect"
op="Connection"

Fess fails to connect to the website. Do you have any idea what is not working?

(from github.com/marevol)
Is there a demo site for Drupal?
I’d like to reproduce it.

(from github.com/goldenivan)
Hi Marevol,

I found this for trying out Drupal: https://pantheon.io/register?utm_medium=Online%20Advertising&utm_source=Drupalorg&utm_content=Try%20Drupal&utm_ad_group_name=Text&utm_campaign=2018%20Try%20Drupal

(from github.com/goldenivan)
Do you have a description of the parameter values allowed for web authentication?
For information, the body of the POST request must be sent as form-data, not as x-www-form-urlencoded.

(from github.com/goldenivan)
I’ve finally found the problem. To connect Fess to a Drupal website, you need these 4 parameters:

  • name
  • pass
  • form_build_id
  • form_id

You need to use them as you suggested in your example for Redmine, concatenated on a single line without quotation marks:

login_parameters=name=${username}&pass=${password}&form_build_id=myform&form_id=myformid

When you do that, you can crawl the website.
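
For anyone assembling this line by hand, here is a minimal Python sketch of the concatenation described above. It is only an illustration, not part of Fess: the helper name build_login_parameters is hypothetical, and myform / myformid are the placeholder values from this thread, to be replaced with the values found in your own Drupal login form. The sketch assumes, as in the configuration above, that the ${username} and ${password} placeholders must stay literal so Fess can substitute them.

```python
from urllib.parse import quote


def build_login_parameters(fields):
    """Join form fields into one unquoted key=value&key=value line.

    Hypothetical helper for illustration; it is not part of Fess.
    """
    parts = []
    for key, value in fields.items():
        # Leave ${...} placeholders untouched; percent-encode everything else.
        if value.startswith("${") and value.endswith("}"):
            encoded = value
        else:
            encoded = quote(value, safe="")
        parts.append(f"{key}={encoded}")
    return "login_parameters=" + "&".join(parts)


# The four fields listed above, with the placeholder values from this thread.
drupal_fields = {
    "name": "${username}",
    "pass": "${password}",
    "form_build_id": "myform",
    "form_id": "myformid",
}

line = build_login_parameters(drupal_fields)
print(line)
```

The key point is that everything ends up on a single line, joined with & and without quotation marks around the values, unlike the first attempt where form_build_id, form_id and op were on separate quoted lines.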