Crawl Confluence with basic authentication

Hi, thank you for all your great work and support.

I am having an issue crawling Confluence with basic authentication (using a Confluence local user). It works when I use curl, but the request is redirected to the SSO SAML URL when I crawl with Fess. We do have SSO set up on Confluence for most human users, but I verified that this local user works both with curl and by logging in directly on the login page.

This works:

# curl -v -u confluencelocaluser
Server auth using Basic with user 'confluencelocaluser'
> GET /display/TEST HTTP/1.1
> Authorization: Basic <somehashstrings-redacted>
> User-Agent: curl/7.29.0
> Host: confluence-site-com  #note: have to change . to - (new user can only post 2 links)
< HTTP/1.1 200
< Date: Thu, 10 Sep 2020 18:22:02 GMT
< X-Confluence-Request-Time: 1599762122376
< X-Seraph-LoginReason: OK
< X-AUSERNAME: confluencelocaluser
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< Content-Security-Policy: frame-ancestors 'self'
< X-Accel-Buffering: no
< Content-Type: text/html;charset=UTF-8
< Set-Cookie: JSESSIONID=0ACAD2EDC3B37D97D4FF4388C3FAB6AD; Path=/; Secure; HttpOnly
< Transfer-Encoding: chunked

This doesn’t work:

Fess Web Authentication settings:
Port: 443
Scheme: Basic
Username: confluencelocaluser
Password: ******
Web Config: Confluence

I also read webconfig-guide.html for Redmine and XWiki, but I can’t get it working on Confluence. Could you please provide a guide for Confluence? Thank you in advance.

Please check fess-crawler.log with debug logging.

Thank you. I checked fess-crawler.log with debug logging enabled. The log first shows a 404 for robots.txt, which doesn’t exist, and then a 302 redirect to the SAML SSO login on Okta. How can I configure Fess not to follow the redirect?

When I tried it using curl with basic authentication, it didn’t follow the redirect. Please let me know if we can configure Fess to behave like curl. Thank you.
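
To compare the two behaviours outside of Fess, a small probe like the one below may help. It is only a sketch: `basic_auth_header`, `NoRedirect`, and `probe` are hypothetical names, and it rests on an assumption not confirmed in this thread, namely that the crawler sends credentials only after the server answers with a 401 challenge. What is certain is that curl with `-u` sends the Authorization header on the very first request, as the trace above shows.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of a Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop at the first 3xx response instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def probe(url: str, username: str, password: str, preemptive: bool):
    """Send one GET and return (status, Location header) without following redirects.

    preemptive=True mimics curl -u (header sent on the first request).
    preemptive=False mimics a client that waits for a 401 challenge (assumption).
    """
    request = urllib.request.Request(url)
    if preemptive:
        request.add_header("Authorization", basic_auth_header(username, password))
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses all end up here.
        return err.code, err.headers.get("Location")
```

Calling `probe(url, user, password, preemptive=True)` and then again with `preemptive=False` against the same Confluence page should show whether the 302 to the SSO login appears only when the header is missing from the first request.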

Hi, I have the same problem. Did you find a solution?

It might be better to check request headers in the debug log.

It looks like this. The masked parts are all the same host address.

It seems to be FORM authentication, not BASIC authentication.
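
One rough way to tell the two cases apart from the first unauthenticated response is sketched below. `classify_auth_response` is a hypothetical helper, and the rule of thumb is an assumption rather than anything Fess-specific: a server that expects BASIC authentication normally answers with 401 and a `WWW-Authenticate: Basic` header, while a FORM or SSO login usually answers with a redirect to a login page.

```python
def classify_auth_response(status: int, headers: dict) -> str:
    """Guess the authentication style from an unauthenticated response."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    challenge = lowered.get("www-authenticate", "")
    if status == 401 and challenge.lower().startswith("basic"):
        return "basic-challenge"  # server asks for BASIC credentials

    if status in (301, 302, 303, 307, 308) and "location" in lowered:
        return "redirect-to-login"  # typical of FORM or SSO logins

    if 200 <= status < 300:
        return "authenticated-or-public"

    return "other"
```

For the 302 to the SAML login described earlier in this thread, the helper returns `"redirect-to-login"`, which matches the FORM diagnosis.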