SAML implementation

(from github.com/goldenivan)
Hi everyone,

Are you planning to implement SAML support?

(from github.com/marevol)
#1633

“Please advise on access for SAML protected sites from search engine crawl process”

I have many sites (129+) behind SAML authentication. So far, my only options are to ‘ask’ webmasters to enable NTLM, or to make a copy of each site (excluding the CMS), lock it down to specific IPs, and leave it open to unauthenticated users. That seems like a lot to ask, especially as more and more sites seem to forget the problems that JavaScript-based navigation causes for search engines and non-browser clients such as wget and curl.

Hello all. Since SAML is more a set of guidelines for mechanisms than a true “standard” (in my opinion), creating a universal SAML authenticator would be difficult or even impossible. However, you can use a web browser’s developer tools to see the combination of requests, request types, headers, cookies, and Base64-encoded XML request/response pairs required to authenticate against a given SAML implementation (assuming you have credentials to perform a full authentication and log the process).
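When logging such a flow, it helps to decode the captured SAML messages to see exactly what the SP sends and what the IdP returns. Below is a minimal sketch in plain Java (no SAML library). It assumes the two common SAML 2.0 bindings: HTTP-POST, where the `SAMLResponse` form field is simply the Base64-encoded XML, and HTTP-Redirect, where the `SAMLRequest` query parameter is the XML compressed with raw DEFLATE, then Base64-encoded and URL-encoded. The class and method names (`SamlCodec`, `decodePost`, `decodeRedirect`, `encodeRedirect`) are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SamlCodec {

    // HTTP-POST binding: the SAMLResponse form field is the XML, Base64-encoded.
    // The MIME decoder tolerates the line breaks some IdPs insert.
    static String decodePost(String b64) {
        return new String(Base64.getMimeDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    // HTTP-Redirect binding: the SAMLRequest query parameter is the XML compressed with
    // raw DEFLATE (no zlib header), then Base64-encoded. Pass it already URL-decoded.
    static String decodeRedirect(String b64) {
        byte[] deflated = Base64.getMimeDecoder().decode(b64);
        Inflater inflater = new Inflater(true); // true = raw DEFLATE ("nowrap")
        // The Inflater(boolean) Javadoc asks for one extra dummy input byte with nowrap.
        inflater.setInput(Arrays.copyOf(deflated, deflated.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated DEFLATE data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not raw DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Inverse of decodeRedirect, handy when replaying a captured flow.
    // URL-encode the result before putting it into a query string.
    static String encodeRedirect(String xml) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw DEFLATE
        deflater.setInput(xml.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) {
        // Pass a captured (URL-decoded) SAMLRequest value as the first argument.
        String sample = args.length > 0 ? args[0] : encodeRedirect("<samlp:AuthnRequest ID=\"_example\"/>");
        System.out.println(decodeRedirect(sample));
    }
}
```

For example, copying the `SAMLRequest` parameter from the browser's network tab, URL-decoding it, and passing it as the first argument prints the AuthnRequest XML; a `SAMLResponse` captured from a POST body goes through `decodePost` instead.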

My question is this: once the sequence of steps is understood, how would one add the logic to the Fess crawler to perform them before a given crawl? I have a recipe using cURL that hits the right pages in the right order with the right headers and cookies, parses out the XML requests and submits them to the IdP, and finally accesses a protected site. How would one convert this into the necessary Java steps at “the right place” in Fess?
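As a starting point for that conversion, here is one possible shape for a cURL recipe ported to Java. It is a sketch under stated assumptions, not a Fess or fess-crawler API: it uses only the JDK's `java.net.http.HttpClient` (Java 11+) with a shared `CookieManager` playing the role of cURL's cookie jar. It assumes an SP-initiated flow with a single IdP login form whose fields are named `username` and `password` (hypothetical names; take the real ones from your recipe), and it scrapes forms with regular expressions, which is brittle. How the resulting cookies would be handed to a Fess crawl is not shown.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SamlLoginSketch {

    // Value of <input ... name="NAME" ...> (attribute order does not matter), or null.
    static String inputValue(String html, String name) {
        Matcher tag = Pattern.compile("<input[^>]*\\bname=\"" + Pattern.quote(name) + "\"[^>]*>",
                Pattern.CASE_INSENSITIVE).matcher(html);
        if (!tag.find()) return null;
        Matcher value = Pattern.compile("\\bvalue=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE).matcher(tag.group());
        return value.find() ? unescape(value.group(1)) : null;
    }

    // action attribute of the first <form> on the page, or null.
    static String formAction(String html) {
        Matcher m = Pattern.compile("<form[^>]*\\baction=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE).matcher(html);
        return m.find() ? unescape(m.group(1)) : null;
    }

    // Minimal entity decoding: numeric entities (some IdPs emit &#x3a; etc. in URLs) and common named ones.
    static String unescape(String s) {
        Matcher m = Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));").matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int cp = m.group(1) != null ? Integer.parseInt(m.group(1), 16) : Integer.parseInt(m.group(2));
            m.appendReplacement(sb, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(sb);
        return sb.toString().replace("&quot;", "\"").replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // application/x-www-form-urlencoded body.
    static String form(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // POSTs fields to the first form's action on the given page, resolved against the page URL.
    static HttpResponse<String> submitForm(HttpClient client, HttpResponse<String> page,
                                           Map<String, String> fields) throws Exception {
        String action = Objects.requireNonNull(formAction(page.body()), "no <form> on " + page.uri());
        HttpRequest req = HttpRequest.newBuilder(page.uri().resolve(action))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form(fields)))
                .build();
        return client.send(req, HttpResponse.BodyHandlers.ofString());
    }

    // SP-initiated login replay; "username"/"password" are hypothetical IdP field names.
    static CookieManager login(String protectedUrl, String user, String pass) throws Exception {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        // 1. Request the protected page; the SP typically redirects to the IdP login page.
        HttpResponse<String> page = client.send(HttpRequest.newBuilder(URI.create(protectedUrl)).build(),
                HttpResponse.BodyHandlers.ofString());
        // 2. Submit credentials to the IdP's login form.
        Map<String, String> creds = new LinkedHashMap<>();
        creds.put("username", user);
        creds.put("password", pass);
        page = submitForm(client, page, creds);
        // 3. The IdP returns a JavaScript auto-submit form; post its SAMLResponse/RelayState to the SP.
        Map<String, String> assertion = new LinkedHashMap<>();
        assertion.put("SAMLResponse", Objects.requireNonNull(inputValue(page.body(), "SAMLResponse"),
                "no SAMLResponse; the IdP flow probably has extra steps"));
        String relay = inputValue(page.body(), "RelayState");
        if (relay != null) assertion.put("RelayState", relay);
        submitForm(client, page, assertion);
        // 4. The SP session cookie(s) are now in the cookie store.
        return cookies;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: SamlLoginSketch <protected-url> <user> <pass>");
            return;
        }
        login(args[0], args[1], args[2]).getCookieStore().getCookies()
                .forEach(c -> System.out.println(c.getName() + "=" + c.getValue()));
    }
}
```

Real deployments usually need more steps than this (an auto-submit form carrying the AuthnRequest when the SP uses the POST binding, multi-page IdP flows, CSRF tokens), so each extra hop in the cURL recipe would become another `submitForm` call with the fields it sends.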

Please see fess-crawler.