(from github.com/clauded)
Fess is indexing other sites, so it's probably not a low-level network problem. Strangely, it got a bit further last night: it received a 200 response from the server. Here are a good and a bad attempt:
-------------------- GOOD ATTEMPT ------------------------------------------------------------------------
2017-06-01 00:00:32,757 [Crawler-20170601000000-1-5] DEBUG Queued URL: [UrlQueueImpl [id=20170601000000-1.aHR0cDovL3d3dy5yZXRyYWl0ZXF1ZWJlYy5nb3V2LnFjLmNhL2ZyL1BhZ2VzL2FjY3VlaWwuYXNweA, sessionId=20170601000000-1, method=GET, url=http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx, encoding=null, parentUrl=http://www.retraitequebec.gouv.qc.ca/fr/, depth=3, lastModified=null, createTime=1496289632483]]
2017-06-01 00:00:32,760 [Crawler-20170601000000-1-1] INFO Crawling URL: http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx
2017-06-01 00:00:32,760 [Crawler-20170601000000-1-1] DEBUG Getting the content from URL: http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx
2017-06-01 00:00:32,760 [Crawler-20170601000000-1-1] DEBUG Accessing http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx
2017-06-01 00:00:32,761 [Crawler-20170601000000-1-1] DEBUG CookieSpec selected: default
2017-06-01 00:00:32,761 [Crawler-20170601000000-1-1] DEBUG Cookie [version: 0][name: Langue][value: Francais][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: Fri Dec 01 00:00:21 EST 2017] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 00:00:32,762 [Crawler-20170601000000-1-1] DEBUG Cookie [version: 0][name: RRQ_SupporteCookie][value: oui][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: null] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 00:00:32,762 [Crawler-20170601000000-1-1] DEBUG Cookie [version: 0][name: TS016ba385][value: 01298634beea246d40eb4f91e89adb2a5064a378dad4285622a4bebc3a88595561c85cbbf446e7435841593e1cdfe4dea27f87ca6629e72f85c70dc36c01079e144d06cb0d][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: null] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 00:00:32,762 [Crawler-20170601000000-1-1] DEBUG Connection request: [route: {}->http://www.retraitequebec.gouv.qc.ca:80][total kept alive: 1; route allocated: 1 of 20; total allocated: 1 of 200]
2017-06-01 00:00:32,763 [Crawler-20170601000000-1-1] DEBUG Connection leased: [id: 0][route: {}->http://www.retraitequebec.gouv.qc.ca:80][total kept alive: 0; route allocated: 1 of 20; total allocated: 1 of 200]
2017-06-01 00:00:32,763 [Crawler-20170601000000-1-1] DEBUG Executing request GET /fr/Pages/accueil.aspx HTTP/1.1
2017-06-01 00:00:32,764 [Crawler-20170601000000-1-1] DEBUG Target auth state: UNCHALLENGED
2017-06-01 00:00:32,764 [Crawler-20170601000000-1-1] DEBUG Proxy auth state: UNCHALLENGED
2017-06-01 00:00:32,764 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> GET /fr/Pages/accueil.aspx HTTP/1.1
2017-06-01 00:00:32,764 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> Host: www.retraitequebec.gouv.qc.ca
2017-06-01 00:00:32,764 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> Connection: Keep-Alive
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> Cookie: Langue=Francais; RRQ_SupporteCookie=oui; TS016ba385=01298634beea246d40eb4f91e89adb2a5064a378dad4285622a4bebc3a88595561c85cbbf446e7435841593e1cdfe4dea27f87ca6629e72f85c70dc36c01079e144d06cb0d
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> Accept-Encoding: gzip,deflate
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "GET /fr/Pages/accueil.aspx HTTP/1.1[\r][\n]"
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "Host: www.retraitequebec.gouv.qc.ca[\r][\n]"
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
2017-06-01 00:00:32,765 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)[\r][\n]"
2017-06-01 00:00:32,766 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "Cookie: Langue=Francais; RRQ_SupporteCookie=oui; TS016ba385=01298634beea246d40eb4f91e89adb2a5064a378dad4285622a4bebc3a88595561c85cbbf446e7435841593e1cdfe4dea27f87ca6629e72f85c70dc36c01079e144d06cb0d[\r][\n]"
2017-06-01 00:00:32,766 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
2017-06-01 00:00:32,766 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 >> "[\r][\n]"
2017-06-01 00:00:32,772 [Crawler-20170601000000-1-5] DEBUG The url is null. (0)
2017-06-01 00:00:32,805 [Crawler-20170601000000-1-1] DEBUG http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]"
-------------------- BAD ATTEMPT ------------------------------------------------------------------------
2017-06-01 09:57:55,672 [Crawler-20170601095706-1-1] DEBUG Queued URL: [UrlQueueImpl [id=20170601095706-1.aHR0cDovL3d3dy5yZXRyYWl0ZXF1ZWJlYy5nb3V2LnFjLmNhOjgwL2ZyL1BhZ2VzL2FjY3VlaWwuYXNweA, sessionId=20170601095706-1, method=GET, url=http://www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx, encoding=null, parentUrl=http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx, depth=4, lastModified=null, createTime=1496325465648]]
2017-06-01 09:57:55,690 [Crawler-20170601095706-1-1] INFO Crawling URL: http://www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx
2017-06-01 09:57:55,691 [Crawler-20170601095706-1-1] DEBUG Getting the content from URL: http://www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx
2017-06-01 09:57:55,691 [Crawler-20170601095706-1-1] DEBUG Accessing http://www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx
2017-06-01 09:57:55,691 [Crawler-20170601095706-1-1] DEBUG CookieSpec selected: default
2017-06-01 09:57:55,691 [Crawler-20170601095706-1-1] DEBUG Cookie [version: 0][name: Langue][value: Francais][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: Fri Dec 01 09:57:24 EST 2017] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 09:57:55,691 [Crawler-20170601095706-1-1] DEBUG Cookie [version: 0][name: RRQ_SupporteCookie][value: oui][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: null] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 09:57:55,692 [Crawler-20170601095706-1-1] DEBUG Cookie [version: 0][name: TS016ba385][value: 01298634be077a5e19d3acca129644a15fa316175b5b3761dbea3b9808ea559033b5fda8d44ba40f8318023cdaaf16b0c193626c06d4e294ff38fc17e0f20cf47346ab3394][domain: www.retraitequebec.gouv.qc.ca][path: /][expiry: null] match [www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx]
2017-06-01 09:57:55,692 [Crawler-20170601095706-1-1] DEBUG Connection request: [route: {}->http://www.retraitequebec.gouv.qc.ca:80][total kept alive: 1; route allocated: 1 of 20; total allocated: 1 of 200]
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 << "[read] I/O error: Read timed out"
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG Connection leased: [id: 0][route: {}->http://www.retraitequebec.gouv.qc.ca:80][total kept alive: 0; route allocated: 1 of 20; total allocated: 1 of 200]
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG Executing request GET /fr/Pages/accueil.aspx HTTP/1.1
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG Target auth state: UNCHALLENGED
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG Proxy auth state: UNCHALLENGED
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> GET /fr/Pages/accueil.aspx HTTP/1.1
2017-06-01 09:57:55,693 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> Host: www.retraitequebec.gouv.qc.ca:80
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> Connection: Keep-Alive
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> Cookie: Langue=Francais; RRQ_SupporteCookie=oui; TS016ba385=01298634be077a5e19d3acca129644a15fa316175b5b3761dbea3b9808ea559033b5fda8d44ba40f8318023cdaaf16b0c193626c06d4e294ff38fc17e0f20cf47346ab3394
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> Accept-Encoding: gzip,deflate
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "GET /fr/Pages/accueil.aspx HTTP/1.1[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "Host: www.retraitequebec.gouv.qc.ca:80[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "Cookie: Langue=Francais; RRQ_SupporteCookie=oui; TS016ba385=01298634be077a5e19d3acca129644a15fa316175b5b3761dbea3b9808ea559033b5fda8d44ba40f8318023cdaaf16b0c193626c06d4e294ff38fc17e0f20cf47346ab3394[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
2017-06-01 09:57:55,694 [Crawler-20170601095706-1-1] DEBUG http-outgoing-0 >> "[\r][\n]"
2017-06-01 09:57:59,427 [CoreLib-TimeoutManager] DEBUG Closing expired connections
2017-06-01 09:57:59,428 [CoreLib-TimeoutManager] DEBUG Closing connections idle longer than 60000 MILLISECONDS
2017-06-01 09:58:04,319 [IndexUpdater] DEBUG Processing documents in IndexUpdater queue.
2017-06-01 09:58:04,320 [IndexUpdater] DEBUG Getting documents in IndexUpdater queue.
2017-06-01 09:58:04,324 [IndexUpdater] INFO Processing no docs (Doc:{access 4ms}, Mem:{used 148MB, heap 193MB, max 494MB})
2017-06-01 09:58:04,325 [IndexUpdater] DEBUG Processed documents in IndexUpdater queue.
2017-06-01 09:58:04,429 [CoreLib-TimeoutManager] DEBUG Closing expired connections
How does the port (:80) get added to the host? Is that done by Fess, or is it instructed by the server? Could it be what causes the timeout error?
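For comparison, the good and bad URLs differ only by the explicit default port, which for plain http is normally equivalent to leaving the port out. Here is a minimal sketch, in Python with the standard library only, of how the two forms could be normalized to the same string before comparing them. `strip_default_port` is a hypothetical helper written for illustration; it is not part of Fess, and it says nothing about how the server reacts to a `Host: ...:80` header.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme; an explicit port equal to the default is redundant.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Return the URL without its port when the port is the scheme default.

    Hypothetical helper for illustration only. It lowercases the host and
    drops any userinfo, which is acceptable for comparing crawl URLs.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is not None and DEFAULT_PORTS.get(parts.scheme) != port:
        # Keep non-default ports such as :8080.
        netloc = f"{host}:{port}"
    else:
        netloc = host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


good = "http://www.retraitequebec.gouv.qc.ca/fr/Pages/accueil.aspx"
bad = "http://www.retraitequebec.gouv.qc.ca:80/fr/Pages/accueil.aspx"

# Both forms normalize to the same string.
print(strip_default_port(bad) == strip_default_port(good))
```

If the two strings compare equal after normalization, the `:80` entry is a duplicate of a page that was already crawled, which would at least explain why it was queued a second time at depth 4.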