(from github.com/abolotnov)
I have been experimenting with the crawler's included/excluded URL settings, trying (unsuccessfully) to restrict it to documents within a second-level domain and its subdomains, but a larger problem occurred. After it finished crawling, indexing, and generating suggested words, I deleted all the documents from the index and restarted crawling. Now no documents are captured (crawler log below). Am I missing a step needed to make it recrawl? Also, is there a way to limit crawling to subdomains matching something like *.sample.com?
2019-01-08 01:30:49,974 [main] INFO ...Loading specified properties and get by the key: fess_env_crawler.properties, lasta_di.smart.deploy.mode
2019-01-08 01:30:49,977 [main] INFO ...Setting smart deploy mode: warm
2019-01-08 01:30:49,994 [main] INFO ...Reading lasta_di.xml
2019-01-08 01:30:50,021 [main] INFO ...Reading redefiner.xml
2019-01-08 01:30:50,081 [main] INFO ...Reading smartdeploy.xml
2019-01-08 01:30:50,101 [main] INFO ...Reading smart/warmdeploy.xml
2019-01-08 01:30:50,104 [main] INFO ...Reading convention.xml
2019-01-08 01:30:50,107 [main] INFO ...Reading embedded_convention.xml
2019-01-08 01:30:50,113 [main] INFO ...Reading creator.xml
2019-01-08 01:30:50,116 [main] INFO ...Reading convention.xml (recycle)
2019-01-08 01:30:50,116 [main] INFO ...Reading customizer.xml
2019-01-08 01:30:50,118 [main] INFO ...Reading lastafw_customizer.xml
2019-01-08 01:30:50,127 [main] INFO ...Reading embedded_customizer.xml
2019-01-08 01:30:50,129 [main] INFO ...Reading tx_customizer.xml
2019-01-08 01:30:50,137 [main] INFO ...Reading my_creator.xml
2019-01-08 01:30:50,140 [main] INFO ...Reading lastafw_creator.xml
2019-01-08 01:30:50,142 [main] INFO ...Reading convention.xml (recycle)
2019-01-08 01:30:50,142 [main] INFO ...Reading customizer.xml (recycle)
2019-01-08 01:30:50,150 [main] INFO ...Reading embedded_creator.xml
2019-01-08 01:30:50,152 [main] INFO ...Reading convention.xml (recycle)
2019-01-08 01:30:50,152 [main] INFO ...Reading customizer.xml (recycle)
2019-01-08 01:30:50,156 [main] INFO ...Reading customizer.xml (recycle)
2019-01-08 01:30:50,179 [main] INFO ...Reading app.xml
2019-01-08 01:30:50,183 [main] INFO ...Reading convention.xml
2019-01-08 01:30:50,186 [main] INFO ...Reading embedded_convention.xml
2019-01-08 01:30:50,189 [main] INFO ...Reading lastaflute_core.xml
2019-01-08 01:30:50,191 [main] INFO ...Reading lastaflute_assist.xml
2019-01-08 01:30:50,194 [main] INFO ...Reading lastaflute_director.xml
2019-01-08 01:30:50,268 [main] INFO ...Reading fess.xml
2019-01-08 01:30:50,271 [main] INFO ...Reading fess_config.xml
2019-01-08 01:30:50,276 [main] INFO ...Reading fess_ds.xml
2019-01-08 01:30:50,282 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-atlassian-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,292 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-csv-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,297 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-db-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,300 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-elasticsearch-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,306 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-gitbucket-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,311 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-gsuite-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,316 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-json-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,320 [main] INFO ...Reading jar:file:/usr/share/fess/app/WEB-INF/lib/fess-ds-slack-12.2.0.jar!/fess_ds++.xml
2019-01-08 01:30:50,324 [main] INFO ...Reading esflute_config.xml
2019-01-08 01:30:50,326 [main] INFO ...Reading esclient.xml
2019-01-08 01:30:50,534 [main] INFO ...Reading esflute_user.xml
2019-01-08 01:30:50,537 [main] INFO ...Reading esclient.xml (recycle)
2019-01-08 01:30:50,553 [main] INFO ...Reading esflute_log.xml
2019-01-08 01:30:50,556 [main] INFO ...Reading esclient.xml (recycle)
2019-01-08 01:30:50,617 [main] INFO ...Reading crawler_es.xml
2019-01-08 01:30:50,619 [main] INFO ...Reading crawler/container.xml
2019-01-08 01:30:50,622 [main] INFO ...Reading crawler/client.xml
2019-01-08 01:30:50,624 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,624 [main] INFO ...Reading crawler/robotstxt.xml
2019-01-08 01:30:50,626 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,629 [main] INFO ...Reading crawler/contentlength.xml
2019-01-08 01:30:50,631 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,632 [main] INFO ...Reading crawler/mimetype.xml
2019-01-08 01:30:50,634 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,677 [main] INFO ...Reading crawler/rule.xml
2019-01-08 01:30:50,679 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,679 [main] INFO ...Reading crawler/transformer.xml
2019-01-08 01:30:50,681 [main] INFO ...Reading crawler/transformer_basic.xml
2019-01-08 01:30:50,682 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,705 [main] INFO ...Reading crawler/filter.xml
2019-01-08 01:30:50,707 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,708 [main] INFO ...Reading crawler/interval.xml
2019-01-08 01:30:50,710 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,712 [main] INFO ...Reading crawler/extractor.xml
2019-01-08 01:30:50,714 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,725 [main] INFO ...Reading crawler/extractor+tikaExtractor.xml
2019-01-08 01:30:50,726 [main] INFO ...Reading crawler/container.xml
2019-01-08 01:30:50,746 [main] INFO ...Reading crawler/mimetype.xml (recycle)
2019-01-08 01:30:50,746 [main] INFO ...Reading crawler/encoding.xml
2019-01-08 01:30:50,748 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,749 [main] INFO ...Reading crawler/urlconverter.xml
2019-01-08 01:30:50,751 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,762 [main] INFO ...Reading crawler/log.xml
2019-01-08 01:30:50,768 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,771 [main] INFO ...Reading crawler/sitemaps.xml
2019-01-08 01:30:50,773 [main] INFO ...Reading crawler/container.xml (recycle)
2019-01-08 01:30:50,776 [main] INFO ...Reading crawler/es.xml
2019-01-08 01:30:50,787 [main] INFO ...Reading crawler_es+crawlerThread.xml
2019-01-08 01:30:50,803 [main] INFO ...Reading crawler_es+crawlerConfig.xml
2019-01-08 01:30:50,806 [main] INFO ...Reading fess_thumbnail.xml
2019-01-08 01:30:50,901 [main] INFO [Objective Config]
2019-01-08 01:30:50,901 [main] INFO fess_config.properties extends [fess_env_crawler.properties]
2019-01-08 01:30:50,901 [main] INFO checkImplicitOverride=true, propertyCount=430
2019-01-08 01:30:50,934 [main] INFO [Exception Translator]
2019-01-08 01:30:50,934 [main] INFO exceptionTranslationProvider: null
2019-01-08 01:30:50,934 [main] INFO [Async Manager]
2019-01-08 01:30:50,935 [main] INFO defaultConcurrentAsyncOption: {no option}
2019-01-08 01:30:50,935 [main] INFO primaryExecutorService: ThreadPoolExecutor@29006752
2019-01-08 01:30:50,935 [main] INFO secondaryExecutorService: ThreadPoolExecutor@470a9030
2019-01-08 01:30:50,935 [main] INFO tertiaryExecutorService: ThreadPoolExecutor@66d57c1b
2019-01-08 01:30:50,935 [main] INFO [Primary Cipher]
2019-01-08 01:30:50,935 [main] INFO invertibleCryptographer: {AES, UTF-8}
2019-01-08 01:30:50,935 [main] INFO oneWayCryptographer: {SHA-256, UTF-8}
2019-01-08 01:30:50,936 [main] INFO [Time Manager]
2019-01-08 01:30:50,936 [main] INFO businessTimeHandler: TypicalBusinessTimeHandler:{TypicalTimeResourceProvider$$Lambda$42/1047515321, TypicalTimeResourceProvider$$Lambda$43/906448455}
2019-01-08 01:30:50,936 [main] INFO currentIgnoreTransaction: false
2019-01-08 01:30:50,936 [main] INFO adjustTimeMillis: 0
2019-01-08 01:30:50,936 [main] INFO adjustAbsoluteMode: false
2019-01-08 01:30:50,976 [main] INFO [JSON Manager]
2019-01-08 01:30:50,976 [main] INFO realJsonParser: GsonJsonEngine
2019-01-08 01:30:50,976 [main] INFO adjustment: nullsSuppressed
2019-01-08 01:30:50,978 [main] INFO option: {CAMEL_TO_LOWER_SNAKE}
2019-01-08 01:30:50,979 [main] INFO [Postbox]
2019-01-08 01:30:50,979 [main] INFO postOffice: PostOffice@69eb86b4
2019-01-08 01:30:50,979 [main] INFO postalParkingLot: SMailPostalParkingLot:{{category:{main}=motorbike:{session=javax.mail.Session@585ac855}}}@5bb8f9e2
2019-01-08 01:30:50,979 [main] INFO postalPersonnel: FessMailDeliveryDepartmentCreator$1:{SMailConventionReceptionist:{[]}@6a933be2, batch:{[proofreader:{pmcomment}]}}@5f78de22
2019-01-08 01:30:51,157 [main] INFO no modules loaded
2019-01-08 01:30:51,158 [main] INFO loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2019-01-08 01:30:51,158 [main] INFO loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2019-01-08 01:30:51,158 [main] INFO loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2019-01-08 01:30:51,158 [main] INFO loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2019-01-08 01:30:51,158 [main] INFO loaded plugin [org.elasticsearch.transport.Netty4Plugin]
2019-01-08 01:30:52,960 [main] INFO Lasta Di boot successfully.
2019-01-08 01:30:52,962 [main] INFO SmartDeploy Mode: Warm Deploy
2019-01-08 01:30:52,962 [main] INFO Smart Package: org.codelibs.fess.app
2019-01-08 01:30:53,140 [main] INFO Starting Crawler..
2019-01-08 01:30:53,164 [DataStoreCrawler] INFO No crawling target urls.
2019-01-08 01:30:53,184 [WebFsCrawler] INFO no modules loaded
2019-01-08 01:30:53,184 [WebFsCrawler] INFO loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2019-01-08 01:30:53,184 [WebFsCrawler] INFO loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2019-01-08 01:30:53,184 [WebFsCrawler] INFO loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2019-01-08 01:30:53,184 [WebFsCrawler] INFO loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2019-01-08 01:30:53,184 [WebFsCrawler] INFO loaded plugin [org.elasticsearch.transport.Netty4Plugin]
2019-01-08 01:30:53,211 [WebFsCrawler] INFO Connected to localhost:9300
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Target URL: http://www.luxoft.com
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Included URL: http://www.luxoft.com/.*
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Included URL: https://www.luxoft.com/.*
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: https://career.luxoft.com/careers/.*
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: http://career.luxoft.com/carrers/.*
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.gif
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.jpg
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.jpeg
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.jpe
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.pcx
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.png
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.tiff
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.bmp
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.ics
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.msg
2019-01-08 01:30:53,321 [WebFsCrawler] INFO Excluded URL: .*\.css
2019-01-08 01:30:53,322 [WebFsCrawler] INFO Excluded URL: .*\.js
2019-01-08 01:31:03,335 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms}, Mem:{used 163MB, heap 245MB, max 494MB})
2019-01-08 01:31:13,331 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 164MB, heap 245MB, max 494MB})
2019-01-08 01:31:23,349 [IndexUpdater] INFO Processing no docs (Doc:{access 18ms}, Mem:{used 166MB, heap 245MB, max 494MB})
2019-01-08 01:31:33,333 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 167MB, heap 245MB, max 494MB})
2019-01-08 01:31:43,334 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 168MB, heap 245MB, max 494MB})
2019-01-08 01:31:53,073 [CoreLib-TimeoutManager] INFO [SYSTEM MONITOR] {"os":{"memory":{"physical":{"free":2638712832,"total":17037193216},"swap_space":{"free":47908810752,"total":51539607552}},"cpu":{"percent":7},"load_averages":[0.52, 0.58, 0.59]},"process":{"file_descriptor":{"open":416,"max":65535},"cpu":{"percent":0,"total":10230},"virtual_memory":{"total":760116297728}},"jvm":{"memory":{"heap":{"used":177809344,"committed":257490944,"max":518979584,"percent":34},"non_heap":{"used":81122128,"committed":84885504}},"pools":{"direct":{"count":46,"used":86392833,"capacity":86392832},"mapped":{"count":0,"used":0,"capacity":0}},"gc":{"young":{"count":10,"time":161},"old":{"count":2,"time":15}},"threads":{"count":84,"peak":84},"classes":{"loaded":10448,"total_loaded":10451,"unloaded":3},"uptime":68732},"elasticsearch":null,"timestamp":1546911113073}
2019-01-08 01:31:53,335 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:03,336 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:13,336 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:23,336 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:33,335 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:43,335 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 174MB, heap 245MB, max 494MB})
2019-01-08 01:32:53,134 [CoreLib-TimeoutManager] INFO [SYSTEM MONITOR] {"os":{"memory":{"physical":{"free":2627436544,"total":17037193216},"swap_space":{"free":47908810752,"total":51539607552}},"cpu":{"percent":3},"load_averages":[0.52, 0.58, 0.59]},"process":{"file_descriptor":{"open":416,"max":65535},"cpu":{"percent":0,"total":10420},"virtual_memory":{"total":760116297728}},"jvm":{"memory":{"heap":{"used":183159912,"committed":257490944,"max":518979584,"percent":35},"non_heap":{"used":82510368,"committed":86523904}},"pools":{"direct":{"count":56,"used":87048193,"capacity":87048192},"mapped":{"count":0,"used":0,"capacity":0}},"gc":{"young":{"count":10,"time":161},"old":{"count":2,"time":15}},"threads":{"count":84,"peak":84},"classes":{"loaded":10593,"total_loaded":10596,"unloaded":3},"uptime":128833},"elasticsearch":null,"timestamp":1546911173134}
2019-01-08 01:32:53,336 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:03,336 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:13,335 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:23,336 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:33,337 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:43,338 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 175MB, heap 245MB, max 494MB})
2019-01-08 01:33:53,243 [CoreLib-TimeoutManager] INFO [SYSTEM MONITOR] {"os":{"memory":{"physical":{"free":2622480384,"total":17037193216},"swap_space":{"free":47908810752,"total":51539607552}},"cpu":{"percent":3},"load_averages":[0.52, 0.58, 0.59]},"process":{"file_descriptor":{"open":416,"max":65535},"cpu":{"percent":0,"total":10520},"virtual_memory":{"total":760116297728}},"jvm":{"memory":{"heap":{"used":184231528,"committed":257490944,"max":518979584,"percent":35},"non_heap":{"used":82897872,"committed":86851584}},"pools":{"direct":{"count":56,"used":87048193,"capacity":87048192},"mapped":{"count":0,"used":0,"capacity":0}},"gc":{"young":{"count":10,"time":161},"old":{"count":2,"time":15}},"threads":{"count":84,"peak":84},"classes":{"loaded":10593,"total_loaded":10596,"unloaded":3},"uptime":188895},"elasticsearch":null,"timestamp":1546911233243}
2019-01-08 01:33:53,339 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 125MB, heap 245MB, max 494MB})
2019-01-08 01:34:03,344 [IndexUpdater] INFO Processing no docs (Doc:{access 5ms}, Mem:{used 129MB, heap 245MB, max 494MB})
2019-01-08 01:34:13,339 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 129MB, heap 245MB, max 494MB})
2019-01-08 01:34:23,340 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 129MB, heap 245MB, max 494MB})
2019-01-08 01:34:23,995 [WebFsCrawler] INFO [EXEC TIME] crawling time: 210820ms
2019-01-08 01:34:33,341 [IndexUpdater] INFO Processing no docs (Doc:{access 1ms}, Mem:{used 129MB, heap 245MB, max 494MB})
2019-01-08 01:34:33,341 [IndexUpdater] INFO [EXEC TIME] index update time: 56ms
2019-01-08 01:34:33,376 [main] INFO Finished Crawler
2019-01-08 01:34:33,459 [main] INFO [CRAWL INFO] DataCrawlEndTime=2019-01-08T01:30:53.164+0000,CrawlerEndTime=2019-01-08T01:34:33.376+0000,WebFsCrawlExecTime=210820,CrawlerStatus=true,CrawlerStartTime=2019-01-08T01:30:53.140+0000,WebFsCrawlEndTime=2019-01-08T01:34:33.375+0000,WebFsIndexExecTime=56,WebFsIndexSize=0,CrawlerExecTime=220236,DataCrawlStartTime=2019-01-08T01:30:53.156+0000,WebFsCrawlStartTime=2019-01-08T01:30:53.155+0000
2019-01-08 01:34:33,470 [main] INFO Disconnected to elasticsearch:localhost:9300
2019-01-08 01:34:34,470 [main] INFO Destroyed LaContainer.
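
For reference, here is a rough sketch of how the include/exclude patterns from the log behave, and of one pattern that might cover *.sample.com. It assumes the crawler treats each entry as a regular expression that must match the whole URL; that is an assumption, not something the log confirms, and Python's re.fullmatch only stands in for whatever the crawler actually uses. The helper is_allowed and the sample.com pattern are hypothetical names made up for illustration.

```python
import re


def is_allowed(url, included, excluded):
    """Return True if url fully matches an include pattern and no exclude pattern.

    Assumption: patterns are applied as whole-URL regular expressions.
    An empty include list is treated as "allow everything".
    """
    if included and not any(re.fullmatch(p, url) for p in included):
        return False
    return not any(re.fullmatch(p, url) for p in excluded)


# Include patterns copied verbatim from the crawler log above.
LOGGED_INCLUDED = [
    r"http://www.luxoft.com/.*",
    r"https://www.luxoft.com/.*",
]

# A subset of the exclude patterns from the same log.
LOGGED_EXCLUDED = [r".*\.gif", r".*\.jpg", r".*\.css", r".*\.js"]

# Hypothetical pattern: sample.com itself plus any subdomain, http or https.
SUBDOMAIN_INCLUDED = [r"https?://([^/]+\.)?sample\.com(/.*)?"]


if __name__ == "__main__":
    checks = [
        ("http://www.luxoft.com", LOGGED_INCLUDED, LOGGED_EXCLUDED),
        ("http://www.luxoft.com/", LOGGED_INCLUDED, LOGGED_EXCLUDED),
        ("https://docs.sample.com/page", SUBDOMAIN_INCLUDED, LOGGED_EXCLUDED),
        ("https://notsample.com/", SUBDOMAIN_INCLUDED, LOGGED_EXCLUDED),
    ]
    for url, included, excluded in checks:
        print(url, is_allowed(url, included, excluded))
```

Under that whole-URL assumption, the logged target URL http://www.luxoft.com (no trailing slash) does not match either include pattern, while http://www.luxoft.com/ does, so the start URL may be worth checking. Whether this is what actually stops the crawl here is not established by the log.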