ES 6.1.1 & FESS 12.0.1
This instance of FESS has been running for a couple of weeks now, with a bunch of data crawled & indexed. The FESS temp dir is >35GB and still growing. In particular, I see lots of files with the extension .out.
Does FESS automatically expire and remove temp files, or do I need to do something about this?
Need more info… What are the file names? Does Fess run on Windows?
Fess removes temp files automatically at the end of crawling.
Yes, FESS is on Windows Server 2012 R2.
Here is an example of the temp files
There are 19,000+ of these files, ranging in size from 1MB to 10MB.
There are no crawlers currently running.
What is the name of the temp directory? Is it fessTmpDir_XXXX?
Did you check fess.log and fess-crawler.log?
No, the temp dir name is: fess-12.0.1/temp
I don’t see anything relevant in the logs.
Here is a typical set of logs when the crawler finishes:
2018-02-14 02:59:13,234 [main] INFO Finished Crawler
2018-02-14 02:59:13,550 [main] INFO [CRAWL INFO] CrawlerEndTime=2018-02-14T02:59:13.286+0000,WebFsCrawlExecTime=86273845,CrawlerStatus=false,CrawlerStartTime=2018-02-13T03:01:19.229+0000,WebFsCrawlEndTime=2018-02-14T02:59:13.230+0000,CrawlerErrors=QueueTimeout,WebFsIndexExecTime=1014447,WebFsIndexSize=292,CrawlerExecTime=86274057,WebFsCrawlStartTime=2018-02-13T03:01:19.304+0000
2018-02-14 02:59:24,441 [main] INFO Disconnected to elasticsearch:xxxxxx:9300
2018-02-14 02:59:34,955 [main] INFO Destroyed LaContainer.
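For anyone reading the [CRAWL INFO] line above, here is a minimal Python sketch that splits it into key/value pairs. The function names are made up for illustration and are not part of Fess. It assumes the execution times are in milliseconds, which matches the difference between CrawlerStartTime and CrawlerEndTime in the same line.

```python
# Sketch only: parse_crawl_info and exec_hours are hypothetical helpers, not Fess APIs.
LINE = (
    "2018-02-14 02:59:13,550 [main] INFO [CRAWL INFO] "
    "CrawlerEndTime=2018-02-14T02:59:13.286+0000,WebFsCrawlExecTime=86273845,"
    "CrawlerStatus=false,CrawlerStartTime=2018-02-13T03:01:19.229+0000,"
    "WebFsCrawlEndTime=2018-02-14T02:59:13.230+0000,CrawlerErrors=QueueTimeout,"
    "WebFsIndexExecTime=1014447,WebFsIndexSize=292,CrawlerExecTime=86274057,"
    "WebFsCrawlStartTime=2018-02-13T03:01:19.304+0000"
)


def parse_crawl_info(line):
    """Return the key=value pairs that follow the [CRAWL INFO] marker as a dict."""
    payload = line.split("[CRAWL INFO] ", 1)[1]
    return dict(pair.split("=", 1) for pair in payload.split(","))


def exec_hours(info):
    """Convert CrawlerExecTime (assumed to be milliseconds) to hours."""
    return int(info["CrawlerExecTime"]) / 3_600_000


info = parse_crawl_info(LINE)
print(info["CrawlerStatus"], info["CrawlerErrors"])  # false QueueTimeout
print(f"{exec_hours(info):.2f} hours")               # 23.97 hours
```

Read this way, the crawl ran for just under 24 hours and ended with CrawlerStatus=false and CrawlerErrors=QueueTimeout, so it did not finish cleanly.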
Hmm… The crawler creates temp files in fessTmpDir_XXXX.
Did you use the thumbnail feature?
Thumbnail view is not enabled under System --> General. However, I see lines like this in fess.log:
2018-02-13 00:30:28,565 [job_thumbnail_generate] WARN Failed to create thumbnail: xxxxxx
The thumbnail generator & purger jobs are active.
Digging into these temp files more (e.g. crawler-HcHttpClient-3005486111751970959.out), it appears they are images, as there is an EXIF header. Indeed, renaming a couple of these files to .jpg allows me to open them and see that they are images from a crawled site.
Any ideas on why these are not being deleted after the crawler is finished?
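The manual check above (rename to .jpg and open) can be done across the whole directory by looking at the first bytes of each file. This is a rough Python sketch, not anything shipped with Fess. The function names and the *.out pattern are assumptions based on the file name quoted above, and only a few common image signatures are recognized.

```python
from collections import Counter
from pathlib import Path

# Well-known leading bytes ("magic numbers") of common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff(path):
    """Guess the file type from its first 8 bytes; 'unknown' if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return "unknown"


def summarize(temp_dir, pattern="*.out"):
    """Count matching files per guessed type and total their size in bytes."""
    counts = Counter()
    total_bytes = 0
    for path in Path(temp_dir).glob(pattern):
        if path.is_file():
            counts[sniff(path)] += 1
            total_bytes += path.stat().st_size
    return counts, total_bytes


# Example call (path taken from this thread, adjust to your install):
# counts, total_bytes = summarize("fess-12.0.1/temp")
```

If nearly all of the .out files come back as images, that would point at image handling (for example thumbnail generation) rather than the crawler's document temp files.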
What are your crawling configurations? Do you crawl documents on the web?
I have the same problem: temporary files remain.
(On Windows Server 2012 R2, Fess 11.3.0)
No CrawlerErrors occurred.
I think these files remain when crawling takes more than one day…
In Fess 12.1, the thumbnail generator was refactored.
So, it may be fixed in 12.1.
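Until an upgrade is possible, leftover files could be cleaned up by hand while no crawler is running. The following is only a sketch of that workaround in Python, not a Fess feature. The function names, the crawler-*.out pattern (taken from the file name quoted earlier in this thread) and the two-day age threshold are all assumptions. It defaults to a dry run that only lists the candidates.

```python
import time
from pathlib import Path


def find_stale(temp_dir, max_age_days=2.0, pattern="crawler-*.out", now=None):
    """List matching files whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(temp_dir).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def remove_stale(temp_dir, max_age_days=2.0, dry_run=True):
    """Return the stale files; delete them only when dry_run is False."""
    stale = find_stale(temp_dir, max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


# Example call (assumed path, adjust to your install; lists files without deleting):
# for path in remove_stale("fess-12.0.1/temp"):
#     print(path)
```

The age threshold is there so that files belonging to a crawl still in progress are left alone. Pick a value longer than your longest crawl.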