FESS Temp files...

(from github.com/defensivedepth)
ES 6.1.1 & FESS 12.0.1

This instance of FESS has been running for a couple of weeks now, with a bunch of data crawled & indexed. The FESS temp dir is >35GB and still growing. In particular, I see lots of files with the extension .out

Does FESS automatically expire and remove temp files, or do I need to do something about this?

(from github.com/marevol)
Need more info… What are the file names? Does Fess run on Windows?
Fess removes temp files automatically at the end of crawling.

(from github.com/defensivedepth)
Yes, FESS is on Windows Server 2012 R2.

Here is an example of the temp files


There are 19,000+ of these files, ranging in size from 1MB to 10MB.

There are no crawlers currently running.
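
For anyone who wants to measure the same thing on their own install, here is a minimal sketch that counts the .out files and totals their size. The function name and the default "temp" path are hypothetical; point it at whatever temp directory your Fess instance uses.

```python
from pathlib import Path


def summarize_temp_files(temp_dir, pattern="*.out"):
    """Return (file_count, total_bytes) for files in temp_dir matching pattern."""
    count = 0
    total = 0
    for path in Path(temp_dir).glob(pattern):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    import sys

    # The directory is an assumption; pass the real temp dir as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "temp"
    count, total = summarize_temp_files(target)
    print(f"{count} files, {total / 1024 ** 2:.1f} MiB")
```

Running it before and after a crawl should show whether the crawl is what leaves the files behind.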

(from github.com/marevol)
What is the name of the temp directory? Is it fessTmpDir_XXXX?
Did you check fess.log and fess-crawler.log?

(from github.com/defensivedepth)
No, the temp dir name is: fess-12.0.1/temp

I don’t see anything relevant in the logs.

Here is a typical set of logs when the crawler finishes:

2018-02-14 02:59:13,234 [main] INFO Finished Crawler
2018-02-14 02:59:13,550 [main] INFO [CRAWL INFO] CrawlerEndTime=2018-02-14T02:59:13.286+0000,WebFsCrawlExecTime=86273845,CrawlerStatus=false,CrawlerStartTime=2018-02-13T03:01:19.229+0000,WebFsCrawlEndTime=2018-02-14T02:59:13.230+0000,CrawlerErrors=QueueTimeout,WebFsIndexExecTime=1014447,WebFsIndexSize=292,CrawlerExecTime=86274057,WebFsCrawlStartTime=2018-02-13T03:01:19.304+0000
2018-02-14 02:59:24,441 [main] INFO Disconnected to elasticsearch:xxxxxx:9300
2018-02-14 02:59:34,955 [main] INFO Destroyed LaContainer.

(from github.com/marevol)
Hmm… The crawler creates temp files in fessTmpDir_XXXX.
Did you use the thumbnail feature?

(from github.com/defensivedepth)
Thumbnail view is not enabled under System --> General. However, I see lines like this in FESS.log:

2018-02-13 00:30:28,565 [job_thumbnail_generate] WARN Failed to create thumbnail: xxxxxx

The thumbnail generator & purger jobs are active.

Digging into these temp files more (crawler-HcHttpClient-3005486111751970959.out), it appears they are images, as there is an exif header. Indeed, renaming a couple of these files to .jpg allows me to open them and see that they are images from a crawled site.

Any ideas on why these are not being deleted after the crawler is finished?
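
Rather than renaming files by hand, the same check can be done in bulk by reading the first few bytes of each file. This is only a sketch: the `crawler-*.out` pattern is assumed from the one file name above, the function names are made up for illustration, and the signature table covers just JPEG, PNG and GIF.

```python
from pathlib import Path

# Well-known leading byte signatures for a few common image formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(path):
    """Return 'jpeg', 'png' or 'gif' based on the file header, else None."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def count_by_type(temp_dir, pattern="crawler-*.out"):
    """Count matching files in temp_dir per detected type ('other' if unknown)."""
    counts = {}
    for path in Path(temp_dir).glob(pattern):
        if path.is_file():
            kind = sniff_image_type(path) or "other"
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

If nearly everything comes back as an image type, that would point toward image handling (for example thumbnails) rather than ordinary page downloads.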

(from github.com/marevol)
What are your crawling configurations? Do you crawl documents on Web?

(from github.com/hideki0413)
I have the same problem: temporary files remain.
For example:
(On Windows Server 2012 R2, Fess 11.3.0)

No CrawlerErrors occurred.
I think these files remain when crawling takes more than one day…

(from github.com/marevol)
In Fess 12.1, the thumbnail generator was refactored.
So, this may be fixed in 12.1.
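
Until an upgrade is possible, a manual cleanup is one possible workaround. The sketch below is not a Fess feature; it assumes that leftover files match `crawler-*.out` (taken from the file name reported above), that anything older than a chosen age is safe to delete, and that no crawler is running while it executes. The function names and the two-day default are hypothetical. It defaults to a dry run so the list can be reviewed first.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_stale_files(temp_dir, max_age_days=2.0, pattern="crawler-*.out", now=None):
    """Return matching files whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p
        for p in Path(temp_dir).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def remove_stale_files(temp_dir, max_age_days=2.0, dry_run=True,
                       pattern="crawler-*.out", now=None):
    """List stale files and delete them unless dry_run is True."""
    stale = find_stale_files(temp_dir, max_age_days, pattern, now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


if __name__ == "__main__":
    import sys

    # Usage: script.py <temp_dir> [--delete]; without --delete nothing is removed.
    stale = remove_stale_files(sys.argv[1], dry_run="--delete" not in sys.argv[2:])
    for path in stale:
        print(path)
```

Since crawls here can run for about a day, the age threshold should be set comfortably longer than the longest crawl so that files still in use are not touched.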