I noticed that the generate-thumbnail script makes a thumbnail of every page of PDF/Office documents.
You can see them in /var/lib/fess/thumbnails/
This is useless, because only the first page is supposed to be shown as the thumbnail in the search results. It also kills the server: imagine a 300-page document. In fact, gs overloads the server with this task:
gs -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 …
As a solution, you should add the flag
to the commandline
unoconv -o $TMP_FILE -f pdf $TARGET_FILE
More details on
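The flag itself did not survive in the post above. As a rough sketch only, assuming it is unoconv's `-e` export-filter option with `PageRange` (the reply below mentions PageRange), the first-page-only conversion could look like this. The function name and the sample paths are made up for illustration, and the function only prints the command instead of running it:

```shell
# Dry-run sketch: print the unoconv command that would export only page 1.
# Assumption: the missing flag is "-e PageRange=1-1" (unoconv export option).
# first_page_pdf_cmd and the sample paths are hypothetical.
first_page_pdf_cmd() {
  target_file=$1
  tmp_file=$2
  printf 'unoconv -e PageRange=1-1 -o %s -f pdf %s\n' "$tmp_file" "$target_file"
}

first_page_pdf_cmd report.docx /tmp/report.pdf
```

The printed line can be compared with what generate-thumbnail actually runs. Paths containing spaces would need quoting before the command is executed.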
Another suggested feature is to add thumbnails for image files as well. You could use
mogrify for JPG, PNG and other formats:
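The example command is missing from the post, so here is a possible sketch rather than the original suggestion. It uses ImageMagick's `mogrify` with `-path`, `-format` and `-thumbnail`. The 100x100 size, the function name and the paths are assumptions for illustration, and the function only prints the command:

```shell
# Dry-run sketch: print a mogrify command that writes a PNG thumbnail into
# a separate directory, so the source image is not overwritten.
# Assumptions: the 100x100 bounding box and both paths are illustrative.
image_thumbnail_cmd() {
  source_image=$1
  output_dir=$2
  printf 'mogrify -path %s -format png -thumbnail 100x100 %s\n' "$output_dir" "$source_image"
}

image_thumbnail_cmd photo.jpg /tmp/thumbs
```

Without `-path`, mogrify writes its output next to the source file, which is why the sketch points it at a separate directory.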
Thank you for the info.
I’ll add PageRange.
Even after the fix for https://github.com/codelibs/fess/issues/1168, PNGs are still generated for every document page in /var/lib/fess/thumbnails/
I tried the script
/usr/share/fess/bin/generate-thumbnail msoffice <MS Office file> <Output Path>
for Office and PDF files, and it correctly generates a single PDF or PNG of the first page.
But during Fess crawling this process starts:
gs -sstdout=%stderr -dQUIET … -sOutputfile/tmp-magik-…
This process uses 100% CPU and runs for a long time.
Then this process starts:
and then gs again, and so on, back and forth.
It takes a long time for a few dozen files.
Also, after this procedure only a single thumbnail is visible in the search results (out of about 60 files crawled).
If I use the Chrome developer tools to inspect the missing thumbnails, the result is the following:
<div class="thumbnailBox media-left hidden-xs-down">
<a class="link" href="file://dati/few/064/064452.pdf" data-uri="file://dati/few/064/064452.pdf" data-id="22d2c3a975f049728042aacc6bfe9bb7" data-order="0">
<img src="/images/noimage.png" data-src="/thumbnail/?docId=22d2c3a975f049728042aacc6bfe9bb7&queryId=8919116e193244ae8120a7cd365784fb" class="thumbnail" style="background-image: url("/images/loading.gif");">
So the img src is not pointing to an image in /var/lib/fess/thumbnails/
In /var/log/fess there are a lot of similar lines:
[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out: [/usr/share/fess/bin/generate-thumbnail, pdf, /var/tmp/fess/thumbnail_3764418062551063326, /var/lib/fess/thumbnails/22d2c/3a975/f0497/28042/aacc6/bfe9b/b7.png]
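To get a feel for how often this happens, the timeout warnings can be counted with grep. This is just a small sketch: the function name is made up, and the log file name in the usage comment is an assumption, since the thread only names the /var/log/fess directory.

```shell
# Count "CommandGenerator is timed out" warnings in one log file.
# grep -c prints the number of matching lines (0 if there are none).
count_thumbnail_timeouts() {
  log_file=$1
  grep -c 'CommandGenerator is timed out' "$log_file"
}

# Usage (file name is an assumption, adjust to your installation):
#   count_thumbnail_timeouts /var/log/fess/fess.log
```

A count that keeps growing during a crawl suggests the thumbnail command is regularly exceeding its timeout.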
What is the server hardware spec?
Xeon with 16 GB RAM. Debian 9.
I also tried CentOS 7, with the same results.
The gs process pushes a single thread to 100% CPU, not all threads.
I have now crawled 27 files (docs and PDFs), and after 15 minutes the Thumbnail generator job is still active. No thumbnails are visible in the search results.
Same errors in /var/log/fess
I also tried a new VPS set up from scratch (Debian 9, OpenJDK, 16 GB), but got the same problem.
Thank you for the info.
I think it’s improved by #1173.
YES, now the generate-thumbnail task is very fast and the thumbnails are visible on the search results.
But only for PDF files.
For Office files, thumbnails aren't visible.
This is not due to a missing thumbnail but to a web config problem: with Office docs I cannot even see the placeholder image (img src="/images/noimage.png") that is usually visible before the thumbnail generation job runs.
With PDF files, this placeholder is visible before the thumbnail job runs.
Perfect! The fix https://github.com/codelibs/fess/issues/1175 solved the problem.
I tried crawling about 6000 docs and had no problems.
However, I changed the timeout from 10 to 30 seconds on
because of a lot of
[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out
errors (slow disk). These caused the thumbnail job to abort and restart a few minutes later.
Thank you very much.