generate thumbnail only for first page

(from github.com/freestyle68)
Hi again,

I noticed that the generate-thumbnail script makes a thumbnail of every page of PDF/office documents.
You can see them in /var/lib/fess/thumbnails/

This is wasteful, because only the first page is supposed to be shown in the search-result thumbnails. It also kills the server: imagine a 300-page document… In fact, gs overloads the server with this task:

gs -sstdout=%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 …

As a solution, you should add the flag

-e PageRange=1-1

to the command line

unoconv -o $TMP_FILE -f pdf $TARGET_FILE

More details at

http://bernaerts.dyndns.org/linux/76-gnome/325-gnome-shell-generate-msoffice-thumbnail-nautilus
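For illustration, the suggested change could look like the sketch below. The helper name convert_first_page and its argument order are hypothetical and not taken from the Fess script; only the unoconv options come from the suggestion above.

```shell
# Sketch: convert only the first page of an office document to PDF.
# convert_first_page is a hypothetical helper name, not part of Fess.
# Arguments: $1 = source office file, $2 = PDF file to write.
convert_first_page() {
    target_file=$1
    tmp_file=$2
    # -e passes an export filter option to unoconv;
    # PageRange=1-1 limits the PDF export to page 1.
    unoconv -e PageRange=1-1 -o "$tmp_file" -f pdf "$target_file"
}
```

With this, the later PDF-to-PNG step only ever sees a one-page PDF, however long the original document is.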

Another suggested feature is to also add thumbnails for image files; you could use

mogrify for jpg, png and other formats:

(from github.com/marevol)
Thank you for the info.
I’ll add PageRange.

(from github.com/freestyle68)
Hi,

Even after the fix for https://github.com/codelibs/fess/issues/1168, PNGs are still generated for every document page in /var/lib/fess/thumbnails/

I tried the script

/usr/share/fess/bin/generate-thumbnail msoffice <MS Office file> <Output Path>

for office and PDF files, and it correctly generates a single PDF or PNG of the first page.

But during Fess crawling, this process starts:

gs -sstdout=%stderr -dQUIET … -sOutputfile/tmp-magik-…

This process uses 100% CPU and runs for a long time.
Then this process starts:

convert -thumbnail

and then gs again, and so on, back and forth.
This takes a long time for just a few dozen files.
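As a rough illustration of limiting the convert step to a single page: ImageMagick accepts a [0] suffix on the input file name to read only the first page. The sketch below assumes this approach. It is not necessarily what the Fess script does, and the helper name first_page_thumbnail and the 100x100 size are made up for the example.

```shell
# Sketch: render only the first page of a PDF to a PNG thumbnail.
# first_page_thumbnail and the 100x100 size are illustrative assumptions.
# Arguments: $1 = input PDF, $2 = PNG file to write.
first_page_thumbnail() {
    pdf_file=$1
    png_file=$2
    # The [0] suffix selects the first page (index 0) of the input,
    # so the remaining pages are not turned into thumbnails.
    convert "${pdf_file}[0]" -thumbnail 100x100 "$png_file"
}
```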

Also, after this procedure, only a single thumbnail is visible in the search results (out of about 60 files crawled).
When I use the Chrome developer tools to analyze the missing thumbnails, I see the following:

<div class="thumbnailBox media-left hidden-xs-down">
<a class="link" href="file://dati/few/064/064452.pdf" data-uri="file://dati/few/064/064452.pdf" data-id="22d2c3a975f049728042aacc6bfe9bb7" data-order="0">
<img src="/images/noimage.png" data-src="/thumbnail/?docId=22d2c3a975f049728042aacc6bfe9bb7&amp;queryId=8919116e193244ae8120a7cd365784fb" class="thumbnail" style="background-image: url(&quot;/images/loading.gif&quot;);">

So the img src does not point to the image in /var/lib/fess/thumbnails/

In /var/log/fess there are many lines like this:

[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out: [/usr/share/fess/bin/generate-thumbnail, pdf, /var/tmp/fess/thumbnail_3764418062551063326, /var/lib/fess/thumbnails/22d2c/3a975/f0497/28042/aacc6/bfe9b/b7.png]

(from github.com/marevol)
What is the server hardware spec?

(from github.com/freestyle68)
Xeon with 16 GB RAM, Debian 9.
I also tried CentOS 7, with the same results.

The gs process pushes a single thread to 100% CPU, not all threads.

I have now crawled 27 files (docs and PDFs), and after 15 minutes the Thumbnail generator job is still active. No thumbnails are visible in the search results.

The same errors appear in /var/log/fess

I also tried a fresh VPS built from scratch (Debian 9, OpenJDK, 16 GB), but the problem is the same.

(from github.com/marevol)
Thank you for the info.
I think it’s improved by #1173.

(from github.com/freestyle68)
YES, now the generate-thumbnail task is very fast and the thumbnails are visible in the search results.

But only for PDF files.
For office files, thumbnails aren’t visible.

This is not due to a missing thumbnail but to a web config problem: with office docs I cannot even see the blank image (img src="/images/noimage.png") that is usually visible before the thumbnail job runs.
With PDF files, this image is visible before the thumbnail job.

(from github.com/freestyle68)
Perfect! The fix for https://github.com/codelibs/fess/issues/1175 solved the problem.

I tried crawling about 6000 docs and had no problems.
However, I changed the timeout from 10 to 30 seconds in

src/main/java/org/codelibs/fess/thumbnail/impl/CommandGenerator.java

because I was getting a lot of

[CommandGeneratorDestoryTimer-1500312534655] WARN CommandGenerator is timed out

errors (slow disk). This caused the thumbnail job to abort and restart a few minutes later.
Thank you very much.
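To check by hand whether a particular file would exceed a given limit, the script can be run under the coreutils timeout command. This is only a sketch for manual testing, not how Fess enforces its own limit (that lives in CommandGenerator.java); run_with_limit is a hypothetical helper name.

```shell
# Sketch: run a command with a wall-clock limit, using coreutils timeout.
# run_with_limit is a hypothetical helper name.
# Arguments: $1 = limit in seconds, remaining arguments = command to run.
run_with_limit() {
    limit_seconds=$1
    shift
    # timeout exits with status 124 if the command ran past the limit;
    # otherwise it returns the command's own exit status.
    timeout "$limit_seconds" "$@"
}
```

For example, run_with_limit 30 /usr/share/fess/bin/generate-thumbnail pdf <PDF file> <Output Path> would return status 124 if the script needed more than 30 seconds for that file.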