Creating a separate Job Schedule for a single crawler

Hi guys, this is my first post. I would like to say hello to everyone, and I hope I can contribute as well as get some help here on this forum. Fess is awesome!

I have created a web crawler called Bulletin Board, and I want to set up a separate job schedule that runs it every 30 minutes.

Can you tell me what the Target and Script settings in the Job Scheduler should be to run the crawler called Bulletin Board every 30 minutes?

I have searched the forum; people have answered similar questions, but without explaining what to do.

Thanks in advance for any help

You can create a crawler job with the “Create new job” button on the Web Config details page.
The schedule uses cron format, e.g. 0,30 * * * * to run at minutes 0 and 30 of every hour.
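A few more patterns for reference, assuming the usual five-field cron format (minute, hour, day of month, month, day of week):

    0,30 * * * *   # minutes 0 and 30 of every hour, i.e. every 30 minutes
    0 2 * * *      # every day at 02:00
    0 0 * * 0      # every Sunday at midnight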

Hi Shinsuke, thank you so much for the quick reply. Can you show me an example of what the Target name and the config Script should be to schedule a web crawler called “Bulletin Board”? Many thanks.

OK, after doing some research I have managed to find a working script for the job scheduler: you enter the ID string of the crawler into the argument for its crawler type.

Example

  • Name : File Crawler - My name
  • Target: all
  • Schedule : * * * * *
  • Executor : groovy
  • Script : return container.getComponent("crawlJob").logLevel("info").sessionId("ENTER THE CRAWLER ID STRING HERE").webConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).fileConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).dataConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).jobExecutor(executor).execute();
  • Logging : Enabled
  • Crawler Job : Enabled
  • Status : Enabled
  • Display Order: 0

You only need to enter the ID string in sessionId and in the brackets for your specific crawler type; leave the other types empty, i.e. remove “ENTER THE CRAWLER ID STRING HERE” and keep only the empty brackets. A readable sketch follows below.
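For readability, here is the same script for a web crawler such as Bulletin Board, split after each trailing dot (Groovy continues the chain across line breaks). WEB_CRAWLER_ID is a hypothetical placeholder; copy the real ID string from the Web Config details page:

    // Sketch: run a single web crawler; WEB_CRAWLER_ID is a placeholder.
    return container.getComponent("crawlJob").
        logLevel("info").
        sessionId("WEB_CRAWLER_ID").
        webConfigIds(["WEB_CRAWLER_ID"] as String[]).
        // empty arrays: this job runs no file or data store crawler
        fileConfigIds([] as String[]).
        dataConfigIds([] as String[]).
        jobExecutor(executor).
        execute();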

Hope that helps someone

Hi members, hi Menzer,

thank you very much for your example. I am trying the same, but with a file-based crawler. Three crawlers should run at different times and at different intervals. It currently works with the Default Crawler. Then I tried to use a new job:

Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0

and a new crawler:

ID: PLsQcncBOjLknvXw1B9R
Name: Scaneingang
Paths: smb://10.0.0.27/nas/Eigene_Dateien/Scaneingang/
Included Paths For Crawling:
Excluded Paths For Crawling:
Included Paths For Indexing:
Excluded Paths For Indexing:
Config Parameters:
Depth:
Max Access Count:
Number of Threads: 3
Interval: 1000 ms
Boost: 1.0
Permissions: {role}guest
Virtual Hosts:
Status: Enabled
Description:

But when I start the Scaneingang job, I get the following error message:

at java.lang.Thread.run(Thread.java:834) [?:?]

2021-02-05 19:46:00,396 [job_67sTcncBOjLknvXwpx-7] WARN Failed to evalue groovy script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute(); => {executor=org.codelibs.fess.job.impl.GroovyExecutor@400bc2b0}
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: Unexpected input: ‘(’ @ line 1, column 30.
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();

1 error

Something seems to be wrong here:
container.getComponent → (" <—

Do you have any idea what I’m doing wrong?

Thanx a lot for your help.

Kind regards,
TheRedDevil

PLsQcncBOjLknvXw1B9R → “PLsQcncBOjLknvXw1B9R”
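In other words: without quotes, Groovy treats the bare ID as a variable name, which is not defined in the job script. The ID has to be passed as a string literal, for example:

    sessionId("PLsQcncBOjLknvXw1B9R").fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[])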

Hello Shinsuke,
thank you very much for your support. I also tried this, but I still get the same error.

Job:

Details
Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0

Error:

at java.lang.Thread.run(Thread.java:834) [?:?]
2021-02-06 07:27:00,327 [job_67sTcncBOjLknvXwpx-7] WARN Failed to evalue groovy script:
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute(); => {executor=org.codelibs.fess.job.impl.GroovyExecutor@ae7b522}
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: Unexpected input: ‘(’ @ line 1, column 30.
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
1 error

I renamed the Default Crawler job once before and then renamed it back to Default Crawler. Could this be the reason?

“ → "

You are using an invalid double quotation mark.
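That is, the script contains typographic quotes (“ and ”), which the Groovy parser does not accept as string delimiters; only the plain ASCII quote works:

    // rejected by the parser: sessionId(“PLsQcncBOjLknvXw1B9R”)
    // accepted:               sessionId("PLsQcncBOjLknvXw1B9R")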

Wow, you are great! Thank you very much!!!

With your " it works. I just copied your " into Fess input fields and it works.
Unbelievable mistake, I never had something like this before. Reason could be, that I sometimes used the Smartphone with Android Firefox and AnySoftKeyboard APP instead of Ubuntu Firefox.

P.S.: Fess is a great solution for my Ubuntu-based NAS. Now my family can find and open all documents on every network-attached device. I scan all documents including OCR; Fess is a great help for a paperless office.

Kind regards to Japan from Germany,
TheRedDevil

Final solution for the other members:

Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob").logLevel("info").sessionId("PLsQcncBOjLknvXw1B9R").webConfigIds([] as String[]).fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0
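For anyone copying this: the same statement split after each trailing dot for readability (Groovy continues the chain across line breaks, assuming the Script field accepts multi-line input), with plain ASCII quotes throughout:

    // Runs only the file crawler config PLsQcncBOjLknvXw1B9R ("Scaneingang").
    // Every quote below is the plain ASCII " character.
    return container.getComponent("crawlJob").
        logLevel("info").
        sessionId("PLsQcncBOjLknvXw1B9R").
        webConfigIds([] as String[]).
        fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[]).
        dataConfigIds([] as String[]).
        jobExecutor(executor).
        execute();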