Creating a separate Job Schedule for a single crawler

Hi guys, this is my first post. I would like to say hello to everyone, and I hope I can contribute as well as get some help here on this forum. Fess is awesome!

I have created a web crawler called Bulletin Board, and I want to set up a separate job schedule that runs it every 30 minutes.

Can you tell me what the Target and Script settings in the Job Scheduler should be to run the crawler called Bulletin Board every 30 minutes?

I have searched the forum; people have answered similar questions, but without explaining what to do.

Thanks in advance for any help

You can create a crawler job with the “Create new job” button on the Web Config details page.
The schedule uses cron format, e.g. 0,30 * * * * to run at minutes 0 and 30 of every hour.
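A few more patterns for reference, assuming the usual five-field cron format (minute, hour, day of month, month, day of week):

    0,30 * * * *   # minutes 0 and 30 of every hour, i.e. every 30 minutes
    0 2 * * *      # every day at 02:00
    0 0 * * 0      # every Sunday at midnight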

Hi Shinsuke, thank you so much for the quick reply. Can you show me an example of what the Target name and the config Script should be to schedule a web crawler called “Bulletin Board”? Many thanks.

OK, after doing some research I have managed to find a working script for the job scheduler: you enter the ID string of the crawler into the argument for its crawler type.

Example

  • Name : File Crawler - My name
  • Target: all
  • Schedule : * * * * *
  • Executor : groovy
  • Script : return container.getComponent("crawlJob").logLevel("info").sessionId("ENTER THE CRAWLER ID STRING HERE").webConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).fileConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).dataConfigIds(["ENTER THE CRAWLER ID STRING HERE"] as String[]).jobExecutor(executor).execute();
  • Logging : Enabled
  • Crawler Job : Enabled
  • Status : Enabled
  • Display Order: 0

You only need to enter the ID string in sessionId and in the brackets for your specific crawler type; leave the other types empty, i.e. remove “ENTER THE CRAWLER ID STRING HERE” and keep only the empty brackets. A readable sketch follows below.
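For readability, here is the same script for a web crawler such as Bulletin Board, split after each trailing dot (Groovy continues the chain across line breaks). WEB_CRAWLER_ID is a hypothetical placeholder; copy the real ID string from the Web Config details page:

    // Sketch: run a single web crawler; WEB_CRAWLER_ID is a placeholder.
    return container.getComponent("crawlJob").
        logLevel("info").
        sessionId("WEB_CRAWLER_ID").
        webConfigIds(["WEB_CRAWLER_ID"] as String[]).
        // empty arrays: this job runs no file or data store crawler
        fileConfigIds([] as String[]).
        dataConfigIds([] as String[]).
        jobExecutor(executor).
        execute();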

Hope that helps someone

Hi members, hi Menzer,

thank you very much for your example. I am trying the same, but with a file-based crawler. Three crawlers should run at different times and at different intervals. It currently works with the Default Crawler. Then I tried to use a new job:

Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0

and a new crawler:

ID: PLsQcncBOjLknvXw1B9R
Name: Scaneingang
Paths: smb://10.0.0.27/nas/Eigene_Dateien/Scaneingang/
Included Paths For Crawling:
Excluded Paths For Crawling:
Included Paths For Indexing:
Excluded Paths For Indexing:
Config Parameters:
Depth:
Max Access Count:
Number of Threads: 3
Interval: 1000 ms
Boost: 1.0
Permissions: {role}guest
Virtual Hosts:
Status: Enabled
Description:

But when I start the Scaneingang job, I get the following error message:

at java.lang.Thread.run(Thread.java:834) [?:?]

2021-02-05 19:46:00,396 [job_67sTcncBOjLknvXwpx-7] WARN Failed to evalue groovy script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute(); => {executor=org.codelibs.fess.job.impl.GroovyExecutor@400bc2b0}
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: Unexpected input: ‘(’ @ line 1, column 30.
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(PLsQcncBOjLknvXw1B9R).webConfigIds([] as String[]).fileConfigIds([PLsQcncBOjLknvXw1B9R] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();

1 error

Something seems to be wrong here:
container.getComponent → (" <—

Do you have any idea what I’m doing wrong?

Thanx a lot for your help.

Kind regards,
TheRedDevil

PLsQcncBOjLknvXw1B9R → “PLsQcncBOjLknvXw1B9R”
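In other words: without quotes, Groovy treats the bare ID as a variable name, which is not defined in the job script. The ID has to be passed as a string literal, for example:

    sessionId("PLsQcncBOjLknvXw1B9R").fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[])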

Hello Shinsuke,
thank you very much for your support. I also tried this, but I still get the same error.

Job:

Details
Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0

Error:

at java.lang.Thread.run(Thread.java:834) [?:?]
2021-02-06 07:27:00,327 [job_67sTcncBOjLknvXwpx-7] WARN Failed to evalue groovy script:
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute(); => {executor=org.codelibs.fess.job.impl.GroovyExecutor@ae7b522}
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
Script1.groovy: 1: Unexpected input: ‘(’ @ line 1, column 30.
return container.getComponent("crawlJob”).logLevel(“info”).sessionId(“PLsQcncBOjLknvXw1B9R”).webConfigIds([] as String[]).fileConfigIds([“PLsQcncBOjLknvXw1B9R”] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
1 error

I renamed the Default Crawler job once before and then renamed it back to Default Crawler. Could this be the reason?

“ → "

You are using an invalid double quotation mark.
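That is, the script contains typographic quotes (“ and ”), which the Groovy parser does not accept as string delimiters; only the plain ASCII quote works:

    // rejected by the parser: sessionId(“PLsQcncBOjLknvXw1B9R”)
    // accepted:               sessionId("PLsQcncBOjLknvXw1B9R")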

Wow, you are great! Thank you very much!!!

With your " it works. I just copied your " into Fess input fields and it works.
Unbelievable mistake, I never had something like this before. Reason could be, that I sometimes used the Smartphone with Android Firefox and AnySoftKeyboard APP instead of Ubuntu Firefox.

P.S.: Fess is a great solution for my Ubuntu-based NAS. Now my family can find and open all documents on every network-attached device. I scan all documents including OCR; Fess is a great help for a paperless office.

Kind regards to Japan from Germany,
TheRedDevil

Final solution for the other members:

Name: Scaneingang
Target: all
Schedule: * * * * *
Executor: groovy
Script: return container.getComponent("crawlJob").logLevel("info").sessionId("PLsQcncBOjLknvXw1B9R").webConfigIds([] as String[]).fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Logging: Enabled
Crawler Job: Enabled
Status: Enabled
Display Order: 0
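For anyone copying this: the same statement split after each trailing dot for readability (Groovy continues the chain across line breaks, assuming the Script field accepts multi-line input), with plain ASCII quotes throughout:

    // Runs only the file crawler config PLsQcncBOjLknvXw1B9R ("Scaneingang").
    // Every quote below is the plain ASCII " character.
    return container.getComponent("crawlJob").
        logLevel("info").
        sessionId("PLsQcncBOjLknvXw1B9R").
        webConfigIds([] as String[]).
        fileConfigIds(["PLsQcncBOjLknvXw1B9R"] as String[]).
        dataConfigIds([] as String[]).
        jobExecutor(executor).
        execute();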