Hi, I have a question. If Fess stops in the middle of an indexing task, will it begin all over again? I ask because I have 1,500,000 files and it had indexed 500,000, but we had to start it again and it seems to be taking the same amount of time to get back to the point where it stopped.
Will Fess go over all the files again up to the stop point? Shouldn't it at least be faster to get back to that point, or does it have to calculate the hash and other things for each file? Will it always take that amount of time on every indexing task? Can it be improved?
If you want to resume a crawl, you need to add a sessionId() method to the script of the Default Crawler job, as below. The string argument can be anything.
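A minimal sketch of the modified Default Crawler job script, modeled on the job scripts quoted later in this thread; the session ID string "my-crawl-session" is just a placeholder of your choosing:

```groovy
// Default Crawler job script with a fixed session ID added.
// Keeping the same session ID across runs lets the crawler
// reuse the stored session data instead of starting from scratch.
return container.getComponent("crawlJob")
    .logLevel("info")
    .sessionId("my-crawl-session") // arbitrary, but must stay the same between runs
    .jobExecutor(executor)
    .execute();
```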
I don't have a default crawler; I have two separate ones, one for the filesystem and another for the datastore. They are configured as below:
return container.getComponent("crawlJob").logLevel("info").sessionId("4EnG5IkBVq0bdBbnzIVl").webConfigIds([] as String[]).fileConfigIds(["4EnG5IkBVq0bdBbnzIVl"] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
Although I'm not sure what you want to do, the crawling target of the second crawler includes the first one's. So it's better for the second crawler to exclude 4EnG5IkBVq0bdBbnzIVl.
return container.getComponent("crawlJob").logLevel("info").sessionId("4EnG5IkBVq0bdBbnzIVl").webConfigIds([] as String[]).fileConfigIds(["4EnG5IkBVq0bdBbnzIVl"] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
I was crawling SETORES and had 543,000 files indexed, 1,000,000 crawled, and 400,000 more in the queue. Then we had a power failure and I had to start those jobs again. But instead of continuing from the break point, it looks like it started all over again, because the indexed file count doesn't grow, and it looks like it's deleting files. Shouldn't it just continue from where it stopped indexing? It's been almost three days and nothing has been indexed; I guess it's just going through all the files again.
And what should I delete from the crawler job? I didn't understand.
The above setting still includes the filesystem crawling. To avoid that, you need to use dataConfigIds() to specify the datastore crawling.
You can create a crawler job from the datastore crawling page.
return container.getComponent("crawlJob").logLevel("info").sessionId("LnDEA4oBvxD_9Z3XqUm_").webConfigIds([] as String[]).fileConfigIds(["LnDEA4oBvxD_9Z3XqUm_"] as String[]).dataConfigIds([] as String[]).jobExecutor(executor).execute();
And what about resuming? Shouldn't it continue crawling from where it stopped? Should I configure something?
return container.getComponent("crawlJob").logLevel("info").sessionId("LnDEA4oBvxD_9Z3XqUm_").dataConfigIds(["LnDEA4oBvxD_9Z3XqUm_"] as String[]).jobExecutor(executor).execute();
If a crawler with a session ID stops, it restarts with the session data.