Remove URLs from index

discuss · January 15, 2018, 1:54pm

(from github.com/hottwerk)
Is it possible to remove already crawled URLs from an index based on a regex?
Let's say: delete all .*id_categorie=2.*

Or maybe filter them out when the search results are displayed?

discuss · January 15, 2018, 2:21pm

(from github.com/marevol)
You can remove them on the Admin Search page.

discuss · January 15, 2018, 4:44pm

(from github.com/hottwerk)
Yes, but is it possible to search URLs based on a regex? If so, could you give a small example?

discuss · January 15, 2018, 9:19pm

url:"http://fess.codelibs.org/*"

discuss · January 16, 2018, 8:55am

(from github.com/hottwerk)
Thank you very much!
url:"http://fess.codelibs.org/*" works!
url:"http://fess.codelibs.org/1*/install/*" doesn’t work. Can this be done?

Another question: how do I escape characters in the search string? (I am searching for a literal * in the URL.)

discuss · January 18, 2018, 12:26am

url:"http://fess.codelibs.org/1*/install/*" doesn’t work.

It will be supported in a future release.

how to escape characters in the search string?

\*

discuss · January 18, 2018, 3:23am

url:"http://fess.codelibs.org/1*/install/*" doesn’t work.

It will be supported in a future release.

Oops, that’s not correct…
url:"…" is a phrase query that also supports prefix matching.
To use a prefix/wildcard query, escape the special characters and drop the quotes:

url:http\:\/\/fess.codelibs.org\/*\/12.0\/*
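
If you build such queries often, the escaping can be scripted. The following is only a rough sketch, not part of Fess: the helper names `escape_term` and `url_wildcard_query` are made up for illustration, and the list of special characters is an assumption based on the usual Lucene-style query syntax. It backslash-escapes everything except the wildcards, and it can escape `*` as well when you want to match a literal asterisk.

```python
# Characters assumed to be query syntax in a Lucene-style parser.
# The wildcards '*' and '?' are handled separately below.
SPECIAL = set('+-&|!(){}[]^"~:\\/')
WILDCARDS = set('*?')


def escape_term(text: str, keep_wildcards: bool = True) -> str:
    """Backslash-escape query syntax characters in text.

    With keep_wildcards=True, '*' and '?' stay active as wildcards.
    With keep_wildcards=False, they are escaped and match literally.
    """
    out = []
    for ch in text:
        if ch in SPECIAL or (ch in WILDCARDS and not keep_wildcards):
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def url_wildcard_query(pattern: str) -> str:
    """Build an unquoted url: query from a URL pattern (hypothetical helper)."""
    return "url:" + escape_term(pattern)


print(url_wildcard_query("http://fess.codelibs.org/*/12.0/*"))
```

For the pattern above this prints the same query string as shown in this post. To search for a literal asterisk instead, call `escape_term(text, keep_wildcards=False)`, which turns `*` into `\*`.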

discuss · January 19, 2018, 9:00am

(from github.com/hottwerk)
That seems to work… better. Not entirely what I expected, but I can figure it out from here.
Thank you!