Fess takes the crawling date as document creation date. Is there a way to extract the real publishing date for correct filtering in the search?
At least, if something like
<meta property="article:published_time" content="2024-07-23T14:55:42+00:00">
is present?
Yes. Please see previous topics.
Thank you, I found the answer there:
Add the following to the web crawling configuration under config params (this works for WordPress sites):
field.xpath.lastModified=//META[@property="article:published_time"]/@content
Looks like this is not the solution:
field.xpath.lastModified=//META[@property="article:published_time"]/@content
Looks like timestamp is the field that stores the date shown on the search results page. But when I do
field.xpath.timestamp=//META[@property="article:published_time"]/@content
then the field contains an array with two dates, the one from the page and the crawling date, and the search results show the crawling date. This does not help either:
field.overwrite.timestamp=true
Please help!
This topic has to do with that:
It would be more appropriate to use a custom field that you create rather than a system field like timestamp.
Ok, I created a new field named doc_created and managed to fill in the real creation dates of one test domain.
I managed to show this date on the search results page by editing searchResults.jsp:
<c:choose>
<c:when test="${doc.doc_created != null}">
<fmt:formatDate value="${fe:parseDate(doc.doc_created)}" type="BOTH" pattern="yyyy-MM-dd HH:mm" />
</c:when>
<c:otherwise>
<fmt:formatDate value="${fe:parseDate(doc.created)}" type="BOTH" pattern="yyyy-MM-dd HH:mm" />
</c:otherwise>
</c:choose>
But WHERE can I define that this field should be used when I want to SORT the results by creation date?
<request>
...
<sort>
<script type="number" order="desc">
<![CDATA[
if (doc.containsKey('doc_created') && !doc['doc_created'].empty) {
return doc['doc_created'].value.getMillis();
} else {
return doc['last_modified'].value.getMillis();
}
]]>
</script>
</sort>
...
</request>
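To make clear what I expect from the sort, here is a minimal standalone Java sketch of the same rule as the script above: use doc_created when it is set, otherwise fall back to last_modified, newest first. The class `SortByPublished` and the record `Doc` are made up for illustration and are not part of Fess or OpenSearch.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByPublished {
    /** Hypothetical stand-in for an indexed document; docCreated may be null. */
    record Doc(String url, Instant docCreated, Instant lastModified) {
        // Same rule as the sort script: prefer doc_created, else last_modified.
        long sortKey() {
            return (docCreated != null ? docCreated : lastModified).toEpochMilli();
        }
    }

    /** Returns a new list ordered by the sort key, newest first. */
    static List<Doc> newestFirst(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingLong(Doc::sortKey).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            // Published in July 2024, crawled in January 2025.
            new Doc("a", Instant.parse("2024-07-23T14:55:42Z"), Instant.parse("2025-01-01T00:00:00Z")),
            // No publishing date found, so last_modified is used.
            new Doc("b", null, Instant.parse("2024-12-01T00:00:00Z")));
        for (Doc d : newestFirst(docs)) {
            System.out.println(d.url() + " " + Instant.ofEpochMilli(d.sortKey()));
        }
    }
}
```

With these two test documents, "b" should come before "a", because the publishing date of "a" is older than the last_modified date of "b", even though "a" was crawled later.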
To use a sort field, add the field definition to the index and add it to query.additional.sort.fields.
I tried this already, without success.
In DEV TOOLS:
GET /fess.search/_mapping?pretty
…
"doc_created": {
"type": "date"
},
…
Content of /etc/fess/fess_config.properties :
…
query.additional.default.fields=
query.additional.response.fields=doc_created
query.additional.api.response.fields=
query.additional.scroll.response.fields=doc_created
query.additional.cache.response.fields=doc_created
query.additional.highlighted.fields=
query.additional.search.fields=
query.additional.facet.fields=
query.additional.sort.fields=doc_created
query.additional.analyzed.fields=
query.additional.not.analyzed.fields=
…
Restarted OpenSearch; still, the field created is used for sorting, not my field doc_created
field created is used for sorting, not my field doc_created
What does it mean? I couldn’t understand your problem. Could you provide more details?
I managed to read the real publishing date of documents and put it in a newly created field “doc_created”.
I managed to show this field value in the search results in the lower left corner of each result.
But when I SORT the results BY DATE, the results are not sorted by my new field “doc_created”, but by some other field, I think by the field “created”.
Did you pass the field name to the sort request parameter? You need to modify the JSP file to use it.
Wow, it’s working!!! Thank you very much!!!
I had to change the value of <la:option value="created.asc">
in searchOptions.jsp,
and also add something to the <li class="list-inline-item">
list in search.jsp
I just need to recrawl all documents so that my “doc_created” is filled in everywhere. This seems to be happening now that I have disabled the Check “Last Modified” Header checkbox, as you mentioned somewhere else. Thank you!!!
One more thing to fix.
In the right column of the search results, there is a box where you can select a date range. You can select:
last 24 hours
last week
last month
last year
How can I change that to use my field doc_created?
This is the JSP code in searchResults.jsp:
<c:forEach var="fieldData" items="${facetResponse.fieldList}">
<c:if
test="${fieldData.name == 'label' && fieldData.valueCountMap.size() > 0}">
<ul class="list-group mb-2">
<li class="list-group-item text-uppercase"><la:message
key="labels.facet_label_title" /></li>
<c:forEach var="countEntry" items="${fieldData.valueCountMap}">
<c:if
test="${countEntry.value != 0 && fe:labelexists(countEntry.key)}">
<li class="list-group-item"><la:link
href="/search?q=${f:u(q)}&ex_q=label%3a${f:u(countEntry.key)}&sdh=${f:u(fe:sdh(sh))}${fe:pagingQuery(null)}${fe:facetQuery()}${fe:geoQuery()}">
${f:h(fe:label(countEntry.key))}
<span class="badge badge-secondary badge-pill float-right">${f:h(countEntry.value)}</span>
</la:link></li>
</c:if>
</c:forEach>
</ul>
</c:if>
</c:forEach>
The settings for facet queries are in fess_config.properties.
I already put
query.additional.facet.fields=doc_created
but this alone did not help.
The type of the ‘doc_created’ field is ‘date,’ and it is not a facet field. You need to use it in a facet query. If you need further assistance, please contact commercial support.
Let’s skip this.
Another problem: If a domain contains an incorrectly formatted date in the metadata I want to read, the crawler stops working. How can I avoid this?
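What I would like to happen is roughly the following, shown as a standalone Java sketch and not as a Fess setting or API: try to parse the meta value as an ISO 8601 date with offset (the format of article:published_time above), and if that fails, keep the crawling date instead of aborting. The class `PublishedDate` and the method `parseOrFallback` are made-up names; where such a check could be hooked into the crawler is exactly what I don't know.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

public class PublishedDate {
    /**
     * Hypothetical helper: returns the parsed publishing date, or the
     * crawl time when the meta value is missing or malformed.
     */
    static Instant parseOrFallback(String metaContent, Instant crawlTime) {
        if (metaContent == null || metaContent.isBlank()) {
            return crawlTime;
        }
        try {
            // Accepts values such as 2024-07-23T14:55:42+00:00.
            return OffsetDateTime.parse(metaContent.trim()).toInstant();
        } catch (DateTimeParseException e) {
            // Malformed date: do not fail, fall back to the crawl time.
            return crawlTime;
        }
    }

    public static void main(String[] args) {
        Instant crawl = Instant.parse("2025-01-01T00:00:00Z");
        System.out.println(parseOrFallback("2024-07-23T14:55:42+00:00", crawl));
        System.out.println(parseOrFallback("23.07.2024", crawl));
    }
}
```

So a page with a broken date would simply be indexed with the crawling date, and the crawl of the rest of the domain would continue.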