Add Open Graph Meta for crawling and JSON results

(from github.com/robinComa)
Hello,

I am trying to add Open Graph meta tags to my JSON results. My goal is to build a custom page on top of the Fess JSON search that displays these fields for each result:

  • [x] Title (based on <meta property="og:title">)
  • [x] Description (based on <meta property="og:description">)
  • [x] Thumbnail (based on <meta property="og:image">)
  • [x] Url (based on <meta property="og:url">)

I tried adding them to doc.json like this:

"og:title": { "type": "keyword" },
"og:description": { "type": "keyword" },
"og:image": { "type": "keyword" },
"og:url": { "type": "keyword" }

And I added them to fess_config.properties:

query.additional.response.fields=og:title,og:description,og:image,og:url
query.additional.search.fields=og:title,og:description,og:image,og:url

On the extractor side:

<!-- Open Graph -->
<postConstruct name="addFieldRule">
    <arg>"og:title"</arg>
    <arg>"//META[@name='og:title']/@content"</arg>
</postConstruct>
<postConstruct name="addFieldRule">
    <arg>"og:description"</arg>
    <arg>"//META[@name='og:description']/@content"</arg>
</postConstruct>
<postConstruct name="addFieldRule">
    <arg>"og:image"</arg>
    <arg>"//META[@name='og:image']/@content"</arg>
</postConstruct>
<postConstruct name="addFieldRule">
    <arg>"og:url"</arg>
    <arg>"//META[@name='og:url']/@content"</arg>
</postConstruct>

I restarted the web crawl, but my JSON response from http://localhost:8080/json?q=flying is still the same:

[
  {
    "filetype": "...",
    "title": "...",
    "content_title": "...",
    "digest": "...",
    "host": "...",
    "content_length": "351497",
    "timestamp": "2018-05-02T08:51:08.722Z",
    "url_link": "...",
    "created": "2018-05-02T08:51:08.722Z",
    "site_path": "...",
    "doc_id": "cbbba5547af04318aadec0d2de49e2a5",
    "url": "https://...",
    "content_description": "...",
    "site": "...",
    "filename": "...",
    "boost": "1.0",
    "mimetype": "text/html"
  }
]

There are no og:* attributes.

Any idea? Thanks in advance!

(from github.com/iny)
Hello there,

Could you try it this way?

<postConstruct name="addFieldRule">
    <arg>"og_title"</arg>
    <arg>"//META[@property=\"og:title\"]/@content"</arg>
 </postConstruct>

query.additional.api.response.fields=og_title

You can add the other "og" tags the same way.
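
As a rough illustration of repeating that rule for the remaining tags, here is a small Python sketch that prints one addFieldRule block per Open Graph property. The helper name add_field_rule and the og_<name> field naming are assumptions made for this example (following the og_title pattern above), and the XPath uses single quotes instead of escaped double quotes; it only generates text to paste into the extractor configuration.

```python
from xml.sax.saxutils import escape

# Open Graph properties mentioned in this thread.
OG_PROPERTIES = ["title", "description", "image", "url"]


def add_field_rule(og_name: str) -> str:
    """Return one addFieldRule block for the given og:<name> property.

    The index field is named og_<name> (an assumption that follows the
    og_title example), and the XPath matches on @property, not @name.
    """
    field = f"og_{og_name}"
    xpath = f"//META[@property='og:{og_name}']/@content"
    return (
        '<postConstruct name="addFieldRule">\n'
        f'    <arg>"{field}"</arg>\n'
        f'    <arg>"{escape(xpath)}"</arg>\n'
        "</postConstruct>"
    )


rules = "\n".join(add_field_rule(name) for name in OG_PROPERTIES)
print(rules)
```

The matching property line would then list every generated field name, for example query.additional.api.response.fields=og_title,og_description,og_image,og_url.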

(from github.com/robinComa)
Hello @iny, thanks for your reply!

Two points:

  • My bad, I wrote @name instead of @property
  • I didn’t know about the query.additional.api.response.fields setting

Now it is working perfectly!

Thank you so much!
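
For the custom results page described at the start of the thread, mapping each JSON hit to the four display fields could look roughly like the sketch below. It assumes the fields were indexed as og_title, og_description, og_image and og_url, that each hit is a plain dictionary of strings like the one shown in the first post, and that falling back to title, content_description and url is acceptable when a page has no Open Graph tags. The function name to_card and the sample values are hypothetical.

```python
from typing import Dict, List, Optional


def to_card(doc: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Pick the display fields for one search hit.

    Prefers the Open Graph fields and falls back to the standard
    fields seen in the JSON response when an og_* value is missing.
    """
    return {
        "title": doc.get("og_title") or doc.get("title", ""),
        "description": doc.get("og_description") or doc.get("content_description", ""),
        "thumbnail": doc.get("og_image"),  # None when the page has no og:image
        "url": doc.get("og_url") or doc.get("url", ""),
    }


# Hypothetical hits, shaped like the response in the first post.
results: List[Dict[str, str]] = [
    {
        "title": "Flying - Example",
        "url": "https://example.com/flying",
        "content_description": "A page about flying.",
        "og_title": "Flying machines",
        "og_image": "https://example.com/flying.png",
    },
    {
        "title": "No Open Graph here",
        "url": "https://example.com/plain",
        "content_description": "Plain page.",
    },
]

cards = [to_card(doc) for doc in results]
for card in cards:
    print(card)
```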

(from github.com/marevol)
To add a custom field (e.g. og_title), the steps are:

  1. Add the field name to query.additional.response.fields in fess_config.properties.
    query.additional.api.response.fields is for API responses, such as JSON.
  2. Start Fess
  3. Add a field type to doc.json
    e.g. {"og_title": { "type": "keyword" }}
  4. Start reindexing with Update Aliases
  5. Create a Crawling Config with Config Parameters
    e.g. field.xpath.og_title=//META[@property='og:title']/@content
  6. Start the Crawler
  7. Edit the JSP files (e.g. searchResult.jsp)
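
To keep the three places in those steps consistent (fess_config.properties, doc.json and the Config Parameters), one option is to generate all of them from a single mapping. The Python sketch below does that for the four Open Graph fields from this thread; the function names are made up for the example, and it only prints text to copy into the respective files, it does not talk to Fess.

```python
import json
from typing import Dict, List

# Index field name -> Open Graph property it is read from.
FIELDS: Dict[str, str] = {
    "og_title": "og:title",
    "og_description": "og:description",
    "og_image": "og:image",
    "og_url": "og:url",
}


def properties_lines(fields: Dict[str, str]) -> List[str]:
    """Step 1: lines for fess_config.properties."""
    names = ",".join(fields)
    return [
        f"query.additional.response.fields={names}",
        f"query.additional.api.response.fields={names}",
    ]


def doc_json_fragment(fields: Dict[str, str]) -> str:
    """Step 3: field types to merge into the properties of doc.json."""
    return json.dumps({name: {"type": "keyword"} for name in fields}, indent=2)


def config_parameters(fields: Dict[str, str]) -> List[str]:
    """Step 5: Config Parameters for the web Crawling Config."""
    return [
        f"field.xpath.{name}=//META[@property='{prop}']/@content"
        for name, prop in fields.items()
    ]


print("\n".join(properties_lines(FIELDS)))
print(doc_json_fragment(FIELDS))
print("\n".join(config_parameters(FIELDS)))
```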

(from github.com/Anders-Bergqvist)
Hi!
We have this meta tag on some of our pages, which we want to turn into a search facet (or filter):
<meta name="Occupation" content="Researcher" />

We have put this in fess_config.properties:
query.additional.response.fields=Occupation
query.additional.api.response.fields=Occupation

And this in doc.json:
"Occupation": { "type": "keyword" }

In the "Config Parameters" of the web crawler we put:
field.xpath.Occupation=//META[@name='Occupation']/@content

What are we doing wrong? We want to build a search facet through the API that says, for example, "Give me all the pages matching the search word 'Charles' that contain <meta name="Occupation" content="Researcher" />".

Please advise.

(from github.com/marevol)
Did you check whether the value is stored in the Fess index?

(from github.com/Anders-Bergqvist)
Is that under "Crawling Info"?
How do I query for <meta name="Occupation" content="Researcher" />?

(from github.com/marevol)
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
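
As a rough sketch of such a check, the Python snippet below builds (but does not send) a search request with a term query on the Occupation keyword field. The Elasticsearch address http://localhost:9201 and the index name fess.search are assumptions about the installation and may well differ; the function name build_check_request is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9201"  # assumption: adjust to your Elasticsearch address
INDEX = "fess.search"             # assumption: adjust to your Fess index or alias


def build_check_request(field: str, value: str) -> urllib.request.Request:
    """Build a _search request that looks for documents where field == value."""
    body = {
        "query": {"term": {field: value}},  # exact match on a keyword field
        "_source": ["url", field],          # only return the fields of interest
        "size": 5,
    }
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_check_request("Occupation", "Researcher")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually run it: urllib.request.urlopen(request).read()
```

If the response contains hits with an Occupation value, the crawler stored the field; if not, the XPath or the mapping is the first place to look.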

(from github.com/Anders-Bergqvist)
Now I have Occupation: "Researcher" in the JSON response.

Now I want to search for the word cancer in the label oru_se_personal and ONLY show hits with Occupation: Researcher.

I’ve tried this, but it does not work:

http://xxx.xxx.xx.xxx:8080/json/?q=cancer&ex_q=label:oru_se_personal_sv&filter:Occupation=Researcher&num=10&start=0&lang=sv&sort=score.desc

I guess the &filter:Occupation=Researcher is wrong. What is the correct argument?

(from github.com/marevol)

filter:Occupation=Researcher

is an invalid request parameter.
Although I’m not sure about your index, it may be ex_q=label:oru_se_personal+Occupation:Researcher.
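
A small Python sketch of building that request URL follows, under the same caveat that the right query depends on the index. The host is a placeholder, the function name build_search_url is made up for this example, and the lang and sort parameters from the original request are left out for brevity. urlencode turns the space between the two conditions into the + shown above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8080/json/"  # placeholder host, adjust as needed


def build_search_url(query: str, label: str, field: str, value: str,
                     num: int = 10, start: int = 0) -> str:
    """Combine the label and the custom field into a single ex_q parameter."""
    params = {
        "q": query,
        "ex_q": f"label:{label} {field}:{value}",  # both conditions in one ex_q
        "num": num,
        "start": start,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("cancer", "oru_se_personal", "Occupation", "Researcher")
print(url)

# Decoding the query string again shows the ex_q value the server receives.
print(parse_qs(urlsplit(url).query)["ex_q"])
```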