crawling data

(from github.com/cross1154)
Version:
FESS: 13.4.0
Elasticsearch cluster: 7.4.0

I crawled some documents written in Japanese. But when I search for them in Elasticsearch, I find that the crawled data contains “lang”: “en”:

{
  "_index": "fess.20191217",
  "_type": "_doc",
  "_id": "9c9f8a455f69159308ea1e24ed2fabd3c3f377be7cc19c0d064a1a7eb238253d69f0bb948d09ee3ff0298e360a298640c6b2b2e9c5c51941310588a147fb61a7",
  "_score": 1,
  "_source": {
    "filetype": "excel",
    "expires": "2019-12-26T11:36:59.707Z",
    "role": [ "1500", "2500", "Rguest" ],
    "click_count": 0,
    "title": "テキストボックスクローリング.xlsx",
    "content": "クローリングドキュメント テキストボックスの文字はクローリングできるか?横書きテキストを試す。 テキストボックスの文字はクローリングできるか?縦書きテキストを試す。 2019-12-23T10:17:43Z 15.0300 user01 2019-12-09T00:32:30Z 2019-12-23T10:17:43Z 2019-12-23T10:17:43Z 2019-12-23T10:17:43Z false 2019-12-23T10:17:43Z Microsoft Excel 2019-12-23T10:17:43Z user01 user01 Microsoft Excel 2019-12-09T00:32:30Z user01 2019-12-09T00:32:30Z user01 D:\transport\ 15.0300 user01",
    "segment": "Yacq9G4Bjaee_QHS7v4f",
    "digest": "クローリングドキュメント テキストボックスの文字はクローリングできるか?横書きテキストを試す。 テキストボックスの文字はクローリングできるか?縦書きテキストを試す。 2019-12-23T10:17:43Z 15.0300 user01 2019-12-09T00:32:30Z 2019-12-23T10:17:43Z 2019-12-23T10:17:43Z 2019-12-23T10:1...",
    "host": "localhost",
    "favorite_count": 0,
    "lang": "en",
    "last_modified": "2019-12-23T10:17:44.000Z",
    "content_length": "9535",
    "timestamp": "2019-12-23T10:17:44.000Z",
    "virtual_host": [],
    "thumbnail": "file:/var/fess/devnet_file/20191223/%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%83%9C%E3%83%83%E3%82%AF%E3%82%B9%E3%82%AF%E3%83%AD%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0.xlsx",
    "created": "2019-12-23T11:37:04.413Z",
    "label": [],
    "doc_id": "456fe55be1e54b5e8979d21082e7994e",
    "url": "file:/var/fess/devnet_file/20191223/%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E3%83%9C%E3%83%83%E3%82%AF%E3%82%B9%E3%82%AF%E3%83%AD%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0.xlsx",
    "site": "/var/fess/devnet_file/20191223/テキストボックスクローリング.xlsx",
    "filename": "テキストボックスクローリング.xlsx",
    "config_id": "FYacq9G4Bjaee_QHS7v4f",
    "parent_id": "8924bbd14cc67b1dddfda36e1e8367c45c005afdf0280bd2e12a457b86c167cf4dfd66112eb7d5568d1adc8478e78721ab22784a29440b59f25701bf3e402d13",
    "anchor": "",
    "boost": "1.0",
    "mimetype": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}

I can find the document when I search the “content_en” field with the query below, but I can’t find it when I search “content_ja”.
"query_string": { "default_field": "content_en", "query": "テキスト" }
Is there any setting I missed?
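To compare the two fields directly, the request can be reproduced outside the Fess UI. The following is a minimal Python sketch using only the standard library. It assumes Elasticsearch is reachable at http://localhost:9200 (a hypothetical address) and reuses the index name fess.20191217 from the _index field above; the helper names build_query and count_hits are illustrative and are not part of Fess.

```python
import json
import urllib.request

# Assumptions: host/port are hypothetical; the index name comes from the _index field above.
ES_URL = "http://localhost:9200"
INDEX = "fess.20191217"


def build_query(field: str, text: str) -> dict:
    """Return a _search request body running a query_string query on one field."""
    return {"query": {"query_string": {"default_field": field, "query": text}}}


def count_hits(field: str, text: str) -> int:
    """POST the query to Elasticsearch and return the total hit count.

    Requires a live cluster; not called when this module is imported.
    """
    body = json.dumps(build_query(field, text)).encode("utf-8")
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Elasticsearch 7.x reports hits.total as {"value": N, "relation": "eq"} by default.
    return result["hits"]["total"]["value"]
```

Against a live cluster, calling count_hits("content_en", "テキスト") and count_hits("content_ja", "テキスト") would be expected to reproduce the difference described above, which is consistent with the text being indexed only into the field that matches the detected lang.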

(from github.com/marevol)
If you want to set the language explicitly instead of relying on automatic detection, you can set the following properties to ja in fess_config.properties.

crawler.document.html.default.lang=
crawler.document.file.default.lang=
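Editing fess_config.properties by hand is the simplest route. For scripted deployments, a minimal Python sketch that sets both keys to ja might look like the following. The function name set_default_lang is hypothetical, and the sketch assumes plain key=value lines (no ':' separators or line continuations).

```python
import re

# The two keys named in the reply above.
DEFAULT_LANG_KEYS = (
    "crawler.document.html.default.lang",
    "crawler.document.file.default.lang",
)


def set_default_lang(properties_text: str, lang: str) -> str:
    """Hypothetical helper: return properties_text with each default.lang key set to lang.

    An existing (uncommented) "key=" line is rewritten in place;
    a key that is missing is appended at the end.
    """
    lines = properties_text.splitlines()
    for key in DEFAULT_LANG_KEYS:
        # Match "key=" with optional surrounding whitespace; "#key=" comments are skipped.
        pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
        for i, line in enumerate(lines):
            if pattern.match(line):
                lines[i] = f"{key}={lang}"
                break
        else:
            lines.append(f"{key}={lang}")
    return "\n".join(lines) + "\n"
```

Fess reads this file at startup, so a restart and a re-crawl are likely needed before documents that were already indexed pick up the new language.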

(from github.com/cross1154)
Thank you.