(from t_hayakawa's Profile - OSDN)
[Reply to message #79407]
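The file names are probably not encoded in UTF-8, so they are
being replaced with the REPLACEMENT CHARACTER and end up as
%EF%BF%BD. Adding something like
.jvmOptions("-Dfile.encoding=MS932") to the crawler job invocation
might work, but it may be better to crawl over smb, or to
mount the filesystem so that the file names are in UTF-8.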
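To make the mechanism described above concrete, here is a minimal Python sketch of how undecodable file-name bytes can end up as %EF%BF%BD. It is only an illustration of the general mechanism, not of how Fess or the JVM actually decodes names; the helper name `to_logged_form` and the two charset pairings are assumptions chosen for the example.

```python
from urllib.parse import quote


def to_logged_form(raw_name: bytes, assumed_charset: str) -> str:
    """Decode raw file-name bytes with the assumed charset, substituting
    U+FFFD (REPLACEMENT CHARACTER) for undecodable bytes, then
    percent-encode the result as UTF-8."""
    return quote(raw_name.decode(assumed_charset, errors="replace"))


utf8_bytes = "ハロー.txt".encode("utf-8")    # name stored on disk as UTF-8
cp932_bytes = "ハロー.txt".encode("cp932")   # name stored on disk as MS932

# Case A: MS932 bytes read as if they were UTF-8
case_a = to_logged_form(cp932_bytes, "utf-8")

# Case B: UTF-8 bytes read as if they were plain ASCII
case_b = to_logged_form(utf8_bytes, "ascii")

print(case_a)
print(case_b)
```

In this sketch, case B produces one %EF%BF%BD per byte of the original name, while case A leaves some ASCII characters mixed in between the %EF%BF%BD sequences. The crawl log in this thread shows nine %EF%BF%BD sequences before .txt, which matches the UTF-8 byte length of ハロー (3 characters, 3 bytes each). That would be consistent with case B, but it is only an observation and not a confirmed diagnosis.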
To check whether this is an encoding problem, I confirmed that the encoding is UTF-8, as shown below, and then ran the crawl again.
$ nkf -g ハロー.txt
UTF-8
However, the result did not change.
(The result is listed at the bottom of this message.)
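One caveat about the check above: as far as I understand, nkf -g guesses the encoding of the data it reads, that is, the file contents, so it does not by itself say how the file name is stored on disk. A minimal Python sketch for inspecting the raw file-name bytes could look like the following. The function names `is_valid_utf8` and `classify_names` are hypothetical helpers introduced here for illustration.

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def classify_names(directory: bytes):
    """Return (raw_name_bytes, is_valid_utf8) for each entry in directory.

    Passing a bytes path makes os.listdir return the names as raw bytes,
    so no locale-dependent decoding is applied before the check.
    """
    return [(raw, is_valid_utf8(raw)) for raw in os.listdir(directory)]
```

For the directory in this thread, this would be called as classify_names(b"/home/ad/MyDocument/test"). If every name is reported as valid UTF-8, the names on disk are fine and the place to look next would be how the crawler process decodes them.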
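As a side note, when I tried this in a Windows environment (a different PC), Japanese file names were
crawled and encoded correctly.
Since the names still cannot be retrieved correctly even though they are UTF-8, could there be a bug somewhere?
If there is anything else I should check, I would be grateful for your advice.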
2017-02-20 19:42:54,677 [main] INFO …Loading specified properties and get by the key: fess_env_crawler.properties, lasta_di.smart.deploy.mode
2017-02-20 19:42:54,681 [main] INFO …Setting smart deploy mode: warm
2017-02-20 19:42:54,693 [main] INFO …Reading lasta_di.xml
2017-02-20 19:42:54,742 [main] INFO …Reading redefiner.xml
2017-02-20 19:42:54,794 [main] INFO …Reading smartdeploy.xml
2017-02-20 19:42:55,043 [main] INFO …Reading smart/warmdeploy.xml
2017-02-20 19:42:55,047 [main] INFO …Reading convention.xml
2017-02-20 19:42:55,051 [main] INFO …Reading embedded_convention.xml
2017-02-20 19:42:55,061 [main] INFO …Reading creator.xml
2017-02-20 19:42:55,064 [main] INFO …Reading convention.xml (recycle)
2017-02-20 19:42:55,065 [main] INFO …Reading customizer.xml
2017-02-20 19:42:55,068 [main] INFO …Reading lastafw_customizer.xml
2017-02-20 19:42:55,078 [main] INFO …Reading embedded_customizer.xml
2017-02-20 19:42:55,081 [main] INFO …Reading tx_customizer.xml
2017-02-20 19:42:55,101 [main] INFO …Reading my_creator.xml
2017-02-20 19:42:55,104 [main] INFO …Reading lastafw_creator.xml
2017-02-20 19:42:55,107 [main] INFO …Reading convention.xml (recycle)
2017-02-20 19:42:55,107 [main] INFO …Reading customizer.xml (recycle)
2017-02-20 19:42:55,115 [main] INFO …Reading embedded_creator.xml
2017-02-20 19:42:55,119 [main] INFO …Reading convention.xml (recycle)
2017-02-20 19:42:55,119 [main] INFO …Reading customizer.xml (recycle)
2017-02-20 19:42:55,124 [main] INFO …Reading customizer.xml (recycle)
2017-02-20 19:42:55,150 [main] INFO …Reading app.xml
2017-02-20 19:42:55,154 [main] INFO …Reading convention.xml
2017-02-20 19:42:55,158 [main] INFO …Reading embedded_convention.xml
2017-02-20 19:42:55,162 [main] INFO …Reading lastaflute_core.xml
2017-02-20 19:42:55,165 [main] INFO …Reading lastaflute_assist.xml
2017-02-20 19:42:55,168 [main] INFO …Reading lastaflute_director.xml
2017-02-20 19:42:55,245 [main] INFO …Reading fess.xml
2017-02-20 19:42:55,248 [main] INFO …Reading fess_config.xml
2017-02-20 19:42:55,256 [main] INFO …Reading fess_ds.xml
2017-02-20 19:42:55,275 [main] INFO …Reading esflute_config.xml
2017-02-20 19:42:55,277 [main] INFO …Reading esclient.xml
2017-02-20 19:42:55,451 [main] INFO …Reading esflute_user.xml
2017-02-20 19:42:55,454 [main] INFO …Reading esclient.xml (recycle)
2017-02-20 19:42:55,468 [main] INFO …Reading esflute_log.xml
2017-02-20 19:42:55,471 [main] INFO …Reading esclient.xml (recycle)
2017-02-20 19:42:55,518 [main] INFO …Reading crawler_es.xml
2017-02-20 19:42:55,521 [main] INFO …Reading crawler/container.xml
2017-02-20 19:42:55,525 [main] INFO …Reading crawler/client.xml
2017-02-20 19:42:55,528 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,528 [main] INFO …Reading crawler/robotstxt.xml
2017-02-20 19:42:55,530 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,534 [main] INFO …Reading crawler/contentlength.xml
2017-02-20 19:42:55,536 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,538 [main] INFO …Reading crawler/mimetype.xml
2017-02-20 19:42:55,541 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,620 [main] INFO …Reading crawler/rule.xml
2017-02-20 19:42:55,623 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,623 [main] INFO …Reading crawler/transformer.xml
2017-02-20 19:42:55,626 [main] INFO …Reading crawler/transformer_basic.xml
2017-02-20 19:42:55,629 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,664 [main] INFO …Reading crawler/filter.xml
2017-02-20 19:42:55,666 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,669 [main] INFO …Reading crawler/interval.xml
2017-02-20 19:42:55,671 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,674 [main] INFO …Reading crawler/extractor.xml
2017-02-20 19:42:55,676 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,684 [main] INFO …Reading crawler/extractor+tikaExtractor.xml
2017-02-20 19:42:55,687 [main] INFO …Reading crawler/container.xml
2017-02-20 19:42:55,713 [main] INFO …Reading crawler/mimetype.xml (recycle)
2017-02-20 19:42:55,713 [main] INFO …Reading crawler/encoding.xml
2017-02-20 19:42:55,716 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,717 [main] INFO …Reading crawler/urlconverter.xml
2017-02-20 19:42:55,720 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,722 [main] INFO …Reading crawler/log.xml
2017-02-20 19:42:55,724 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,727 [main] INFO …Reading crawler/sitemaps.xml
2017-02-20 19:42:55,729 [main] INFO …Reading crawler/container.xml (recycle)
2017-02-20 19:42:55,732 [main] INFO …Reading crawler/es.xml
2017-02-20 19:42:55,745 [main] INFO …Reading crawler_es+crawlerThread.xml
2017-02-20 19:42:55,763 [main] INFO …Reading crawler_thumbnail.xml
2017-02-20 19:42:55,877 [main] INFO [Objective Config]
2017-02-20 19:42:55,877 [main] INFO fess_config.properties extends [fess_env_crawler.properties]
2017-02-20 19:42:55,877 [main] INFO checkImplicitOverride=true, propertyCount=334
2017-02-20 19:42:55,912 [main] INFO [Exception Translator]
2017-02-20 19:42:55,913 [main] INFO exceptionTranslationProvider: null
2017-02-20 19:42:55,914 [main] INFO [Async Manager]
2017-02-20 19:42:55,914 [main] INFO defaultConcurrentAsyncOption: {no option}
2017-02-20 19:42:55,914 [main] INFO primaryExecutorService: ThreadPoolExecutor@4275c20c
2017-02-20 19:42:55,914 [main] INFO secondaryExecutorService: ThreadPoolExecutor@7c56e013
2017-02-20 19:42:55,915 [main] INFO [Primary Cipher]
2017-02-20 19:42:55,915 [main] INFO invertibleCryptographer: {AES, UTF-8}
2017-02-20 19:42:55,915 [main] INFO oneWayCryptographer: {SHA-256, UTF-8}
2017-02-20 19:42:55,916 [main] INFO [Time Manager]
2017-02-20 19:42:55,916 [main] INFO businessTimeHandler: TypicalBusinessTimeHandler:{TypicalTimeResourceProvider$$Lambda$27/385332399, TypicalTimeResourceProvider$$Lambda$28/1942356772}
2017-02-20 19:42:55,916 [main] INFO currentIgnoreTransaction: false
2017-02-20 19:42:55,916 [main] INFO adjustTimeMillis: 0
2017-02-20 19:42:55,916 [main] INFO adjustAbsoluteMode: false
2017-02-20 19:42:55,951 [main] INFO [JSON Manager]
2017-02-20 19:42:55,951 [main] INFO realJsonParser: GsonJsonParser
2017-02-20 19:42:55,952 [main] INFO [Postbox]
2017-02-20 19:42:55,952 [main] INFO postOffice: PostOffice@5d5d9e5
2017-02-20 19:42:55,953 [main] INFO postalParkingLot: SMailPostalParkingLot:{{category:{main}=motorbike:{session=javax.mail.Session@303e3593}}}@4ef27d66
2017-02-20 19:42:55,953 [main] INFO postalPersonnel: FessMailDeliveryDepartmentCreator$1:{SMailConventionReceptionist:{}@362a019c, batch:{[proofreader:{pmcomment}]}}@1d9bec4d
2017-02-20 19:42:56,084 [main] INFO [Legion] modules , plugins , sites
2017-02-20 19:42:58,202 [main] INFO Lasta Di boot successfully.
2017-02-20 19:42:58,205 [main] INFO SmartDeploy Mode: Warm Deploy
2017-02-20 19:42:58,205 [main] INFO Smart Package: org.codelibs.fess.app
2017-02-20 19:42:58,362 [main] INFO Starting Crawler…
2017-02-20 19:42:58,419 [WebFsCrawler] INFO [Micro] modules , plugins , sites
2017-02-20 19:42:58,495 [WebFsCrawler] INFO Connected to localhost:9301
2017-02-20 19:42:58,585 [WebFsCrawler] INFO Target Path: file:/home/ad/MyDocument/test/
2017-02-20 19:42:58,668 [Crawler-AVpbHYx0l28rkACeskQ4-1-2] INFO Crawling URL: file:/home/ad/MyDocument/test/
2017-02-20 19:43:00,046 [Crawler-AVpbHYx0l28rkACeskQ4-1-2] INFO Crawling URL: file:/home/ad/MyDocument/test/%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.txt~
2017-02-20 19:43:00,162 [Crawler-AVpbHYx0l28rkACeskQ4-1-5] INFO Crawling URL: file:/home/ad/MyDocument/test/%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD~
2017-02-20 19:43:00,167 [Crawler-AVpbHYx0l28rkACeskQ4-1-4] INFO Crawling URL: file:/home/ad/MyDocument/test/hal.txt~
2017-02-20 19:43:00,168 [Crawler-AVpbHYx0l28rkACeskQ4-1-3] INFO Crawling URL: file:/home/ad/MyDocument/test/hello.txt~
2017-02-20 19:43:00,169 [Crawler-AVpbHYx0l28rkACeskQ4-1-1] INFO Crawling URL: file:/home/ad/MyDocument/test/%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.txt
2017-02-20 19:43:01,060 [Crawler-AVpbHYx0l28rkACeskQ4-1-2] INFO Crawling URL: file:/home/ad/MyDocument/test/hello.txt
2017-02-20 19:43:01,173 [Crawler-AVpbHYx0l28rkACeskQ4-1-5] INFO Crawling URL: file:/home/ad/MyDocument/test/test.txt~
2017-02-20 19:43:08,606 [IndexUpdater] INFO Processing 4/4 docs (Doc:{access 5ms}, Mem:{used 42MB, heap 119MB, max 494MB})
2017-02-20 19:43:08,762 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 109ms}, Mem:{used 44MB, heap 119MB, max 494MB})
2017-02-20 19:43:08,895 [IndexUpdater] INFO Sent 4 docs (Doc:{process 45ms, send 133ms, size 6bytes}, Mem:{used 48MB, heap 119MB, max 494MB})
2017-02-20 19:43:18,597 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 109ms}, Mem:{used 48MB, heap 119MB, max 494MB})
2017-02-20 19:43:28,599 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 109ms}, Mem:{used 48MB, heap 119MB, max 494MB})
2017-02-20 19:43:32,303 [WebFsCrawler] INFO [EXEC TIME] crawling time: 33899ms
2017-02-20 19:43:38,598 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 109ms}, Mem:{used 48MB, heap 119MB, max 494MB})
2017-02-20 19:43:38,598 [IndexUpdater] INFO [EXEC TIME] index update time: 307ms
2017-02-20 19:43:38,686 [main] INFO Finished Crawler
2017-02-20 19:43:38,806 [main] INFO [CRAWL INFO] CrawlerEndTime=2017-02-20T19:43:38.687+0900,WebFsCrawlExecTime=33899,CrawlerStatus=true,CrawlerStartTime=2017-02-20T19:42:58.362+0900,WebFsCrawlEndTime=2017-02-20T19:43:38.686+0900,WebFsIndexExecTime=307,WebFsIndexSize=4,CrawlerExecTime=40325,WebFsCrawlStartTime=2017-02-20T19:42:58.390+0900
2017-02-20 19:43:38,856 [main] INFO Disconnected to elasticsearch:localhost:9301
2017-02-20 19:43:45,954 [main] INFO Destroyed LaContainer.