Not all links in a page are crawled and indexed

(from riceming (Ming Chan) · GitHub)
I have a site page: SUCCESS UNIVERSE GROUP LIMITED - Announcements

The page contains a list of PDF links, but only some of them are crawled and indexed. What is the problem?

fess-crawler.log

2017-07-11 12:13:54,443 [WebFsCrawler] INFO Target URL: SUCCESS UNIVERSE GROUP LIMITED - Announcements
2017-07-11 12:13:54,443 [WebFsCrawler] INFO Included URL: SUCCESS UNIVERSE GROUP LIMITED - Announcements
2017-07-11 12:13:54,443 [WebFsCrawler] INFO Included URL: irasia.com - SUCCESS UNIVERSE GROUP LIMITED*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Included URL: http://203.194.162.10/listco/hk/successug/.*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Included URL: http://file.irasia.com/listco/hk/successug/.*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Included URL: http://202.66.146.82/listco/hk/successug/.*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Included URL: http://doc.irasia.com/irasiafile/pdf/listco/hk/successug/.*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Included URL: http://47.52.45.56/irasiafile/pdf/listco/hk/successug/.*
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Excluded URL: .*print=Y
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Excluded URL: .*mp3
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Excluded URL: .*jpg
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Excluded URL: .*gif
2017-07-11 12:13:54,444 [WebFsCrawler] INFO Excluded URL: .*png
2017-07-11 12:13:54,445 [WebFsCrawler] INFO Excluded URL: .*mp4
2017-07-11 12:13:54,626 [Crawler-20170711121345-1-1] INFO Crawling URL: SUCCESS UNIVERSE GROUP LIMITED - Announcements
2017-07-11 12:13:54,689 [Crawler-20170711121345-1-1] INFO Checking URL: http://www.successug.com/robots.txt
2017-07-11 12:14:04,479 [IndexUpdater] INFO Processing 1/1 docs (Doc:{access 5ms}, Mem:{used 99MB, heap 151MB, max 495MB})
2017-07-11 12:14:04,541 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms, cleanup 22ms}, Mem:{used 101MB, heap 151MB, max 495MB})
2017-07-11 12:14:04,662 [IndexUpdater] INFO Sent 1 docs (Doc:{process 36ms, send 121ms, size 12KB}, Mem:{used 105MB, heap 151MB, max 495MB})
2017-07-11 12:14:05,158 [Crawler-20170711121345-1-4] INFO Crawling URL: http://www.irasia.com/listco/hk/successug/announcement/a170707.pdf
2017-07-11 12:14:05,158 [Crawler-20170711121345-1-4] INFO Checking URL: http://www.irasia.com/robots.txt
2017-07-11 12:14:05,158 [Crawler-20170711121345-1-2] INFO Crawling URL: http://file.irasia.com/listco/hk/successug/annual/2016/agm.pdf
2017-07-11 12:14:05,159 [Crawler-20170711121345-1-2] INFO Checking URL: http://file.irasia.com/robots.txt
2017-07-11 12:14:05,159 [Crawler-20170711121345-1-3] INFO Crawling URL: http://file.irasia.com/listco/hk/successug/announcement/a170619.pdf
2017-07-11 12:14:05,159 [Crawler-20170711121345-1-5] INFO Crawling URL: http://www.irasia.com/listco/hk/successug/announcement/a171837-ew_00487ann_20032017.pdf
2017-07-11 12:14:05,783 [Thread-3] WARN Building on-disk font cache, this may take a while
2017-07-11 12:14:05,784 [Thread-3] WARN Finished building on-disk font cache, found 0 fonts
2017-07-11 12:14:05,784 [Thread-3] WARN Using fallback font ‘LiberationSans’ for ‘Times New Roman,Italic’
2017-07-11 12:14:05,822 [Thread-3] WARN Using fallback font ‘LiberationSans’ for ‘Times New Roman’
2017-07-11 12:14:05,826 [Thread-3] WARN Using fallback font ‘LiberationSans’ for ‘Times New Roman,Bold’
2017-07-11 12:14:05,856 [Thread-3] INFO OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
2017-07-11 12:14:14,473 [IndexUpdater] INFO Processing 4/4 docs (Doc:{access 4ms, cleanup 22ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:14:14,533 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 20ms}, Mem:{used 120MB, heap 189MB, max 495MB})
2017-07-11 12:14:14,578 [IndexUpdater] INFO Sent 4 docs (Doc:{process 38ms, send 44ms, size 328KB}, Mem:{used 120MB, heap 189MB, max 495MB})
2017-07-11 12:14:16,478 [Crawler-20170711121345-1-3] INFO Crawling URL: http://www.irasia.com/listco/hk/successug/announcement/a175963-ew_00487ann_votingresults_2017.pdf
2017-07-11 12:14:16,546 [Thread-7] WARN Using fallback font LiberationSans for base font Times-Roman
2017-07-11 12:14:16,547 [Thread-7] WARN Using fallback font LiberationSans for base font Times-Bold
2017-07-11 12:14:16,548 [Thread-7] WARN Using fallback font LiberationSans for base font Times-Italic
2017-07-11 12:14:16,548 [Thread-7] WARN Using fallback font LiberationSans for base font Times-BoldItalic
2017-07-11 12:14:16,548 [Thread-7] WARN Using fallback font LiberationSans for base font Helvetica
2017-07-11 12:14:16,548 [Thread-7] WARN Using fallback font LiberationSans for base font Helvetica-Bold
2017-07-11 12:14:16,549 [Thread-7] WARN Using fallback font LiberationSans for base font Helvetica-Oblique
2017-07-11 12:14:16,549 [Thread-7] WARN Using fallback font LiberationSans for base font Helvetica-BoldOblique
2017-07-11 12:14:16,549 [Thread-7] WARN Using fallback font LiberationSans for base font Courier
2017-07-11 12:14:16,549 [Thread-7] WARN Using fallback font LiberationSans for base font Courier-Bold
2017-07-11 12:14:16,550 [Thread-7] WARN Using fallback font LiberationSans for base font Courier-Oblique
2017-07-11 12:14:16,550 [Thread-7] WARN Using fallback font LiberationSans for base font Courier-BoldOblique
2017-07-11 12:14:16,551 [Thread-7] WARN Using fallback font LiberationSans for base font Symbol
2017-07-11 12:14:16,552 [Thread-7] WARN Using fallback font LiberationSans for base font ZapfDingbats
2017-07-11 12:14:16,552 [Thread-7] WARN Using fallback font LiberationSans for Times-Roman
2017-07-11 12:14:16,554 [Thread-7] WARN Using fallback font LiberationSans for Times-Italic
2017-07-11 12:14:16,557 [Thread-7] WARN Using fallback font LiberationSans for Times-Bold
2017-07-11 12:14:24,473 [IndexUpdater] INFO Processing 1/1 docs (Doc:{access 4ms, cleanup 20ms}, Mem:{used 117MB, heap 189MB, max 495MB})
2017-07-11 12:14:24,495 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 117MB, heap 189MB, max 495MB})
2017-07-11 12:14:24,516 [IndexUpdater] INFO Sent 1 docs (Doc:{process 7ms, send 21ms, size 106KB}, Mem:{used 117MB, heap 189MB, max 495MB})
2017-07-11 12:14:34,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 117MB, heap 189MB, max 495MB})
2017-07-11 12:14:44,472 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:14:54,472 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:04,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:14,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:24,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:34,472 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:44,473 [IndexUpdater] INFO Processing no docs (Doc:{access 4ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:15:54,472 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:16:04,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 118MB, heap 189MB, max 495MB})
2017-07-11 12:16:14,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:16:24,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:16:34,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:16:44,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:16:54,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:17:04,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:17:14,473 [IndexUpdater] INFO Processing no docs (Doc:{access 3ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:17:24,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:17:34,472 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 119MB, heap 189MB, max 495MB})
2017-07-11 12:17:44,473 [IndexUpdater] INFO Processing no docs (Doc:{access 4ms, cleanup 13ms}, Mem:{used 120MB, heap 189MB, max 495MB})
2017-07-11 12:17:54,471 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 120MB, heap 189MB, max 495MB})
2017-07-11 12:17:56,674 [WebFsCrawler] INFO [EXEC TIME] crawling time: 242399ms
2017-07-11 12:18:04,472 [IndexUpdater] INFO Processing no docs (Doc:{access 2ms, cleanup 13ms}, Mem:{used 120MB, heap 189MB, max 495MB})
2017-07-11 12:18:04,472 [IndexUpdater] INFO [EXEC TIME] index update time: 414ms
2017-07-11 12:18:04,514 [main] INFO Finished Crawler
2017-07-11 12:18:04,547 [main] INFO [CRAWL INFO] DataCrawlEndTime=2017-07-11T12:13:54.271+0800,CrawlerEndTime=2017-07-11T12:18:04.515+0800,WebFsCrawlExecTime=242399,CrawlerStatus=true,CrawlerStartTime=2017-07-11T12:13:54.207+0800,WebFsCrawlEndTime=2017-07-11T12:18:04.514+0800,WebFsIndexExecTime=414,WebFsIndexSize=6,CrawlerExecTime=250308,DataCrawlStartTime=2017-07-11T12:13:54.242+0800,WebFsCrawlStartTime=2017-07-11T12:13:54.241+0800
2017-07-11 12:18:09,574 [main] INFO Disconnected to elasticsearch:localhost:9300
2017-07-11 12:18:11,746 [main] INFO Destroyed LaContainer.

URLs in the page such as http://file.irasia.com/listco/hk/successug/annual/2016/res.pdf are not crawled or indexed.

In fess_config.properties, this property is set to empty:
crawler.document.html.cannonical.xpath=

(from github.com/marevol)
http://file.irasia.com/robots.txt disallows all files.
To make the crawler ignore robots.txt, set crawler.ignore.robots.txt in fess_config.properties:

crawler.ignore.robots.txt=true
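
Before turning the check off, it can help to confirm that robots.txt really is what blocks a given link. Below is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body is an assumption (a blanket "Disallow: /", which is one way a site can disallow all files), since the actual file is not quoted in this thread. The helper name is_allowed is hypothetical, and this is not how Fess itself performs the check.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: the real robots.txt on file.irasia.com is not shown in this
# thread; a blanket "Disallow: /" is used here purely for illustration.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses the robots.txt text locally, so no
    network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://file.irasia.com/listco/hk/successug/annual/2016/res.pdf"
    print("allowed by assumed robots.txt:", is_allowed(ASSUMED_ROBOTS_TXT, url))
    print("allowed by empty robots.txt:", is_allowed("", url))
```

To check the live file instead, RobotFileParser also offers set_url() and read(), which download robots.txt from the host before can_fetch() is called.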

(from github.com/riceming)
Thanks a lot, issue solved.