PDF metadata from linking (parent) HTML doc?

(from github.com/manfred-w)
We are crawling a website that has HTML and PDF files (the PDF files are linked via <a href="...">).
Is it possible to take some metadata from the HTML file that links to a PDF file and store it with the PDF's record?


  1. index.html has the meta keywords “test, news, other” and contains a link <a href="testdoc.pdf">PDF</a>.
  2. testdoc.pdf itself has no keywords.
  3. I would like the keywords “test, news, other” to be shown when testdoc.pdf turns up in search results.

Is it possible to realize such a scenario?

Thanks a lot

(from github.com/marevol)
It’s better to put the metadata into the PDF file itself.
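If embedding the metadata in the PDFs is not an option, the scenario from the question could be approximated as a separate post-processing step. The sketch below is not a feature of any particular crawler. It uses only Python's standard library (`html.parser`, `urllib.parse`) to read the `<meta name="keywords">` of a parent HTML page and map each linked `.pdf` URL to those keywords. The class `ParentMetaParser` and the function `inherit_keywords` are hypothetical names, and detecting PDFs by the `.pdf` suffix of the `href` is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ParentMetaParser(HTMLParser):
    """Hypothetical helper: collects meta keywords and links ending in .pdf."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None, hence the `or ""` below
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Split "test, news, other" into ["test", "news", "other"]
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())
        elif tag == "a":
            href = attrs.get("href") or ""
            # Assumption: a PDF is recognized by its file extension only
            if href.lower().endswith(".pdf"):
                self.pdf_links.append(href)


def inherit_keywords(page_url, html):
    """Hypothetical: map each linked PDF's absolute URL to the parent page's keywords."""
    parser = ParentMetaParser()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs (e.g. "testdoc.pdf") against the parent page URL
    return {urljoin(page_url, href): list(parser.keywords) for href in parser.pdf_links}


if __name__ == "__main__":
    page = """<html><head><meta name="keywords" content="test, news, other"></head>
<body><a href="testdoc.pdf">PDF</a></body></html>"""
    print(inherit_keywords("https://example.com/index.html", page))
    # {'https://example.com/testdoc.pdf': ['test', 'news', 'other']}
```

The resulting mapping would then have to be merged into each PDF's index record by whatever mechanism the search system offers. A PDF linked from several pages would need a merge rule (for example, the union of all parents' keywords), which this sketch does not handle.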