PDF metadata from linking (parent) HTML doc?

discuss · September 21, 2018, 10:02pm

(from github.com/manfred-w)
We are crawling a website that has HTML and PDF files (the PDF files are linked by <a href="...">).
Is it possible to take some metadata from the HTML file that links to the PDF file and store it with the PDF record?

Example:

index.html, with meta keywords: test, news, other, has a link <a href="testdoc.pdf">PDF</a>.
The testdoc.pdf file has no keywords.
I would like to show the “test, news, other” keywords when the PDF file is found.

Is it possible to realize such a scenario?

Thanks a lot
Manfred

discuss · September 22, 2018, 6:07am

(from github.com/marevol)
No.
It’s better to put the metadata into the PDF itself.
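
As a rough illustration of what that could involve, the sketch below shows only a preparatory step done outside the crawler: it reads a parent HTML page, collects its meta keywords, and maps them to each PDF the page links to. It uses only the Python standard library. The names `KeywordLinkParser` and `keywords_for_pdfs` and the example.com URL are hypothetical, not part of any crawler. Writing the keywords into the PDF files would need a separate PDF library and is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class KeywordLinkParser(HTMLParser):
    """Collects a page's meta keywords and the PDF links it contains."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative hrefs
        self.keywords = []
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Split the comma-separated keyword list and drop empty entries.
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "a":
            href = attrs.get("href") or ""
            # Assumption: a PDF is recognised by its ".pdf" path suffix.
            if href.lower().split("?")[0].endswith(".pdf"):
                self.pdf_links.append(urljoin(self.base_url, href))


def keywords_for_pdfs(html, base_url):
    """Return {absolute PDF URL: [keywords of the linking page]}."""
    parser = KeywordLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return {url: list(parser.keywords) for url in parser.pdf_links}
```

For the example in the question, calling `keywords_for_pdfs` on the content of index.html would associate the resolved URL of testdoc.pdf with the keywords test, news, other. That mapping could then be used to write the keywords into each PDF before the site is crawled.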