java - How to extract PDF watermark content using iText APIs

- April 15, 2011

I am going through the iText API docs and am able to create a PDF with a watermark image or text, but I did not find any method to get/extract the watermark content from a PDF.

So I have a PDF document containing a watermarked text/image, and I want to extract that text or image and validate it, which I am not able to do.

How can I extract watermark content using the iText APIs? Or is there any other way to validate watermark content?

By validate I mean: if I have an existing PDF/image with watermarked text [as done in the 2nd link in the references], I want to check whether it has the expected text/image.

references:

Extracting watermark content?

There is nothing special about watermarks in PDFs in contrast to regular page content. They merely either

appear pretty early in the content stream, so that other content later in the stream is, therefore, drawn above them; or they
appear pretty late in the content stream but have some kind of transparency applied.

Actually, there is one type of watermark that is special: the so-called watermark annotations. As these annotations can get lost when documents are merged or otherwise manipulated, though, they are hardly ever used.

Furthermore, different PDF generating software suites each offer a way to add watermarks in their respective individual way. Thus, you cannot even recognize watermarks as special operations done in a specific unique pattern.

Already the iText examples you referred to apply different kinds of watermarks:

MovieCountries2 draws large gray text using an angled base line.
StampStationery copies a complete page from another PDF (which may visually have both foreground and background material) into a separate object inside the target PDF and adds a reference to that object at the beginning of every page of the target.
InsertPages references a page from another PDF on every newly generated target document page.

Thus, blind watermark extraction is virtually impossible.

Validating watermark content!

You might try validation, though, if you know what you are searching for. You merely do not search a fixed watermark stream (which does not exist in PDFs) but instead the whole page content.

iText offers classes in the parser package which allow extraction of text and/or bitmap images from content streams. Have a look at the samples referenced by the keywords Parsing PDF > Extracting images and Parsing PDF > Extracting text.

You merely have to check whether the image or text you expect can be found by these classes, positioned and styled as you expect.
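
As a rough illustration of such a check, here is a minimal sketch in plain Java. It assumes the text chunks and image bytes of a page have already been collected by some extraction pass (for example with the iText parser package); how that is done is not shown. `TextChunk`, `WatermarkCheck`, and all parameter names are hypothetical helpers made up for this example and are not part of iText. The image check only recognizes byte-identical image data, so a re-encoded watermark image would need a real pixel comparison instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical holder for one text chunk reported by a text extraction pass. */
final class TextChunk {
    final String text;
    final double angleDegrees; // base line angle relative to the page's x axis
    final double fontSize;     // effective font size in user space units

    TextChunk(String text, double angleDegrees, double fontSize) {
        this.text = text;
        this.angleDegrees = angleDegrees;
        this.fontSize = fontSize;
    }
}

/** Hypothetical validation helper; it searches the whole page content, not a "watermark stream". */
final class WatermarkCheck {
    private WatermarkCheck() {}

    /** Collapses whitespace runs and ignores case, as extracted text often has irregular spacing. */
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** True if some chunk contains the expected text AND is styled like the expected watermark. */
    static boolean hasTextWatermark(List<TextChunk> chunks, String expectedText,
                                    double expectedAngle, double angleTolerance,
                                    double minFontSize) {
        String wanted = normalize(expectedText);
        for (TextChunk c : chunks) {
            boolean textMatches = normalize(c.text).contains(wanted);
            boolean angleMatches = Math.abs(c.angleDegrees - expectedAngle) <= angleTolerance;
            boolean sizeMatches = c.fontSize >= minFontSize;
            if (textMatches && angleMatches && sizeMatches) {
                return true;
            }
        }
        return false;
    }

    /** True if one of the extracted images is byte-identical to the expected image data. */
    static boolean hasImageWatermark(List<byte[]> extractedImages, byte[] expectedImage) {
        for (byte[] image : extractedImages) {
            if (Arrays.equals(image, expectedImage)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the result of an extraction pass.
        List<TextChunk> page = Arrays.asList(
                new TextChunk("Invoice  No. 42", 0.0, 12.0),
                new TextChunk("CONFIDENTIAL   draft", 45.0, 52.0));
        System.out.println(hasTextWatermark(page, "confidential draft", 45.0, 1.0, 40.0));
    }
}
```

Checking angle and font size in addition to the text is what separates a watermark from regular page text that merely happens to contain the same words. The tolerances are guesses and would need tuning against the documents at hand.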
