Searchability problem #27

@aditya-shrivastavv

Description

I think it solves the problem but simultaneously creates a new one (correct me if I am wrong). This approach converts the PDF pages to images and sends them to the DLP API. DLP then does its work and returns the redacted images, and we combine those images back into a PDF. Right?
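To make the flow concrete, here is a minimal sketch of the three steps as I understand them. The function name `redact_via_images` and the three callables are hypothetical placeholders, not the project's actual code and not the DLP client API; they only mark where rasterizing, redacting, and reassembling happen.

```python
from typing import Callable, List, Sequence


def redact_via_images(
    pages: Sequence[object],
    rasterize: Callable[[object], bytes],
    redact_image: Callable[[bytes], bytes],
    assemble: Callable[[List[bytes]], bytes],
) -> bytes:
    """Run the page -> image -> redacted image -> PDF flow.

    All three callables are assumptions supplied by the caller:
    rasterize renders one page to image bytes, redact_image stands in
    for the DLP image redaction call, and assemble builds the output
    document from the redacted images.
    """
    # Step 1: every page becomes a picture; any text layer is dropped here.
    images = [rasterize(page) for page in pages]
    # Step 2: each picture is redacted independently.
    redacted = [redact_image(image) for image in images]
    # Step 3: the output is built from pictures only.
    return assemble(redacted)
```

Note that nothing after step 1 ever sees the original text, so the output can only contain what the images contain.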

But the resulting PDF is no longer searchable, because the original text data is lost. It will not be readable by an ATS, since an ATS typically requires text to be present, not images. Conversion to other formats such as Word or plain text will not work as expected either, as the text content is no longer available in its original form.
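One way to confirm this would be to extract the text from each page of the output and see whether anything comes back. The helper below is a small sketch of that check; `has_text_layer` and `min_chars` are names I made up for illustration, and it works on already-extracted page strings so it does not depend on any particular PDF library.

```python
from typing import Iterable, Optional


def has_text_layer(page_texts: Iterable[Optional[str]], min_chars: int = 1) -> bool:
    """Return True if any page yields at least min_chars non-whitespace characters.

    page_texts is the text extracted from each page, in order. An
    image-only PDF is expected to produce empty strings for every page.
    """
    for text in page_texts:
        # Drop all whitespace so blank lines and spaces do not count as text.
        visible = "".join((text or "").split())
        if len(visible) >= min_chars:
            return True
    return False
```

Assuming pypdf is installed, the page strings could come from `[page.extract_text() for page in PdfReader(path).pages]`. If the original PDF returns True and the redacted PDF returns False, the text layer was lost along the way.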

Am I right?

If this is a legit problem, I think I have a solution.
