Presidio falsely detects "JavaScript" and tech terms as <IN_PAN> when analyzing longer texts #1679
-
Hi @nurullah7733, see the answer here:
-
Hi team,
I'm running into odd behavior with the mcr.microsoft.com/presidio-analyzer image (running via Docker Compose). In my Express.js application, I analyze user-generated, resume-style text with the analyzer API.
I've noticed that tech terms like "JavaScript" are often falsely detected as the IN_PAN entity type (Indian Permanent Account Number), but only when the input text is long or paragraph-style.
🧪 Examples:
Input 1 (short sentence):
I love working with JavaScript.
Output: ✅ No PII detected. All good.
Input 2 (longer real-world text):
I worked in web design and wordpress development on Fiverr.
It was like 8 to 9 months. After that I shifted to Javascript.
Now I am working with Node JS and learning more...
Output: ❌ "Javascript" and similar terms get marked as IN_PAN, like this:
I worked in web design and wordpress development on Fiverr. It was like <DATE_TIME>. After that I shifted to <IN_PAN>.
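Here is one plausible cause of the length dependence. It is offered as an assumption, not a confirmed diagnosis. Suppose the IN_PAN recognizer includes a low-confidence pattern whose lookaheads are not confined to the 10-character token. Then any 10-letter word such as "Javascript" could match as soon as a 4-digit number (a year, for example) appears anywhere later in the text, and longer texts are far more likely to contain one. The sketch below reproduces that mechanism in JavaScript. `PAN_LIKE` and `findPanLikeTokens` are hypothetical names, and the year in `longText` is added for illustration. The regex is modeled on this idea, not copied from Presidio.

```javascript
// Illustrative PAN-style pattern (an assumption, NOT Presidio's actual regex):
// a 10-character token, with lookaheads that require a letter and a run of
// 4 digits somewhere AFTER the token's start, not necessarily inside it.
// The `s` flag lets `.` cross newlines, so the lookaheads can scan the rest
// of the whole text; `i` makes the letter check case-insensitive.
const PAN_LIKE = /\b(?=.*?[a-z])(?=.*?[0-9]{4})[\w@#$%^?~-]{10}\b/gis;

// Return every substring the illustrative pattern would flag.
function findPanLikeTokens(text) {
  return [...text.matchAll(PAN_LIKE)].map((m) => m[0]);
}

// Short input: no 4-digit run anywhere, so nothing matches.
const shortText = "I love working with JavaScript.";

// Hypothetical longer input: the year "2021" is added for illustration only.
const longText =
  "After that I shifted to Javascript.\nNow I am working with Node JS since 2021.";

console.log(findPanLikeTokens(shortText)); // []
console.log(findPanLikeTokens(longText)); // [ 'Javascript' ]
```

Because the lookaheads only scan forward from the candidate word, moving the year in front of "Javascript" makes the match disappear. If Presidio's rule behaves like this, these hits would be low-confidence ones, and a score threshold is a natural way to filter them.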
Here is the controller from my Express.js app (the analyzer itself runs from my docker-compose.yml file):
```javascript
const axios = require("axios");

exports.analyzerAndAnonymizeController = async (req, res) => {
  try {
    const { text } = req.body;
    // ... analyzer/anonymizer requests via axios (omitted here)
  } catch (error) {
    console.error("analyze and anonymize error:", error.message);
    return res.status(400).json({ status: "fail", data: error.message });
  }
};
```
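The following is a minimal sketch of one mitigation. It assumes the analyzer's REST `/analyze` endpoint honors a `score_threshold` field, as described in Presidio's API docs. The cutoff of 0.4 is an arbitrary value chosen for illustration. `buildAnalyzeRequest`, `filterResults`, `analyze`, `MIN_SCORE`, and `ANALYZER_URL` are hypothetical names. The URL assumes the analyzer is published on host port 5002, so match it to your docker-compose.yml service. Built-in `fetch` (Node 18+) stands in for axios only to keep the sketch dependency-free.

```javascript
// Hypothetical helpers for calling the Presidio analyzer; names are illustrative.

// Assumed URL: analyzer published on host port 5002. Adjust to match the
// service name/port in docker-compose.yml.
const ANALYZER_URL = "http://localhost:5002/analyze";

// Illustrative cutoff; tune it against real resume text.
const MIN_SCORE = 0.4;

// Build the JSON body for POST /analyze, asking the analyzer to drop
// results that score below the threshold.
function buildAnalyzeRequest(text, scoreThreshold = MIN_SCORE) {
  return { text, language: "en", score_threshold: scoreThreshold };
}

// Client-side safety net: drop low-scoring results and entity types the app
// does not need (IN_PAN here) before handing them to the anonymizer.
function filterResults(results, { minScore = MIN_SCORE, excludedTypes = ["IN_PAN"] } = {}) {
  return results.filter(
    (r) => r.score >= minScore && !excludedTypes.includes(r.entity_type)
  );
}

// Send the text to the analyzer and return the filtered results.
async function analyze(text) {
  const response = await fetch(ANALYZER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildAnalyzeRequest(text)),
  });
  if (!response.ok) {
    throw new Error(`Analyzer returned HTTP ${response.status}`);
  }
  return filterResults(await response.json());
}

// Hypothetical analyzer response, used only to exercise the filter.
const sample = [
  { entity_type: "IN_PAN", start: 0, end: 10, score: 0.05 },
  { entity_type: "DATE_TIME", start: 20, end: 33, score: 0.85 },
];
console.log(filterResults(sample).map((r) => r.entity_type)); // [ 'DATE_TIME' ]
```

Excluding IN_PAN outright also drops genuine PAN numbers, so do that only if the app never needs them. Another option is the request's `entities` field, which can list just the entity types the app cares about so IN_PAN is never requested in the first place.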