What PII Should Be Redacted?

HTMLVault Team·May 2, 2026·7 min read

Todd, the product manager, meant to share a harmless HTML preview with a vendor. Instead, he sent a page that included a customer email, an internal test phone number, and a support agent’s full name buried in a comment block. Nobody was trying to be reckless. That is exactly the problem. When teams ask what PII should be redacted, they are usually already one copied link away from an awkward meeting with security.

For teams sharing AI-generated output, sales collateral, support exports, or internal HTML artifacts, redaction is not a cosmetic step. It is a control. The question is not just whether data looks sensitive on the page. It is whether a person, regulator, customer, or search crawler could use that data to identify someone or connect it to something they should not see.

What PII should be redacted in practice

The short answer is this: redact any data element that can directly identify a person, and any data that can reasonably identify them when combined with other information. That includes obvious fields like full names, personal email addresses, phone numbers, home addresses, Social Security numbers, passport numbers, driver’s license numbers, and payment card data. It also includes less obvious items such as employee IDs, customer account numbers, device identifiers, IP addresses in certain contexts, health information, and date of birth when it is attached to a profile or record.

Where teams get into trouble is treating PII like a fixed checklist. It is not. The same field can be harmless in one context and high risk in another. A first name alone in a mockup might be fine. A first name next to a territory, company, personal email, and renewal date is no longer harmless; together, those fields form a profile.

For B2B teams, work email addresses deserve special attention. People sometimes assume business contact data is fair game because it is used for sales and operations. That assumption creates risk. A business email tied to support activity, contract details, health claims, compensation, or account issues can still be sensitive. Compliance teams usually care less about whether the data came from work or personal life and more about whether it identifies a person and exposes something they did not agree to share.

Direct identifiers vs. indirect identifiers

Direct identifiers are the easy part. If the field points straight to a specific person, redact it unless there is a clear business reason not to. That means names, email addresses, phone numbers, mailing addresses, government-issued IDs, biometric data, bank information, and card numbers should be removed or masked before sharing.

Indirect identifiers are where otherwise competent teams start behaving like a sitcom subplot. Todd removes the customer’s name, congratulates himself, and sends the file. The page still includes city, job title, exact signup date, support ticket ID, and a screenshot filename with the user’s email in it. Now anyone with basic context can infer who the customer is.

Indirect identifiers matter because re-identification is often simple. A job title plus company plus location may be enough. A client code plus invoice amount plus date range may be enough. Even internal usernames can be enough if your naming conventions are predictable. If the audience does not need the detail to do their job, redact it.
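A rough way to measure this is a k-anonymity style check: count how many records share each combination of indirect identifiers, and flag any combination held by fewer than k records. The minimal Python sketch below assumes records are dictionaries; the field names and the default k=2 are illustrative choices, not a standard.

```python
from collections import Counter


def rare_combinations(records, quasi_identifiers, k=2):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record.get(field) for field in quasi_identifiers)
        for record in records
    )
    # A combination seen fewer than k times narrows down to a handful of people,
    # or to exactly one person when the count is 1.
    return {combo: n for combo, n in counts.items() if n < k}


rows = [
    {"title": "VP Sales", "company": "Acme", "city": "Denver"},
    {"title": "Analyst", "company": "Acme", "city": "Denver"},
    {"title": "Analyst", "company": "Acme", "city": "Denver"},
]
print(rare_combinations(rows, ["title", "company", "city"]))
# {('VP Sales', 'Acme', 'Denver'): 1}
```

A combination that appears once describes exactly one person, even with the name removed. Generalizing the field (a region instead of a city) or dropping it are the usual fixes.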

Fields that usually need redaction

Most teams should assume these categories require removal, masking, or tokenization before external sharing: full names, personal and business email addresses, phone numbers, street addresses, dates of birth, account and policy numbers, employee IDs, customer IDs, payment details, tax IDs, government IDs, health-related details, login credentials, API keys, and session tokens.
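For structured exports such as JSON, a field-name rule catches many of these categories. This is a minimal Python sketch, assuming hypothetical key names like `email` and `api_key`; it walks nested dictionaries and lists but only matches exact key names, so values hidden in free text or under oddly named keys still need a content scan.

```python
import json

# Hypothetical default field names; replace with the keys your own schemas use.
DEFAULT_REDACT_FIELDS = frozenset({
    "name", "email", "phone", "address", "dob", "account_id", "employee_id",
    "customer_id", "card_number", "tax_id", "ssn", "password", "api_key",
    "session_token",
})


def redact_fields(value, fields=DEFAULT_REDACT_FIELDS, placeholder="[REDACTED]"):
    """Return a redacted copy of value, matching dictionary keys at any depth."""
    if isinstance(value, dict):
        return {
            key: placeholder if str(key).lower() in fields
            else redact_fields(item, fields, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_fields(item, fields, placeholder) for item in value]
    return value  # scalars pass through; free text is NOT scanned here


record = {
    "ticket": 4182,
    "customer": {"name": "Dana Lee", "email": "dana@example.com", "plan": "Pro"},
    "notes": [{"author": "agent", "api_key": "sk-test-123"}],
}
clean = redact_fields(record)
print(json.dumps(clean))
```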

In HTML and AI-generated content, add another category: anything hidden in the source. Comments, metadata, alt text, JSON blobs, embedded prompts, sample payloads, prefilled forms, and query strings can all carry PII. Security incidents rarely announce themselves in bold font. They hide in the parts nobody reviewed because the visible page looked clean.
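Query strings are one of the easier hiding places to check mechanically. This sketch uses only Python’s standard `urllib.parse`; the set of parameter names treated as sensitive is a hypothetical example that you would replace with the names your own systems use. It also drops the URL fragment, which can carry tokens as well.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names treated as sensitive; adjust to your stack.
SENSITIVE_PARAMS = {"email", "phone", "user", "token", "api_key", "session"}


def scrub_url(url, sensitive=SENSITIVE_PARAMS, placeholder="REDACTED"):
    """Replace values of sensitive query parameters and drop the fragment."""
    parts = urlsplit(url)
    query = [
        (key, placeholder if key.lower() in sensitive else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


print(scrub_url("https://app.example.com/report?id=7&email=dana%40example.com&token=abc"))
# https://app.example.com/report?id=7&email=REDACTED&token=REDACTED
```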

What PII should be redacted from HTML and AI output

This is where modern workflows create a special kind of mess. AI tools are fast, and fast tools encourage sharing before inspection. Teams generate HTML reports, landing pages, support summaries, or internal prototypes, then pass them around as links. If the output includes copied CRM records, chat transcripts, test credentials, or scraped customer examples, the page may contain both PII and secrets.

With HTML specifically, you need to inspect more than the rendered content. Look at page source, form values, comments, embedded scripts, image filenames, structured data, and analytics parameters. A support dashboard export may hide customer identifiers inside data attributes. A demo form may contain a real phone number from last week’s testing. A screenshot embedded in the page may include a mailbox in the corner. Redaction has to cover the full artifact, not just what fits above the fold.
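Python’s built-in `html.parser` is enough to pull out the parts of a page that never render, so they can be reviewed or passed to a detector. This is a minimal sketch, not a complete inspector: the `HiddenContentCollector` class and the attribute list are illustrative assumptions, and it does not follow linked files such as external scripts or the pixels inside screenshots.

```python
from html.parser import HTMLParser

# Attributes that often carry data the rendered page never shows.
REVIEW_ATTRS = {"alt", "title", "value", "href", "src", "content", "placeholder"}


class HiddenContentCollector(HTMLParser):
    """Collect comments, inline script text, and selected attribute values."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (where, text) pairs
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            # data-* attributes are a common hiding place for record IDs.
            if value and (name in REVIEW_ATTRS or name.startswith("data-")):
                self.findings.append((f"{tag}[{name}]", value))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Visible text is skipped here; only inline script bodies are collected.
        if self._in_script and data.strip():
            self.findings.append(("script", data.strip()))

    def handle_comment(self, data):
        self.findings.append(("comment", data.strip()))


page = """<div data-customer-id="C-20931"><!-- owner: dana@example.com -->
<img src="shot_dana@example.com.png" alt="Inbox">
<script>const user = {"email": "dana@example.com"};</script>
<p>Quarterly summary</p></div>"""

collector = HiddenContentCollector()
collector.feed(page)
collector.close()
for where, text in collector.findings:
    print(where, "->", text)
```

Nothing in the collected strings appears on the rendered page, yet three of them contain the customer’s email and one carries a customer ID.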

AI output adds another twist. Models often preserve detail better than users expect. If someone pastes raw tickets, CRM notes, medical summaries, legal drafts, or chat logs into a prompt, the generated HTML may reproduce names, email addresses, account references, and other identifiers even when the request sounded generic. “Write a customer summary” can quietly become “publish a customer summary with customer data still attached.” Nobody wants to explain that to procurement.
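Before generated HTML is shared, a pattern pass over the output catches the most mechanical leaks. The Python sketch below is deliberately simple and its patterns are assumptions: it misses names entirely (those need entity recognition or a human review step), and its phone pattern will also flag some dates and order numbers.

```python
import re

# Deliberately simple patterns; real detectors need far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_text(text):
    """Replace matches with typed placeholders and report what was found."""
    found = []
    for label, pattern in PATTERNS.items():
        found += [(label, match) for match in pattern.findall(text)]
        text = pattern.sub(f"[{label}]", text)
    return text, found


generated = "Summary: Dana Lee (dana.lee@example.com, +1 (303) 555-0147) reported a billing error."
clean, found = redact_text(generated)
print(clean)
# Summary: Dana Lee ([EMAIL], [PHONE]) reported a billing error.
```

Note that "Dana Lee" survives, which is exactly the kind of gap that makes pattern matching one layer of review rather than the whole review.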

Redaction depends on audience and purpose

Not every piece of data needs the same treatment in every workflow. Internal sharing inside a limited group with a legitimate business need is different from sending a public link to a partner, agency, or prospect. The more open the audience, the stronger the redaction standard should be.

A useful test is whether the recipient needs the identifying detail to act. If the goal is design review, use placeholders. If the goal is debugging, pseudonymize records and separate identity from behavior. If the goal is performance reporting, aggregate the results. Teams often preserve real identifiers out of habit, not necessity.
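For the debugging case, a keyed hash gives each person a stable pseudonym, so engineers can follow one user’s events without seeing who the user is. This Python sketch uses the standard `hmac` and `hashlib` modules; the key, the `user_` prefix, and the lowercase normalization are assumptions. A keyed HMAC is used instead of a plain hash because anyone can hash a list of known email addresses and match the results.

```python
import hashlib
import hmac


def pseudonym(identifier: str, key: bytes, prefix: str = "user") -> str:
    """Map an identifier to a stable token; same input and key give the same token."""
    normalized = identifier.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


# Hypothetical key; in practice it stays in a secrets manager, never in the shared artifact.
KEY = b"rotate-me-outside-the-repo"

events = [
    {"email": "dana@example.com", "action": "login_failed"},
    {"email": "Dana@Example.com", "action": "password_reset"},
    {"email": "lee@example.com", "action": "login_failed"},
]
# Identity is replaced; behavior (the action sequence) is kept for debugging.
debug_view = [{"user": pseudonym(e["email"], KEY), "action": e["action"]} for e in events]
print(debug_view)
```

Whoever holds the key can still link pseudonyms back to people, so the key belongs with the data owner, not with the recipient of the debug view.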

This is also where policy should override personal judgment. One employee may think a partial phone number is harmless. Another may think initials are enough. Governance exists because “seems fine to me” is not a control. Sanctioned workflows should define what gets masked, what gets removed, who can approve exceptions, and how shared artifacts are audited.

Redact, mask, or anonymize?

These are not interchangeable. Redaction removes the sensitive value entirely. Masking keeps part of the value visible, such as showing only the last four digits. Anonymization aims to make re-identification impractical, which is harder than many teams assume.
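The difference between the first two is easiest to see side by side. In this minimal Python sketch, `redact` and `mask` are hypothetical helper names; masking keeps the last four alphanumeric characters and preserves separators so the format stays readable. Anonymization has no equivalent one-line helper, because it depends on the whole dataset and the clues around each value, not on a single field.

```python
def redact(_value: str) -> str:
    """Redaction: the value is removed entirely; only a marker remains."""
    return "[REDACTED]"


def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Masking: keep the last `visible` alphanumeric characters, hide the rest."""
    kept = 0
    out = []
    for ch in reversed(value):
        if ch.isalnum():
            kept += 1
            out.append(ch if kept <= visible else mask_char)
        else:
            out.append(ch)  # keep spaces, dashes, and other separators
    return "".join(reversed(out))


card = "4111 1111 1111 1111"
print(redact(card))  # [REDACTED]
print(mask(card))    # **** **** **** 1111
```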

If a document needs reference utility, masking may be reasonable. If the recipient does not need identity at all, full redaction is better. If the data will be reused for analytics or training, anonymization might help, but only if you also remove the surrounding clues that make reverse engineering possible. Otherwise it is just a nicer-looking exposure.

Common redaction mistakes teams make

The biggest mistake is focusing only on regulated categories and ignoring operational data. Security incidents do not care whether the leaked value came from a privacy textbook. If it identifies a person or exposes access, it matters.

The second mistake is redacting the body but not the attachments, source code, or URL parameters. Teams clean the obvious text while leaving customer data inside downloadable files, hidden fields, or preview snippets. That is the digital equivalent of locking the front door and leaving the side gate open.

The third mistake is relying on manual review alone. Manual review is useful, but it is inconsistent under deadline pressure. Todd at 5:42 p.m. on a Friday is not the same reviewer as Todd at 9:15 a.m. after coffee. Automated detection for PII, credentials, and tokens closes part of that gap, especially in HTML where sensitive content can live in places a human may not inspect every time.
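Part of that automation can be sketched with patterns plus validation. This Python example is illustrative, not a production scanner: the `AKIA` prefix for AWS access key IDs is a documented convention (the sample key is AWS’s own documentation example), the bearer-token and private-key patterns are generic shapes, and the card check applies the Luhn checksum so that not every long number is reported as a card.

```python
import re

# Illustrative patterns only; real scanners ship far more rules.
SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan(text):
    findings = [
        (label, match.group())
        for label, pattern in SECRET_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    findings += [
        ("card_number", match.group())
        for match in CARD_CANDIDATE.finditer(text)
        if luhn_valid(match.group())
    ]
    return findings


sample = ("curl -H 'Authorization: Bearer abc123.def456' "
          "key=AKIAIOSFODNN7EXAMPLE card 4111 1111 1111 1111 order 4111 1111 1111 1112")
for label, value in scan(sample):
    print(label, value)
```

The second sixteen-digit number fails the Luhn check and is not reported, which is the point of validating candidates instead of flagging every long digit run.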

A workable standard for teams

If your team regularly shares HTML-based content, set a standard that starts broad and only narrows with approval. Treat names, emails, phone numbers, exact addresses, government IDs, financial details, health information, account identifiers, and authentication-related data as default redaction candidates. Then add context-based fields such as job title, location, dates, ticket numbers, and internal IDs whenever they could identify someone when combined.
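One way to make “starts broad and only narrows with approval” concrete is to express the standard as data rather than as each reviewer’s judgment. This Python sketch is hypothetical throughout: the field categories, the `SharingPolicy` class, the `privacy-office` approver, and the simplifying rule that context fields are always redacted for non-internal audiences are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical category names standing in for the default redaction list.
DEFAULT_REDACT = frozenset({
    "name", "email", "phone", "address", "government_id",
    "financial", "health", "account_id", "credential",
})
# Fields that identify someone mainly in combination with other fields.
CONTEXT_FIELDS = frozenset({"job_title", "location", "date", "ticket_id", "internal_id"})


@dataclass
class SharingPolicy:
    audience: str                                   # "internal" or "external"
    approvers: frozenset = frozenset({"privacy-office"})
    exceptions: dict = field(default_factory=dict)  # field name -> approver

    def allow(self, field_name: str, approver: str) -> None:
        """Narrow the default only with a named, authorized approver."""
        if approver not in self.approvers:
            raise PermissionError(f"{approver!r} cannot approve exceptions")
        self.exceptions[field_name] = approver

    def fields_to_redact(self, present: set) -> set:
        candidates = set(DEFAULT_REDACT)
        if self.audience != "internal":
            # Simplification: treat every context field as risky outside the team.
            candidates |= CONTEXT_FIELDS
        return (candidates & present) - set(self.exceptions)


policy = SharingPolicy(audience="external")
present = {"name", "email", "job_title", "plan", "ticket_id"}
print(sorted(policy.fields_to_redact(present)))  # ['email', 'job_title', 'name', 'ticket_id']
policy.allow("ticket_id", approver="privacy-office")
print(sorted(policy.fields_to_redact(present)))  # ['email', 'job_title', 'name']
```

The exceptions dictionary doubles as an audit record of who narrowed the default and for which field.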

Build review into the workflow, not after the link is already in someone’s inbox. Use tooling that scans for PII and secrets before content is shared, prevents indexing, supports expiration and access controls, and gives teams visibility into what was viewed and when. That is why sanctioned sharing tools exist. They reduce the number of judgment calls employees have to improvise under pressure.
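Expiration and indexing controls usually rest on simple mechanisms. The Python sketch below shows one common pattern, an HMAC-signed link with an expiry timestamp, plus the `X-Robots-Tag` response header that asks compliant crawlers not to index a page. It is an illustration under assumed names (`sign_link`, `verify_link`, the signing key), not a description of how any particular product implements sharing, and it leaves out access control beyond possession of the link.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SIGNING_KEY = b"hypothetical-key-kept-server-side"

# Header asking compliant crawlers not to index or follow the shared page.
NOINDEX_HEADERS = {"X-Robots-Tag": "noindex, nofollow"}


def _signature(path: str, expires: int) -> str:
    message = f"{path}|{expires}".encode("utf-8")
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def sign_link(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def verify_link(link: str, now: Optional[float] = None) -> bool:
    parts = urlsplit(link)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(_signature(parts.path, expires), sig) and current < expires


link = sign_link("/share/q2-report", ttl_seconds=3600, now=1_000_000)
print(verify_link(link, now=1_000_100))      # True
print(verify_link(link, now=1_004_000))      # False: expired
tampered = link.replace("/share/q2-report", "/share/other")
print(verify_link(tampered, now=1_000_100))  # False: signature does not match
```

Because the expiry is covered by the signature, a recipient cannot extend access by editing the timestamp in the URL.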

HTMLVault is built for exactly this problem: sharing HTML without making security and compliance somebody else’s cleanup project.

If you are still debating what PII should be redacted, the safest answer is usually more than what is visible, and the right time to redact is earlier than you think. The cleanest share is the one that never turns into a forensic exercise.
