Personally Identifiable Information stripping in Journey AI

Journey AI strips Personally Identifiable Information (PII) during source ingestion, whether the source was uploaded manually or arrived via an integration. PII is identified and removed before any further processing takes place. The process is model-driven and uses both LLMs and dedicated Named Entity Recognition (NER) tools.

How it works:

  • When a source is ingested, Journey AI runs a PII detection pipeline using a specialized NER prompt template. The prompt instructs the model to extract PII entities (e.g., PERSON, ADDRESS, EMAIL, SSN, PHONE, CREDIT_CARD, IP_ADDRESS, ID_NUMBER, FLIGHT_NUMBER, SOCIAL_MEDIA_HANDLE, DATE_OF_BIRTH) and to return them in a JSON schema that groups all surface forms of the same entity together.

  • The stripped (obfuscated) version of the source is stored and used for all downstream operations, including quote extraction and display in the UI. The raw (unredacted) text can optionally be retained for traceability, but is not exposed to users unless explicitly permitted.
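The steps above can be illustrated with a minimal Python sketch. It is not Journey AI's actual implementation: the JSON layout (`entities`, `type`, `surface_forms`), the `redact` function, and the `[TYPE_N]` placeholder format are all assumptions made for illustration. The sketch only shows why grouping surface forms matters: every mention of the same entity ends up with the same placeholder.

```python
import json
import re

# Hypothetical NER output. The real Journey AI schema is not documented here;
# the field names below are assumptions for this example only.
ner_output = json.loads("""
{
  "entities": [
    {"type": "PERSON", "surface_forms": ["Jane Doe", "Jane", "Ms. Doe"]},
    {"type": "EMAIL", "surface_forms": ["jane.doe@example.com"]}
  ]
}
""")


def redact(text: str, entities: list) -> str:
    """Replace every surface form with a per-entity placeholder, e.g. [PERSON_1]."""
    counters = {}  # how many entities of each type have been seen so far
    mapping = {}   # surface form -> placeholder
    for entity in entities:
        kind = entity["type"]
        counters[kind] = counters.get(kind, 0) + 1
        placeholder = f"[{kind}_{counters[kind]}]"
        for form in entity["surface_forms"]:
            mapping[form] = placeholder
    if not mapping:
        return text  # nothing detected, nothing to strip
    # Try longer forms first so "Jane Doe" wins over "Jane" at the same position.
    forms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    # Single pass over the text: placeholders are never re-scanned.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


raw = "Jane Doe (jane.doe@example.com) called. Ms. Doe said Jane will reply."
print(redact(raw, ner_output["entities"]))
# [PERSON_1] ([EMAIL_1]) called. [PERSON_1] said [PERSON_1] will reply.
```

Matching here is exact and case-sensitive, and it does not check word boundaries, so a production pipeline would need stricter matching than this sketch provides.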

In short, Journey AI uses LLMs and open-source tools to automatically detect and remove PII from sources during ingestion.
