What is PII stripping in Journey AI?
Overview
When you upload interview transcripts or other source files to the Data Hub, Journey AI automatically detects and removes personally identifiable information (PII) before any further processing takes place. Quotes, insights, and speaker attribution all remain useful, without exposing the personal details of research participants.
PII stripping happens automatically during ingestion, whether you upload a file manually or through an integration. You don't need to clean transcripts before uploading.
What gets stripped
Journey AI uses a combination of large language models (LLMs) and Named Entity Recognition (NER) tools to detect PII. The following types of information are obfuscated during ingestion:
- Personal names
- Email addresses
- Phone numbers
- Physical addresses
- Social Security numbers
- Credit card numbers
- IP addresses
- ID numbers
- Dates of birth
- Flight numbers
- Social media handles
Once detected, real names and identifiers are replaced with generic labels (for example, "Person 1", "Person 2"). No manual action is needed.
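To make the replacement step concrete, here is a minimal sketch of label substitution in Python. It is not how Journey AI is implemented: the product uses LLMs and NER for detection, while this example assumes the names are already known and uses a simple regular expression for email addresses. The function name strip_pii and the "[EMAIL]" placeholder are hypothetical and chosen only for illustration.

```python
import re

# Illustrative only: Journey AI detects PII with LLMs and NER, not with this regex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_pii(text, known_names):
    """Replace each known name with a stable 'Person N' label and mask emails."""
    # Assign labels in the order names are given, so the same name
    # always maps to the same label within one transcript.
    labels = {}
    for name in known_names:
        labels.setdefault(name, f"Person {len(labels) + 1}")

    # Replace longer names first so a full name is not partially replaced
    # by a shorter name it happens to contain.
    for name in sorted(labels, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", labels[name], text)

    # "[EMAIL]" is a hypothetical placeholder, not a documented label.
    text = EMAIL_RE.sub("[EMAIL]", text)
    return text, labels


cleaned, mapping = strip_pii(
    "Maria Lopez emailed maria@example.com. Maria Lopez then called Tom.",
    ["Maria Lopez", "Tom"],
)
print(cleaned)
```

The point of the sketch is that every mention of the same person receives the same label, which is what keeps speaker context usable after the real name is gone.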
What gets preserved
Stripping PII doesn't affect the research value of your sources. After processing:
- Speaker context is maintained using anonymous identifiers
- Assigned personas carry over to all quotes from that speaker
- Sentiment, experience impact, and quote content remain intact
- All non-PII text is preserved exactly as uploaded
How it works with speaker detection
When you upload an interview transcript, Journey AI also detects the individual speakers in the conversation. During the upload configuration step, you can see actual speaker names — this is intentional, so you can correctly assign personas before processing begins.
Once you confirm and the file is processed, those names are replaced with anonymous identifiers throughout the system. The persona assignments you made are retained, so you can still filter and analyze quotes by customer segment.
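The relationship between speaker names, anonymous identifiers, and personas can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the Quote class, the anonymize_quotes function, and the persona names are hypothetical and do not reflect TheyDo's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    speaker: str                   # real name before processing, anonymous ID after
    text: str
    persona: Optional[str] = None  # persona assigned during upload configuration


def anonymize_quotes(quotes, persona_by_speaker):
    """Swap speaker names for anonymous IDs while keeping persona assignments."""
    ids = {}
    result = []
    for quote in quotes:
        # The same speaker always receives the same anonymous identifier.
        anon_id = ids.setdefault(quote.speaker, f"Person {len(ids) + 1}")
        persona = persona_by_speaker.get(quote.speaker)
        result.append(Quote(anon_id, quote.text, persona))
    return result


quotes = [
    Quote("Dana", "Checkout was slow."),
    Quote("Sam", "Can you tell me more?"),
    Quote("Dana", "I gave up after two tries."),
]
processed = anonymize_quotes(quotes, {"Dana": "Frequent shopper"})

# Filtering by persona still works even though the real names are gone.
shopper_quotes = [q.text for q in processed if q.persona == "Frequent shopper"]
print(shopper_quotes)
```

In this sketch the persona is attached before the name is replaced, which mirrors why persona assignment happens at the configuration step, while real names are still visible.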
Note: Speaker configuration happens during upload. If you need to change persona assignments, you'll need to re-upload the file.
Before you start
- PII stripping requires AI to be enabled for your organization. Go to Settings > Details & AI and make sure both "Activate AI for this organization" and "Remove personally identifiable information" are turned on. Only an admin can change these settings.
- The feature applies to all sources ingested through the Data Hub, including manual uploads and integrations.
- Once processed, original names and PII cannot be recovered from within TheyDo.
Tips
- Set up personas before uploading so you can assign speakers during the configuration step and maintain useful context after PII is removed.
- Use consistent speaker labels in your transcripts (for example, Interviewer: and Participant:) to help Journey AI detect speakers accurately.
- Keep a separate record of any speaker-to-persona mappings outside TheyDo if you need to reference them later, since original names won't be visible after processing.
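If you want to check a transcript for consistent speaker labels before uploading, a short script can list the labels it contains. This is an optional, local pre-check written as a sketch: it is not part of Journey AI, and the count_speaker_labels function and its label pattern (a short name at the start of a line followed by a colon) are assumptions about how your transcripts are formatted.

```python
import re
from collections import Counter

# Assumed label format: a short name at the start of a line, followed by a colon.
LABEL_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9 ]{0,30}):")


def count_speaker_labels(transcript):
    """Count how often each speaker label appears at the start of a line."""
    counts = Counter()
    for line in transcript.splitlines():
        match = LABEL_RE.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


transcript = """Interviewer: How did the sign-up go?
Participant: It was fine at first.
interviewer: What happened next?
Participant: The page timed out."""

for label, count in count_speaker_labels(transcript).items():
    print(f"{label}: {count}")
```

Here the lowercase "interviewer" shows up as a separate label, which is the kind of inconsistency worth fixing before upload. Note that any line starting with a short word and a colon (such as "Note:") is also counted, so review the output rather than treating it as definitive.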