Speaker Detection and PII Protection in Interview Processing
When processing interview transcripts in TheyDo, Journey AI automatically detects speakers and protects personally identifiable information (PII) while preserving the context of who said what. This guide explains how to use these features effectively.
What is Speaker Detection?
Speaker detection is an AI feature that automatically identifies different speakers in interview transcripts when you upload them to the Data Hub. This allows you to:
Assign personas to specific speakers
Choose which speakers to include in quote extraction
Maintain speaker context while protecting privacy
Focus insights on customer voices rather than researcher questions
How Speaker Detection Works
Step 1: Upload and Detection
When you upload an interview file (.txt format) or paste interview text:
Journey AI automatically detects the file type as "Interview"
The system identifies all unique speakers in the conversation
Speakers are presented for your review before processing
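To make the detection step concrete, here is a minimal sketch of how unique speakers could be collected from a transcript that follows the "Speaker Name: dialogue" convention. It is an illustration only, not Journey AI's actual implementation; the `detect_speakers` function and the `SPEAKER_LINE` pattern are hypothetical names chosen for this example.

```python
import re

# Assumption: each turn starts with a short label followed by a colon,
# e.g. "Samantha: I usually call the office."
SPEAKER_LINE = re.compile(r"^\s*([^:\n]{1,40}):\s*(.+)$")


def detect_speakers(transcript: str) -> list[str]:
    """Return unique speaker labels in order of first appearance."""
    speakers: list[str] = []
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            name = match.group(1).strip()
            if name not in speakers:
                speakers.append(name)
    return speakers


transcript = """Tony: How do you currently book appointments?
Samantha: I usually call the office, which takes a while.
Tony: What would make that easier?
Samantha: An online form would help."""

print(detect_speakers(transcript))  # ['Tony', 'Samantha']
```

A rule this simple depends entirely on consistent labels, which is why the formatting advice later in this guide matters.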
Step 2: Speaker Configuration
After detection, you'll see a speaker configuration screen showing:
All detected speakers (e.g., "Tony", "Samantha")
Include/Exclude toggles for each speaker
Persona assignment dropdown for each speaker
Step 3: Configure Your Speakers
Excluding Speakers
Toggle off speakers you want to exclude from quote extraction. Common use cases:
Exclude researchers/interviewers: Focus only on customer responses
Exclude observers: Remove non-participant comments
Exclude irrelevant speakers: Filter out administrative voices
Example: If "Tony" is the researcher and "Samantha" is the customer, exclude Tony to extract quotes only from Samantha.
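The Tony and Samantha example can be sketched in a few lines of code. This is a simplified model of the Include/Exclude toggles, assuming each transcript turn is stored with its speaker label; `Turn`, `select_turns`, and the `included` mapping are hypothetical names, not part of TheyDo.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


def select_turns(turns: list[Turn], included: dict[str, bool]) -> list[Turn]:
    """Keep only turns whose speaker is toggled on; unlisted speakers are dropped."""
    return [turn for turn in turns if included.get(turn.speaker, False)]


turns = [
    Turn("Tony", "How do you currently book appointments?"),
    Turn("Samantha", "I usually call the office, which takes a while."),
    Turn("Tony", "What would make that easier?"),
    Turn("Samantha", "An online form would help."),
]

# Tony is the researcher, so his toggle is off.
included = {"Tony": False, "Samantha": True}

for turn in select_turns(turns, included):
    print(f"{turn.speaker}: {turn.text}")
```

Only Samantha's two turns remain as candidates for quote extraction.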
Assigning Personas
For included speakers, you can:
Assign existing personas: Select from your workspace personas via dropdown
Leave unassigned: Process without persona attribution
Map multiple speakers to the same persona if they represent the same customer segment
Benefits of persona assignment:
All quotes from that speaker inherit the persona
Filter insights by persona in journey views
Track patterns across customer segments
Maintain context even after PII removal
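As a rough illustration of persona inheritance, the sketch below tags each quote with the persona assigned to its speaker. The persona names ("Busy Parent"), the extra speakers, and the `tag_quotes` helper are all made up for this example and do not reflect TheyDo's internal data model.

```python
from typing import Optional

# Two speakers map to the same persona; one is left unassigned.
speaker_personas: dict[str, Optional[str]] = {
    "Samantha": "Busy Parent",
    "Priya": "Busy Parent",
    "Alex": None,
}

quotes = [
    ("Samantha", "I usually call the office, which takes a while."),
    ("Priya", "I never have time to wait on hold."),
    ("Alex", "I only visit once a year."),
]


def tag_quotes(quotes, speaker_personas):
    """Attach the speaker's persona (or None) to every quote."""
    return [
        {"speaker": speaker, "persona": speaker_personas.get(speaker), "text": text}
        for speaker, text in quotes
    ]


tagged = tag_quotes(quotes, speaker_personas)
busy_parent_quotes = [q for q in tagged if q["persona"] == "Busy Parent"]
print(len(busy_parent_quotes))  # 2
```

Because the persona travels with the quote rather than with the speaker's name, it can survive once the name itself is replaced.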
PII Protection Process
How PII Obfuscation Works
TheyDo implements a two-stage approach to protect personal information:
During Upload (Configuration Stage):
You can see actual speaker names (e.g., "Samantha")
This visibility helps you correctly assign personas
Only the person uploading can see this information
After Processing (In Quotes and Insights):
Real names are replaced with generic identifiers (e.g., "Person 2")
Email addresses, phone numbers, and other PII are obfuscated
Persona assignments remain intact
Original speaker context is preserved without exposing identity
What Gets Obfuscated
Personal names
Email addresses
Phone numbers
Physical addresses
Social security numbers
Credit card information
Other identifiable personal data
What Gets Preserved
Assigned personas
Quote attribution to speakers (using anonymous identifiers)
The relationship between quotes from the same speaker
Sentiment and experience impact
All non-PII content
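The obfuscation and preservation rules above can be approximated with a small sketch. It covers only names, email addresses, and phone numbers, uses deliberately simple regular expressions, and is not how TheyDo actually detects PII. The `[email]` and `[phone]` placeholders, the `obfuscate` function, and the persona name are assumptions made for illustration.

```python
import re

# Simplified patterns for illustration; real PII detection is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def obfuscate(turns, personas):
    """Replace speaker names and contact details while keeping personas."""
    # Assign "Person N" aliases in order of first appearance.
    aliases = {}
    for speaker, _ in turns:
        aliases.setdefault(speaker, f"Person {len(aliases) + 1}")

    names = "|".join(re.escape(name) for name in aliases)
    name_pattern = re.compile(rf"\b({names})\b")

    result = []
    for speaker, text in turns:
        text = name_pattern.sub(lambda m: aliases[m.group(1)], text)
        text = EMAIL.sub("[email]", text)
        text = PHONE.sub("[phone]", text)
        result.append(
            {"speaker": aliases[speaker], "persona": personas.get(speaker), "text": text}
        )
    return result


turns = [
    ("Tony", "Thanks, Samantha. What is the best way to reach you?"),
    ("Samantha", "Email me at sam@example.com or call 555-123-4567."),
]
personas = {"Samantha": "Busy Parent"}

for quote in obfuscate(turns, personas):
    print(quote)
```

In this sketch Samantha becomes "Person 2" in both the attribution and the quote text, while her assigned persona stays attached to her quotes.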
Best Practices
For Interview Preparation
Consistent speaker labels: Use clear speaker identifiers in your transcripts
Format consistency: Maintain consistent formatting (e.g., "Speaker Name: dialogue")
Clean transcripts: Remove timestamps or metadata that might interfere with detection
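If your transcription tool adds timestamps, a short script can strip them before upload. The sketch below assumes timestamps appear at the start of each line in forms such as "[00:01:15]" or "00:02"; adjust the hypothetical `TIMESTAMP` pattern to match your own export format.

```python
import re

# Assumption: a leading timestamp such as "[00:01:15]", "(01:15)", or "00:02".
TIMESTAMP = re.compile(r"^\s*[\[(]?\d{1,2}:\d{2}(?::\d{2})?[\])]?\s*")


def clean_transcript(raw: str) -> str:
    """Remove leading timestamps and blank lines, keeping 'Speaker: dialogue'."""
    cleaned = []
    for line in raw.splitlines():
        line = TIMESTAMP.sub("", line).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


raw = """[00:01:15] Tony: How do you currently book appointments?

00:02 Samantha: I usually call the office."""

print(clean_transcript(raw))
```

The result contains one "Speaker Name: dialogue" line per turn, which is the format recommended above.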
For Speaker Configuration
Exclude researchers by default: Focus on customer voices unless researcher insights are specifically needed
Use persona assignment: Connect speakers to existing personas for better filtering
Review before processing: Double-check speaker inclusion/exclusion before continuing
For Privacy Compliance
Process immediately: PII obfuscation happens automatically once the uploaded file is processed
No manual PII removal needed: The system handles this for you
Audit trail maintained: The system tracks who uploaded each file while protecting participants' identities
Working with Processed Interviews
After processing, your interview quotes will:
Show anonymous speaker identifiers (Person 1, Person 2, etc.)
Maintain all assigned personas
Be ready for insight mining with full context
Protect participant privacy while preserving research value
Filtering and Analysis
You can still:
Filter quotes by assigned persona
See which quotes came from the same speaker
Track sentiment patterns by persona
Build journeys based on specific customer segments
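For readers who think in terms of data, the filtering options above amount to simple lookups over processed quotes. The sketch below uses made-up quotes and persona names; the record layout and the `filter_by_persona` and `group_by_speaker` helpers are assumptions, not TheyDo's export format or API.

```python
from collections import defaultdict

# Hypothetical processed quotes: names are already anonymized.
quotes = [
    {"speaker": "Person 1", "persona": "Busy Parent", "text": "Calling takes too long."},
    {"speaker": "Person 2", "persona": "Busy Parent", "text": "I never have time to wait."},
    {"speaker": "Person 1", "persona": "Busy Parent", "text": "An online form would help."},
    {"speaker": "Person 3", "persona": "First-Time Visitor", "text": "I was not sure where to start."},
]


def filter_by_persona(quotes, persona):
    """Return only the quotes tagged with the given persona."""
    return [q for q in quotes if q["persona"] == persona]


def group_by_speaker(quotes):
    """Group quote texts under their anonymous speaker identifier."""
    grouped = defaultdict(list)
    for q in quotes:
        grouped[q["speaker"]].append(q["text"])
    return dict(grouped)


busy_parent = filter_by_persona(quotes, "Busy Parent")
print(len(busy_parent))  # 3
print(group_by_speaker(busy_parent))
```

The anonymous identifier is enough to see that two of the quotes came from the same participant, without revealing who that participant is.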
Tips for Success
Create personas first: Set up your personas before uploading interviews for smoother assignment
Batch similar interviews: Process interviews from the same customer segment together
Document your mapping: Keep a record of which personas you've assigned to which types of speakers
Use descriptive file names: Help yourself remember which interviews contain which customer types
Integration with Journey AI Features
Speaker detection and PII protection work seamlessly with other Journey AI features:
Insight Mining: Filtered quotes maintain speaker context
Journey Enrichment: Persona-tagged quotes enrich relevant journey steps
Experience Scoring: Sentiment analysis respects speaker filtering
Cross-source Analysis: Persona assignments enable pattern recognition across multiple interviews
By leveraging speaker detection and PII protection, you can maintain research integrity, protect participant privacy, and extract more targeted insights from your interview data.
Common Questions
Q: Can I change speaker assignments after processing?
No, speaker configuration happens during upload. To change assignments, you'll need to re-upload the file.
Q: What if speaker detection misses a speaker?
The system is designed to catch all unique speaker identifiers. Ensure your transcript uses consistent speaker labeling.
Q: Can I see the original names after PII obfuscation?
No, once processed, the original names are permanently replaced to ensure privacy protection.
Q: How does this work with multiple interviews?
Each interview file is processed independently. Assign the same persona across interviews to maintain consistency.