Speaker Detection and PII Protection in Interview Processing

When you process interview transcripts in TheyDo, Journey AI automatically detects speakers and protects personally identifiable information (PII) while preserving the context of who said what. This guide explains how to use these features effectively.

What is Speaker Detection?

Speaker detection is an AI feature that automatically identifies different speakers in interview transcripts when you upload them to the Data Hub. This allows you to:

  • Assign personas to specific speakers

  • Choose which speakers to include in quote extraction

  • Maintain speaker context while protecting privacy

  • Focus insights on customer voices rather than researcher questions

How Speaker Detection Works

Step 1: Upload and Detection

When you upload an interview file (.txt format) or paste interview text:

  1. Journey AI automatically detects the file type as "Interview"

  2. The system identifies all unique speakers in the conversation

  3. Speakers are presented for your review before processing
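
To make the detection step concrete, here is a minimal Python sketch of how unique speakers could be pulled from a transcript. It is not Journey AI's actual implementation. It assumes the "Speaker Name: dialogue" line format recommended later in this guide, and the function name detect_speakers and the pattern are hypothetical.

```python
import re

# Assumption: each turn starts with "Speaker Name: dialogue".
# The pattern is illustrative only, not TheyDo's detection logic.
SPEAKER_LINE = re.compile(r"^([A-Za-z][\w .'-]{0,40}):\s+\S")


def detect_speakers(transcript: str) -> list[str]:
    """Return unique speaker labels in order of first appearance."""
    speakers: list[str] = []
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            name = match.group(1).strip()
            if name not in speakers:
                speakers.append(name)
    return speakers


sample = """Tony: Thanks for joining. How do you book trips today?
Samantha: Mostly on my phone, but checkout is slow.
Tony: What happens when it is slow?
Samantha: I usually give up and call support."""

print(detect_speakers(sample))  # ['Tony', 'Samantha']
```

A label-based approach like this only works when labels are consistent, which is why consistent formatting matters for reliable detection.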

Step 2: Speaker Configuration

After detection, you'll see a speaker configuration screen showing:

  • All detected speakers (e.g., "Tony", "Samantha")

  • Include/Exclude toggles for each speaker

  • Persona assignment dropdown for each speaker

Step 3: Configure Your Speakers

Excluding Speakers

Toggle off speakers you want to exclude from quote extraction. Common use cases:

  • Exclude researchers/interviewers: Focus only on customer responses

  • Exclude observers: Remove non-participant comments

  • Exclude irrelevant speakers: Filter out administrative voices

Example: If "Tony" is the researcher and "Samantha" is the customer, exclude Tony to extract quotes only from Samantha.
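
The effect of the Include/Exclude toggles can be pictured with a short Python sketch. This is an illustration under assumptions, not the product's code: parse_turns and included_turns are hypothetical helpers, and the transcript is assumed to use "Speaker Name: dialogue" lines.

```python
def parse_turns(transcript: str) -> list[tuple[str, str]]:
    """Split 'Speaker: text' lines into (speaker, text) pairs."""
    turns = []
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def included_turns(turns, excluded: set[str]):
    """Keep only turns from speakers that are not toggled off."""
    return [(s, t) for s, t in turns if s not in excluded]


interview = """Tony: How was the checkout experience?
Samantha: It was slow on my phone.
Tony: Did you finish the purchase?
Samantha: No, I gave up and called support."""

# Tony is the researcher, so he is excluded from quote extraction.
customer_turns = included_turns(parse_turns(interview), excluded={"Tony"})
for speaker, text in customer_turns:
    print(f"{speaker}: {text}")
```

Only Samantha's two responses remain as candidates for quote extraction.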

Assigning Personas

For included speakers, you can:

  • Assign existing personas: Select from your workspace personas via dropdown

  • Leave unassigned: Process without persona attribution

  • Map multiple speakers to the same persona if they represent the same customer segment

Benefits of persona assignment:

  • All quotes from that speaker inherit the persona

  • Filter insights by persona in journey views

  • Track patterns across customer segments

  • Maintain context even after PII removal

PII Protection Process

How PII Obfuscation Works

TheyDo implements a two-stage approach to protect personal information:

During Upload (Configuration Stage):

  • You can see actual speaker names (e.g., "Samantha")

  • This visibility helps you correctly assign personas

  • Only the person uploading can see this information

After Processing (In Quotes and Insights):

  • Real names are replaced with generic identifiers (e.g., "Person 2")

  • Email addresses, phone numbers, and other PII are obfuscated

  • Persona assignments remain intact

  • Original speaker context is preserved without exposing identity
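
The after-processing stage can be sketched in Python as follows. This is a simplified illustration, not TheyDo's obfuscation pipeline: the regular expressions, the "[email]" and "[phone]" placeholders, and the obfuscate function are all assumptions, and real PII detection covers far more cases than these patterns.

```python
import re

# Illustrative patterns only; real PII detection is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def obfuscate(turns):
    """Replace speaker names with 'Person N' and mask emails and phones."""
    # Number speakers in order of first appearance.
    aliases = {}
    for speaker, _ in turns:
        aliases.setdefault(speaker, f"Person {len(aliases) + 1}")

    # Whole-word match for any known speaker name inside the dialogue.
    names = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in aliases) + r")\b"
    )

    result = []
    for speaker, text in turns:
        text = EMAIL.sub("[email]", text)
        text = PHONE.sub("[phone]", text)
        text = names.sub(lambda m: aliases[m.group(1)], text)
        result.append((aliases[speaker], text))
    return result


turns = [
    ("Tony", "Samantha, can I reach you at sam@example.com?"),
    ("Samantha", "Sure, or call 555-123-4567."),
]
for speaker, text in obfuscate(turns):
    print(f"{speaker}: {text}")
# Person 1: Person 2, can I reach you at [email]?
# Person 2: Sure, or call [phone].
```

Because each name maps to one stable identifier, quotes from the same speaker stay linked after the real name is gone.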

What Gets Obfuscated

  • Personal names

  • Email addresses

  • Phone numbers

  • Physical addresses

  • Social security numbers

  • Credit card information

  • Other identifiable personal data

What Gets Preserved

  • Assigned personas

  • Quote attribution to speakers (using anonymous identifiers)

  • The relationship between quotes from the same speaker

  • Sentiment and experience impact

  • All non-PII content

Best Practices

For Interview Preparation

  1. Consistent speaker labels: Use clear speaker identifiers in your transcripts

  2. Format consistency: Maintain consistent formatting (e.g., "Speaker Name: dialogue")

  3. Clean transcripts: Remove timestamps or metadata that might interfere with detection
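
If your transcription tool adds timestamps, a small script can strip them before upload. The Python sketch below is one possible approach, assuming timestamps appear at the start of a line in forms such as "[00:01:23]", "(01:23)" or "00:01:23 -". The clean_transcript name and the pattern are hypothetical, so adjust them to your tool's output.

```python
import re

# Assumed timestamp shapes at the start of a line:
# "[00:01:23]", "(01:23)", "00:01:23 -"
LEADING_TIMESTAMP = re.compile(
    r"^\s*[\[(]?\d{1,2}:\d{2}(?::\d{2})?[\])]?\s*-?\s*"
)


def clean_transcript(raw: str) -> str:
    """Drop leading timestamps and blank lines, keep 'Speaker: text'."""
    cleaned = []
    for line in raw.splitlines():
        line = LEADING_TIMESTAMP.sub("", line).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


raw = (
    "[00:00:05] Tony: Welcome.\n"
    "\n"
    "00:00:12 - Samantha: Thanks for having me.\n"
    "(01:02) Tony: Let's start."
)
print(clean_transcript(raw))
```

The pattern is anchored to the start of the line, so times mentioned inside the dialogue (for example "Samantha: I called at 10:30") are left untouched.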

For Speaker Configuration

  1. Exclude researchers by default: Focus on customer voices unless researcher insights are specifically needed

  2. Use persona assignment: Connect speakers to existing personas for better filtering

  3. Review before processing: Double-check speaker inclusion/exclusion before continuing

For Privacy Compliance

  1. Rely on automatic obfuscation: PII obfuscation runs automatically as part of processing after upload

  2. No manual PII removal needed: The system handles this for you

  3. Audit trail maintained: System tracks who uploaded files while protecting subject identity

Working with Processed Interviews

After processing, your interview quotes will:

  • Show anonymous speaker identifiers (Person 1, Person 2, etc.)

  • Maintain all assigned personas

  • Be ready for insight mining with full context

  • Protect participant privacy while preserving research value

Filtering and Analysis

You can still:

  • Filter quotes by assigned persona

  • See which quotes came from the same speaker

  • Track sentiment patterns by persona

  • Build journeys based on specific customer segments
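
Conceptually, these filters operate on quotes that carry an anonymous speaker identifier and an optional persona. The Python sketch below models that idea with a hypothetical Quote record and two helper functions. It is not TheyDo's data model or API, only an illustration of why persona filtering and speaker grouping still work after PII removal.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    speaker_id: str          # anonymous identifier, e.g. "Person 2"
    persona: Optional[str]   # None when the speaker was left unassigned
    text: str


def by_persona(quotes, persona):
    """Return quotes whose speaker was assigned the given persona."""
    return [q for q in quotes if q.persona == persona]


def group_by_speaker(quotes):
    """Group quote texts by anonymous speaker identifier."""
    grouped = defaultdict(list)
    for q in quotes:
        grouped[q.speaker_id].append(q.text)
    return dict(grouped)


quotes = [
    Quote("Person 2", "Frequent Traveler", "Checkout is slow on my phone."),
    Quote("Person 3", None, "I mostly book at the counter."),
    Quote("Person 2", "Frequent Traveler", "I gave up and called support."),
]

traveler_quotes = by_persona(quotes, "Frequent Traveler")
print(len(traveler_quotes))            # 2
print(group_by_speaker(quotes).keys())
```

The persona name "Frequent Traveler" is made up for the example; in practice you would use personas that exist in your workspace.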

Tips for Success

  1. Create personas first: Set up your personas before uploading interviews for smoother assignment

  2. Batch similar interviews: Process interviews from the same customer segment together

  3. Document your mapping: Keep a record of which personas you've assigned to which types of speakers

  4. Use descriptive file names: Help yourself remember which interviews contain which customer types

Integration with Journey AI Features

Speaker detection and PII protection work seamlessly with other Journey AI features:

  • Insight Mining: Filtered quotes maintain speaker context

  • Journey Enrichment: Persona-tagged quotes enrich relevant journey steps

  • Experience Scoring: Sentiment analysis respects speaker filtering

  • Cross-source Analysis: Persona assignments enable pattern recognition across multiple interviews

By leveraging speaker detection and PII protection, you can maintain research integrity, protect participant privacy, and extract more targeted insights from your interview data.

Common Questions

Q: Can I change speaker assignments after processing?
A: No. Speaker configuration happens during upload, so to change assignments you need to re-upload the file.

Q: What if speaker detection misses a speaker?
A: The system is designed to catch all unique speaker identifiers. If a speaker is missed, check that your transcript labels every speaker consistently (e.g., "Speaker Name: dialogue"), then upload it again.

Q: Can I see the original names after PII obfuscation?
A: No. Once a file is processed, the original names are permanently replaced to ensure privacy protection.

Q: How does this work with multiple interviews?
A: Each interview file is processed independently. Assign the same persona across interviews to maintain consistency.
