How to clean up CSV files for Journey AI
When uploading survey or feedback data to the Data Hub, the quality of your CSV files directly impacts the quality of insights Journey AI can generate. This guide will help you prepare your CSV files to maximize the extraction of valuable customer verbatims and improve your insight reliability scores.
Important: CSV files are limited to 10,000 rows.
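If a file exceeds the row limit, one option is to split it into several smaller files before uploading. The sketch below is a minimal illustration in Python; the helper name `split_rows` is hypothetical, and it assumes the limit applies to data rows (this guide does not say whether the header row counts).

```python
MAX_ROWS = 10_000  # row limit stated in this guide

def split_rows(rows, max_rows=MAX_ROWS):
    """Yield consecutive chunks of at most max_rows data rows."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]

# Example: 25,000 data rows become three chunks.
chunks = list(split_rows(list(range(25_000))))
chunk_sizes = [len(chunk) for chunk in chunks]
```

Each chunk can then be written to its own file with the same header row.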
Why CSV file preparation matters
Journey AI works best with genuine customer verbatims - open-ended responses in customers' own words. When you upload unprocessed CSV files containing both quantitative data (ratings, multiple choice answers) and qualitative data (open text responses), Journey AI has to filter through irrelevant information, which can:
Reduce the reliability factor of your insights
Create noise in pattern recognition
Make it harder to identify meaningful customer pain points and moments of delight
Step-by-step guide to CSV cleanup
1. Identify open-ended response columns
Start by identifying which columns in your CSV contain valuable open-ended responses. These typically include:
Comments and explanation fields
"Please explain why..." follow-up questions
Customer suggestions or feedback boxes
Interview transcripts or support ticket descriptions
2. Remove quantitative and administrative data
Delete columns that don't contain verbatims, such as:
Numerical ratings (NPS, CSAT, star ratings)
Multiple-choice selections
Yes/No responses
Timestamps, IDs, and tracking information
Demographic data that isn't in the customer's own words
Internal notes or categorizations not written by customers
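Steps 1 and 2 can also be scripted rather than done by hand in a spreadsheet. The following is a minimal sketch using Python's standard `csv` module; the column names in `VERBATIM_COLUMNS` are examples only and should be replaced with the open-ended columns in your own file.

```python
import csv
import io

# Example column names (assumption); list the open-ended columns you want to keep.
VERBATIM_COLUMNS = ["Reason for Score", "Product Comments"]

def keep_columns(csv_text, columns):
    """Return CSV text containing only the named columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `columns`.
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

raw = (
    "Response ID,NPS Score,Reason for Score,Product Comments\n"
    "12345,8,Good overall but some issues,The app is intuitive\n"
)
cleaned = keep_columns(raw, VERBATIM_COLUMNS)
```

Keeping an explicit allow-list of columns, instead of deleting unwanted ones, means new rating or tracking columns added to a survey later are dropped automatically.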
3. Clean column headers
Rename column headers to be descriptive but concise
Use clear labels that indicate the context of the verbatim (e.g., "Post-Purchase Feedback" rather than "Q12")
Remove special characters from column names
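As a rough illustration of header cleanup, the sketch below renames coded headers and strips punctuation. The `RENAMES` mapping and the `clean_header` helper are hypothetical, and which characters count as "special" is an assumption here: letters, digits, underscores, spaces, and hyphens are kept, everything else is dropped.

```python
import re

# Hypothetical mapping from raw survey codes to descriptive labels.
RENAMES = {"Q12": "Post-Purchase Feedback"}

def clean_header(name, renames=RENAMES):
    """Apply a rename if one exists, then strip special characters."""
    name = name.strip()
    name = renames.get(name, name)
    # Drop anything that is not a word character, space, or hyphen.
    name = re.sub(r"[^\w \-]", "", name)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", name).strip()

renamed = clean_header("Q12")
stripped = clean_header("  Service Comments (optional)?! ")
```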
4. Consolidate related verbatims (Optional)
For surveys with multiple related open text fields, consider whether to:
Keep separate columns for different question contexts
Or combine related open-ended responses into a single column (adding context identifiers if needed)
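If you choose to combine columns, one way to add context identifiers is to prefix each response with its column name. This is a sketch only; the bracketed prefix format and the `combine_verbatims` helper are assumptions, not a format Journey AI requires.

```python
def combine_verbatims(row, columns):
    """Join the non-empty responses in `columns`, each tagged with its column name."""
    parts = []
    for col in columns:
        text = (row.get(col) or "").strip()
        if text:  # skip blank answers so no empty tags appear
            parts.append(f"[{col}] {text}")
    return " ".join(parts)

row = {
    "Product Feedback": "Crashes on export",
    "Service Feedback": " ",
    "Additional Comments": "More templates please",
}
combined = combine_verbatims(
    row, ["Product Feedback", "Service Feedback", "Additional Comments"]
)
```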
5. Handle empty responses
Remove rows that don't contain any open text responses
For partially completed surveys, consider keeping only rows with at least one meaningful verbatim
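The row filter described above could look like the sketch below. The `PLACEHOLDERS` set is an assumption about what counts as a non-answer; adjust it to the filler values that actually appear in your data.

```python
# Answers treated as empty (assumption); extend for your own surveys.
PLACEHOLDERS = {"", "n/a", "na", "none", "-", "."}

def has_verbatim(row, columns):
    """True if at least one open-text column holds a non-placeholder answer."""
    return any(
        (row.get(col) or "").strip().lower() not in PLACEHOLDERS
        for col in columns
    )

def drop_empty_rows(rows, columns):
    """Keep only rows with at least one meaningful verbatim."""
    return [row for row in rows if has_verbatim(row, columns)]

rows = [
    {"Product Feedback": "Crashes on export", "Service Feedback": ""},
    {"Product Feedback": "N/A", "Service Feedback": "  "},
    {"Product Feedback": "", "Service Feedback": "Very helpful"},
]
kept = drop_empty_rows(rows, ["Product Feedback", "Service Feedback"])
```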
6. Data formatting best practices
Ensure text encoding is UTF-8 compatible
Remove any HTML formatting or special characters
Check for and fix any broken line breaks within responses
Make sure quotation marks within responses are properly escaped (in CSV, a double quote inside a quoted field is written as two double quotes)
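The formatting steps above can be approximated with Python's standard library. Treat this as a rough sketch: the tag-stripping regular expression is deliberately simple and can misfire on text that contains literal angle brackets, and the helper names `clean_text` and `to_csv` are hypothetical.

```python
import csv
import html
import io
import re

def clean_text(text):
    """Strip HTML tags and entities and repair line breaks inside a response."""
    text = re.sub(r"<[^>]+>", " ", text)  # replace HTML tags with a space
    text = html.unescape(text)            # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text)      # fold line breaks and whitespace runs
    return text.strip()

def to_csv(header, rows):
    """Serialize rows; the csv module quotes fields and doubles inner quotes."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

cleaned_text = clean_text("<p>Great app,<br>but it\r\ncrashes &amp; freezes</p>")
csv_text = to_csv(["Product Feedback"], [['She said "wow", twice']])
```

When saving the result to disk, opening the file with `open(path, "w", encoding="utf-8", newline="")` keeps the output UTF-8 encoded and avoids doubled line endings on Windows.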
Example: Before and after CSV cleanup
Before cleanup:
Response ID,Timestamp,NPS Score,Reason for Score,Product Rating,Product Comments,Service Rating,Service Comments,Would Recommend,Additional Feedback
12345,2023-04-15 09:23:45,8,Good overall but some issues,4,The app is intuitive but sometimes crashes when I try to export,5,Customer service was very helpful with my export issue,Yes,I wish there were more templates available
After cleanup:
Reason for NPS Score,Product Feedback,Service Feedback,Additional Comments
Good overall but some issues,The app is intuitive but sometimes crashes when I try to export,Customer service was very helpful with my export issue,I wish there were more templates available
How this improves your insight reliability score
TheyDo's insight reliability score measures how well Journey AI can identify patterns across multiple sources. Cleaning your CSV files helps in four ways:
Increased Pattern Recognition: Journey AI can better identify genuine patterns in customer feedback when it's not distracted by quantitative data points.
Higher Quality Verbatims: The system can extract more meaningful quotes that represent true customer sentiment.
Cross-Source Validation: When patterns appear across multiple cleaned sources, Journey AI assigns higher reliability scores to those insights.
More Accurate Journey Creation: Clean verbatims lead to more accurate journey creation when using the "Create journey" function from Data Hub.
Pro tips for optimal results
Create separate files by feedback type: Split different types of feedback (product, service, onboarding) into separate files for better pattern identification.
Include context when possible: If removing a quantitative column would eliminate important context, consider adding brief context to the verbatim (e.g., "Rating: 2/5 - The checkout process was confusing because...")
Consistent processing: Develop a consistent cleanup process for all your CSV files to ensure uniform data quality.
Batch similar sources: When uploading to Data Hub, group similar sources together and assign the same source type (Interview, Survey, Support, or Feedback).
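For the "Include context when possible" tip, a rating can be folded into the verbatim before its column is deleted. The sketch below follows the "Rating: 2/5 - ..." pattern from that tip; the `add_rating_context` helper and its parameters are hypothetical.

```python
def add_rating_context(rating, scale_max, verbatim):
    """Prefix a verbatim with its rating so the context survives column removal."""
    return f"Rating: {rating}/{scale_max} - {verbatim.strip()}"

with_context = add_rating_context(2, 5, " The checkout process was confusing ")
```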
By following these guidelines, you'll significantly improve the quality of insights Journey AI can generate from your CSV files, leading to more actionable customer journey insights in TheyDo.
Need help?
If you have large or complex CSV files that need processing, contact our support team or your customer success manager for assistance. We're here to help you get the most value from your customer feedback data in TheyDo.