
How to clean up CSV files for AI

Overview

When you upload survey or feedback data to the Data Hub, the quality of your CSV file affects what Journey AI can extract from it. Clean files (ones that focus on open-ended customer responses) produce more reliable insights and better pattern recognition.

With column classification, you can also upload files as-is and tell Journey AI which columns to process during the upload step. Pre-cleaning is still useful if you want faster processing or more control over what gets analyzed.

Before you start

CSV files are limited to 10,000 rows
You'll need access to the Data Hub to upload files
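If you're not sure whether a file is under the 10,000-row limit, a short script can check and split it. The sketch below uses Python's standard csv module and assumes the limit counts data rows only, not the header row. The function name split_csv_text is illustrative and not part of TheyDo.

```python
import csv
import io

# Assumption: the 10,000-row limit counts data rows and excludes the header row.
ROW_LIMIT = 10_000


def split_csv_text(text, limit=ROW_LIMIT):
    """Split CSV text into parts that each repeat the header and hold at most `limit` data rows.

    Returns an empty list if the input is empty or has no data rows.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to split
    rows = list(reader)
    parts = []
    for start in range(0, len(rows), limit):
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)  # every part keeps the original header
        writer.writerows(rows[start:start + limit])
        parts.append(out.getvalue())
    return parts
```

To use it on a file, read the file with open(path, newline="", encoding="utf-8"), pass its contents to split_csv_text, and write each returned part to its own CSV file before uploading.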

Option 1: Use column classification during upload

If you'd rather not clean your file manually, Journey AI's column classification feature lets you decide which columns to process at the point of upload.

Upload your CSV file to the Data Hub as-is.
On the Column classification screen, review each column and apply a label:
- Text — open-ended responses you want Journey AI to analyze
- Ignore — numerical ratings, multiple choice, or any column you want to skip
- Date — timestamp columns
- Persona — customer segment or demographic columns
- Tag groups for any other categorical data
Save your selections and continue with the upload.

Journey AI will only process the columns you've marked as Text.

Option 2: Clean your CSV file manually

Manual cleanup gives you more control and can improve processing speed. Follow the steps below before uploading.

1. Identify your open-ended response columns

Look for columns that contain genuine customer language, such as:

"Please explain why..." follow-up questions
Comment or feedback fields
Customer suggestions
Interview transcripts or support ticket descriptions
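If a file has many columns, a rough heuristic can shortlist likely verbatim columns before you review them by hand. This is a sketch, not how Journey AI classifies columns: it flags columns whose values are mostly non-numeric, average at least four words, and are mostly distinct. The thresholds and function names are assumptions to tune for your own data.

```python
import csv
import io


def _is_number(value):
    """True if the value parses as a number (ratings, NPS scores, IDs)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def looks_open_ended(values, min_avg_words=4, min_distinct_ratio=0.5):
    """Heuristic check for a verbatim column (thresholds are assumptions)."""
    filled = [v.strip() for v in values if v.strip()]
    if not filled:
        return False
    # Mostly numeric columns are ratings, scores, or IDs.
    if sum(_is_number(v) for v in filled) / len(filled) > 0.5:
        return False
    avg_words = sum(len(v.split()) for v in filled) / len(filled)
    distinct_ratio = len(set(filled)) / len(filled)
    return avg_words >= min_avg_words and distinct_ratio >= min_distinct_ratio


def candidate_text_columns(text):
    """Return the names of columns that look like open-ended responses."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name, values in columns.items():
            values.append(row.get(name) or "")
    return [name for name, values in columns.items() if looks_open_ended(values)]
```

Treat the result as a starting point: short free-text answers can fall below the word threshold, so still review every column yourself.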

2. Remove columns that don't contain verbatims

Delete anything that isn't an open-ended response, including:

Numerical ratings (NPS, CSAT, star ratings)
Multiple-choice or Yes/No responses
Timestamps, IDs, and tracking data
Demographic fields not written in the customer's own words
Internal notes or categorizations
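Once you know which columns to keep, dropping everything else can be scripted. Here is a minimal sketch with Python's csv module; keep_columns is a hypothetical helper name, and the column names you pass in are whatever your own file uses.

```python
import csv
import io


def keep_columns(text, keep):
    """Return CSV text with only the columns listed in `keep`, in that order."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    available = reader.fieldnames or []
    missing = [name for name in keep if name not in available]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Raising an error for a missing column name catches typos early, instead of silently producing an empty column.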

3. Clean up column headers

Use descriptive, concise labels (for example, "Post-Purchase Feedback" instead of "Q12")
Remove any special characters from column names
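Header cleanup can also be scripted. In this sketch, explicit renames (such as "Q12" to "Post-Purchase Feedback") are applied first, then any character other than letters, digits, spaces, and hyphens is removed. Which characters count as "special" is an assumption here, and the RENAMES mapping is a hypothetical example you'd replace with your own.

```python
import csv
import io
import re

# Hypothetical mapping from cryptic survey codes to descriptive labels.
RENAMES = {"Q12": "Post-Purchase Feedback"}


def clean_header(name, renames=RENAMES):
    """Rename known codes, then strip special characters and extra spaces."""
    name = name.strip()
    name = renames.get(name, name)
    # Assumption: letters, digits, spaces, and hyphens are allowed; all else is removed.
    name = re.sub(r"[^\w\s-]", "", name).replace("_", " ")
    return re.sub(r"\s+", " ", name).strip()


def clean_csv_headers(text, renames=RENAMES):
    """Return CSV text with a cleaned header row; data rows are unchanged."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if rows:
        rows[0] = [clean_header(h, renames) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()
```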

4. Handle empty rows

Remove rows with no open-text responses
For partially completed surveys, keep rows that have at least one meaningful verbatim
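Here is a sketch of the empty-row rule: a row is kept if at least one of the open-text columns you name contains a meaningful answer. Treating placeholders like "n/a", "none", or "-" as empty is an assumption; adjust the PLACEHOLDERS set to match how your survey tool records skipped questions.

```python
import csv
import io

# Assumption: values your survey tool uses for a skipped question.
PLACEHOLDERS = {"n/a", "na", "none", "-"}


def is_meaningful(value):
    """True if a cell holds an actual answer rather than blank or placeholder text."""
    cleaned = (value or "").strip()
    return bool(cleaned) and cleaned.lower() not in PLACEHOLDERS


def drop_empty_rows(text, text_columns):
    """Keep rows with at least one meaningful answer in the given open-text columns."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    if not reader.fieldnames:
        return ""  # empty input
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        if any(is_meaningful(row.get(col)) for col in text_columns):
            writer.writerow(row)
    return out.getvalue()
```

Because the check uses any(), partially completed surveys survive as long as one open-text answer is filled in.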

5. Check formatting

Make sure the file is UTF-8 encoded
Remove any HTML tags or special characters from response text
Fix any broken line breaks within responses
Check that quotation marks within responses are properly escaped (in CSV, a quotation mark inside a quoted field is written twice, as "")
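The formatting checks can be handled in one pass. This sketch decodes the file as UTF-8, falling back to Windows-1252 (an assumption about where non-UTF-8 exports usually come from), strips HTML tags with a simple regular expression, removes invisible control characters (one reading of "special characters"), and collapses line breaks inside responses into single spaces. Python's csv writer then escapes embedded quotation marks by doubling them. A regex is not a full HTML parser, so check the output if your responses contain complex markup.

```python
import csv
import html
import io
import re

TAG_RE = re.compile(r"<[^>]+>")
# Control characters other than tab, line feed, and carriage return.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def decode_bytes(raw):
    """Decode as UTF-8 (dropping a BOM if present), else fall back to Windows-1252."""
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 exports are usually Windows-1252.
        return raw.decode("cp1252", errors="replace")


def clean_cell(value):
    value = TAG_RE.sub(" ", value)      # drop HTML tags such as <br> or <p>
    value = html.unescape(value)        # turn entities like &amp; back into characters
    value = CONTROL_RE.sub("", value)   # remove invisible control characters
    # Collapse line breaks, tabs, and repeated spaces into a single space.
    return re.sub(r"\s+", " ", value).strip()


def normalize_csv(raw):
    """Return cleaned CSV text from raw file bytes."""
    reader = csv.reader(io.StringIO(decode_bytes(raw), newline=""))
    out = io.StringIO()
    writer = csv.writer(out)  # default QUOTE_MINIMAL doubles embedded quotes ("")
    for row in reader:
        writer.writerow([clean_cell(cell) for cell in row])
    return out.getvalue()
```

Read the file in binary mode (open(path, "rb")) and save the result with open(path, "w", encoding="utf-8", newline="") so the cleaned file is stored as UTF-8.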

Example: Before and after

Before cleanup

Response ID	Timestamp	NPS Score	Reason for Score	Product Rating	Product Comments	Service Rating	Service Comments
12345	2023-04-15	8	Good overall but some issues	4	The app is intuitive but sometimes crashes on export	5	Customer service was very helpful

After cleanup

Reason for NPS Score	Product Feedback	Service Feedback
Good overall but some issues	The app is intuitive but sometimes crashes on export	Customer service was very helpful

Tips

Split by feedback type. Keeping product, service, and onboarding feedback in separate files makes pattern recognition more accurate.
Add context when removing columns. If a quantitative column provides important context, consider weaving it into the verbatim — for example: "Rating: 2/5 — The checkout process was confusing because..."
Be consistent. Use the same cleanup process across all your CSV files so data quality stays uniform.
Group similar sources. When uploading to the Data Hub, assign the same source type (Interview, Survey, Support, or Feedback) to similar files.
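The "add context" tip can be applied to a whole column at once. This sketch folds a rating column into its matching comment column, producing text like "Rating: 2/5 - The checkout process was confusing because..." and then drops the rating column. The label, the 5-point scale, the separator, and the function names are assumptions you can change. Comments with no text stay empty so the empty-row step still removes them.

```python
import csv
import io


def weave_context(rating, verbatim, scale=5, label="Rating", sep=" - "):
    """Prefix a verbatim with its rating, e.g. 'Rating: 2/5 - The checkout...'."""
    rating, verbatim = (rating or "").strip(), (verbatim or "").strip()
    if not verbatim:
        return ""  # no comment to annotate; leave it empty for the empty-row step
    if not rating:
        return verbatim  # no rating recorded; keep the comment as-is
    return f"{label}: {rating}/{scale}{sep}{verbatim}"


def weave_column(text, rating_col, text_col, scale=5, label="Rating"):
    """Fold `rating_col` into `text_col`, then drop `rating_col` from the output."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    fieldnames = [name for name in reader.fieldnames or [] if name != rating_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        row[text_col] = weave_context(row.get(rating_col), row.get(text_col), scale, label)
        writer.writerow(row)
    return out.getvalue()
```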