How to clean up CSV files for AI

Overview

When uploading survey or feedback data to the Data Hub, the quality of your CSV file affects what Journey AI can extract from it. Clean files — ones that focus on open-ended customer responses — produce more reliable insights and better pattern recognition.

With column classification, you can also upload files as-is and tell Journey AI which columns to process during the upload step. Pre-cleaning is still useful if you want faster processing or more control over what gets analyzed.

Before you start

  • CSV files are limited to 10,000 rows
  • You'll need access to the Data Hub to upload files
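
One way to check a file against the row limit before uploading is to count rows with a CSV parser instead of counting lines, since a quoted response can span several lines. The sketch below is a minimal Python example, not a Journey AI tool: the function name count_data_rows is hypothetical, and it assumes the limit applies to data rows and not to the header, which this article does not specify.

```python
import csv
import io

MAX_ROWS = 10_000  # row limit stated above


def count_data_rows(csv_text: str) -> int:
    """Count data rows (header excluded) with a real CSV parser, so line
    breaks inside quoted responses are not counted as extra rows."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)  # blank lines parse as [] and are skipped


# The first response contains a line break but is still a single row.
sample = 'Feedback\n"Great app,\nbut slow"\nLove it\n'
rows = count_data_rows(sample)
print(rows, "rows:", "OK" if rows <= MAX_ROWS else "over the limit")
```

For a file on disk, read it with open(path, encoding="utf-8", newline="") and pass the text to the function.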

Option 1: Use column classification during upload

If you'd rather not clean your file manually, Journey AI's column classification feature lets you decide which columns to process at the point of upload.

  1. Upload your CSV file to the Data Hub as-is.
  2. On the Column classification screen, review each column and apply a label:
    • Text: open-ended responses you want Journey AI to analyze
    • Ignore: numerical ratings, multiple choice, or any column you want to skip
    • Date: timestamp columns
    • Persona: customer segment or demographic columns
    • Tag groups: any other categorical data
  3. Save your selections and continue with the upload.

Journey AI will only process the columns you've marked as Text.

Option 2: Clean your CSV file manually

Manual cleanup gives you more control and can improve processing speed. Follow the steps below before uploading.

1. Identify your open-ended response columns

Look for columns that contain genuine customer language, such as:

  • "Please explain why..." follow-up questions
  • Comment or feedback fields
  • Customer suggestions
  • Interview transcripts or support ticket descriptions
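
For wide exports, a rough heuristic can shortlist likely open-ended columns before you review them by eye. The Python sketch below flags columns whose non-empty cells average at least a few words. The function name find_text_columns and the three-word threshold are assumptions chosen for illustration, and the result is only a starting point for manual review.

```python
import csv
import io


def find_text_columns(csv_text: str, min_avg_words: float = 3.0) -> list[str]:
    """Return headers whose non-empty cells average at least min_avg_words words.

    The threshold is an illustrative assumption, not a Journey AI rule.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    totals: dict[str, int] = {}
    counts: dict[str, int] = {}
    for row in reader:
        for name, value in row.items():
            # Skip overflow cells (name is None) and empty or missing values.
            if name is None or not value or not value.strip():
                continue
            totals[name] = totals.get(name, 0) + len(value.split())
            counts[name] = counts.get(name, 0) + 1
    return [
        name
        for name in (reader.fieldnames or [])
        if counts.get(name) and totals[name] / counts[name] >= min_avg_words
    ]


sample = (
    "Response ID,NPS Score,Reason for Score\n"
    "12345,8,Good overall but some issues\n"
    "12346,3,Export keeps crashing on large files\n"
)
print(find_text_columns(sample))
```

Short answers such as IDs, scores, and Yes/No values fall below the threshold, while sentence-length responses pass it.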

2. Remove columns that don't contain verbatims

Delete anything that isn't an open-ended response, including:

  • Numerical ratings (NPS, CSAT, star ratings)
  • Multiple-choice or Yes/No responses
  • Timestamps, IDs, and tracking data
  • Demographic fields not written in the customer's own words
  • Internal notes or categorizations
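
If you prefer to script this step, it is often safer to name the columns you want to keep than to delete the rest one by one. The following is a minimal Python sketch under that assumption; keep_columns is a hypothetical helper name, and the sample row reuses the example from this article.

```python
import csv
import io


def keep_columns(csv_text: str, keep: list[str]) -> str:
    """Return CSV text containing only the columns named in keep, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    missing = [name for name in keep if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Columns not found in file: {missing}")
    out = io.StringIO(newline="")
    # extrasaction="ignore" silently drops every column not listed in keep.
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Response ID,Timestamp,NPS Score,Reason for Score,Product Rating,"
    "Product Comments,Service Rating,Service Comments\n"
    "12345,2023-04-15,8,Good overall but some issues,4,"
    "The app is intuitive but sometimes crashes on export,5,"
    "Customer service was very helpful\n"
)
cleaned = keep_columns(
    sample, ["Reason for Score", "Product Comments", "Service Comments"]
)
print(cleaned)
```

The helper raises an error when a requested column is missing, which catches typos in header names before any data is written.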

3. Clean up column headers

  • Use descriptive, concise labels (for example, "Post-Purchase Feedback" instead of "Q12")
  • Remove any special characters from column names
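
Header cleanup can also be scripted. The Python sketch below applies an optional rename map and then strips punctuation. The helper name clean_header is hypothetical, and treating everything other than letters, digits, spaces, underscores, and hyphens as a "special character" is an assumption, since this article does not define the term.

```python
import re
from typing import Optional


def clean_header(name: str, renames: Optional[dict[str, str]] = None) -> str:
    """Apply an optional rename, then strip special characters from a header."""
    name = (renames or {}).get(name.strip(), name)
    # Assumption: keep letters, digits, underscores, whitespace, and hyphens.
    name = re.sub(r"[^\w\s-]+", " ", name)
    return re.sub(r"\s+", " ", name).strip()


print(clean_header("Q12", {"Q12": "Post-Purchase Feedback"}))
print(clean_header("  Product Comments (optional)*  "))
```

Hyphens are kept so that labels such as "Post-Purchase Feedback" survive the cleanup unchanged.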

4. Handle empty rows

  • Remove rows with no open-text responses
  • For partially completed surveys, keep rows that have at least one meaningful verbatim
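
The two rules above can be sketched in Python as a single filter: keep a row if at least one cell holds real text. The sketch assumes the file already contains only open-text columns (step 2), the name drop_empty_rows is hypothetical, and the PLACEHOLDERS set is an illustrative guess at values you may want to treat as empty.

```python
import csv
import io

# Assumption: values treated as "no response". Adjust to match your survey tool.
PLACEHOLDERS = {"", "n/a", "na", "-"}


def drop_empty_rows(csv_text: str) -> str:
    """Keep the header plus every row that has at least one real response."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return ""
    writer.writerow(header)
    for row in reader:
        if any(cell.strip().lower() not in PLACEHOLDERS for cell in row):
            writer.writerow(row)
    return out.getvalue()


sample = (
    "Product Feedback,Service Feedback\n"
    "The app crashes on export,\n"
    ",\n"
    "n/a,\n"
    ",Support was quick to respond\n"
)
print(drop_empty_rows(sample))
```

Partially completed rows are kept as long as one column has a response, so the first and last sample rows survive while the two rows without any response are dropped.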

5. Check formatting

  • Make sure the file is UTF-8 encoded
  • Remove any HTML tags or special characters from response text
  • Fix any broken line breaks within responses
  • Check that quotation marks within responses are properly escaped
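
Most of these checks can be automated. The Python sketch below strips HTML tags, decodes HTML entities, and collapses line breaks inside a response into single spaces, then writes rows with the standard csv module, which escapes embedded quotation marks by doubling them. The names clean_response and write_clean_csv are hypothetical, the tag pattern is deliberately simple and is not a full HTML parser, and flattening line breaks to spaces is only one possible reading of "fix broken line breaks".

```python
import csv
import html
import io
import re

# Matches simple tags such as <p>, </p>, or <br/>. It requires a letter after
# "<" so that text like "<3" is left alone.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def clean_response(text: str) -> str:
    """Strip HTML tags, decode entities, and flatten whitespace in one response."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)  # for example, &amp; becomes &
    return re.sub(r"\s+", " ", text).strip()


def write_clean_csv(header: list[str], rows: list[list[str]]) -> str:
    """Return CSV text with cleaned cells; csv.writer handles quote escaping."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([clean_response(cell) for cell in row])
    return out.getvalue()


raw = "<p>The app is great<br>but it\r\ncrashes &amp; freezes</p>"
print(clean_response(raw))
print(write_clean_csv(["Feedback"], [['She said "wow", twice']]))
```

To save the result as UTF-8, write the returned text with open(path, "w", encoding="utf-8", newline="").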

Example: Before and after

Before cleanup

Response ID | Timestamp | NPS Score | Reason for Score | Product Rating | Product Comments | Service Rating | Service Comments
12345 | 2023-04-15 | 8 | Good overall but some issues | 4 | The app is intuitive but sometimes crashes on export | 5 | Customer service was very helpful

After cleanup

Reason for NPS Score | Product Feedback | Service Feedback
Good overall but some issues | The app is intuitive but sometimes crashes on export | Customer service was very helpful

Tips

  • Split by feedback type. Keeping separate files for product, service, and onboarding feedback makes pattern recognition more accurate.
  • Add context when removing columns. If a quantitative column provides important context, consider weaving it into the verbatim — for example: "Rating: 2/5 — The checkout process was confusing because..."
  • Be consistent. Use the same cleanup process across all your CSV files so data quality stays uniform.
  • Group similar sources. When uploading to the Data Hub, assign the same source type (Interview, Survey, Support, or Feedback) to similar files.
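
The "add context" tip can be applied before the rating column is removed. The Python sketch below prefixes a comment with its rating. The helper name add_rating_context, the default 5-point scale, and the plain hyphen separator are assumptions chosen for illustration.

```python
def add_rating_context(rating: str, comment: str, scale: int = 5) -> str:
    """Prefix a verbatim with its rating so the context survives column removal."""
    rating = rating.strip()
    comment = comment.strip()
    if not comment:
        return ""  # nothing to analyze, so keep the cell empty
    if not rating:
        return comment  # no rating available, so keep the verbatim as-is
    return f"Rating: {rating}/{scale} - {comment}"


print(add_rating_context("2", "The checkout process was confusing"))
```

Rows without a comment stay empty, so they can still be dropped in the empty-row step.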