Drop in a messy export — dates in four formats, currencies that don't match, duplicate rows, garbled characters. You get back two things in your Drive: the cleaned spreadsheet, and the little Python script that did the cleanup. Next month's export? Run the script yourself, or hand it to engineering.
No silent black box. The Agent runs real pandas in an isolated sandbox and you can see exactly what it touched, what it inferred, and what it flagged for you to decide.
Two files land in your Drive: the cleaned .xlsx and the Python script the Agent wrote to produce it. Run the script again next month on new data: same rules, no Vecbase needed. Or send it to your data engineer.
Saved to Drive
crm-export-cleaned-may.xlsx
2.1 MB
cleaned
crm-cleanup-2026-05.py
3.2 KB
Re-run next month
Schedule it for the 1st of every month, pointing at /imports/crm-latest.csv — same rules, zero touch.
Run locally · python crm-cleanup-2026-05.py crm-export-june.csv
How it works
Step 01
Drop the messy file
Drag any CSV, TSV, Excel, or JSON file in. The Agent figures out the encoding, the delimiter, the header row, and the column types on its own — even if your export tool got creative.
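For a sense of what this detection involves, here is a minimal sketch using only Python's standard library. It is not the Agent's actual code: the function names, the candidate encoding list, and the set of delimiters are assumptions made for illustration, and `csv.Sniffer` is a heuristic that can guess wrong on unusual files.

```python
import csv

# Encodings to try, in order. This short list is an assumption for the sketch.
# "utf-8-sig" decodes UTF-8 with or without a byte-order mark.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252")


def detect_encoding(raw: bytes) -> str:
    """Return the first candidate encoding that decodes the bytes without error."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 accepts any byte sequence, so it is the last resort


def sniff_layout(raw: bytes):
    """Guess the encoding, the delimiter, and whether the first row is a header."""
    encoding = detect_encoding(raw)
    sample = raw.decode(encoding)[:4096]  # a few KB is enough for the heuristics
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    return encoding, dialect.delimiter, sniffer.has_header(sample)
```

Called on the raw bytes of a semicolon-separated export saved as Windows-1252, `sniff_layout` would report the `cp1252` encoding and the `;` delimiter, which can then be passed to whatever reads the file.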
Step 02
Tell the Agent what counts as clean
Type your rules in chat — "USD only", "drop deals under $500", "use the standard SIC list for industries", "flag blank emails, don't delete them". Anything ambiguous, the Agent asks before deciding; nothing gets quietly thrown away.
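As a rough illustration, three of those rules could translate into code like the sketch below. It uses plain standard-library Python rather than pandas so it runs anywhere, and the column names (`currency`, `amount`, `email`) are hypothetical. It also assumes "USD only" means "keep only USD rows" rather than "convert everything to USD", which is exactly the kind of ambiguity worth confirming first.

```python
from decimal import Decimal

MIN_DEAL = Decimal("500")  # from the rule "drop deals under $500"


def apply_rules(rows):
    """Apply the chat rules; return (kept rows, dropped rows with reasons)."""
    kept, dropped = [], []
    for original in rows:
        row = dict(original)  # copy so the input is never mutated
        if row["currency"] != "USD":
            dropped.append((row, "non-USD currency"))
            continue
        if Decimal(row["amount"]) < MIN_DEAL:
            dropped.append((row, "deal under $500"))
            continue
        # "Flag blank emails, don't delete them": the row stays, with a note.
        row["needs_review"] = "" if row["email"].strip() else "blank email"
        kept.append(row)
    return kept, dropped
```

Every dropped row is returned alongside the reason it was dropped, so nothing disappears without a record.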
Step 03
Get cleaned data + the script
Two files land in your Drive: the cleaned spreadsheet, and a small Python script (`cleanup.py`) you can run again anytime. Set it on a weekly schedule, hand it to a data engineer, or just open it next month when new data arrives.
Why Vecbase for this
Catches the mess your eyes skim past
Dates in mixed formats. Currency symbols hiding in number columns. Garbled characters from the wrong encoding. Three spellings of the same industry. The Agent finds all of it before touching a single row.
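Two of these problems, mixed date formats and currency symbols inside number columns, can be sketched in a few lines of standard-library Python. The list of formats and both function names are assumptions for illustration, not the Agent's actual rules. Note that a value like 05/03/2026 is read month-first here, which is itself a guess that a real cleanup would need to confirm.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Optional

# Formats to try, in order. Slash dates are assumed to be month-first.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%d %b %Y")


def parse_date(value: str) -> Optional[date]:
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None  # unparseable values are left for a human to review


def parse_amount(value: str) -> Optional[Decimal]:
    """Strip currency symbols and thousands separators, then parse the number.

    Assumes US-style numbers (comma for thousands, dot for decimals).
    """
    cleaned = re.sub(r"[^\d.\-]", "", value)
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```

Returning `None` instead of raising keeps the bad values visible, so they can be flagged rather than silently coerced.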
You keep the script, not just the file
Every cleanup leaves a real Python script in your Drive. Next month's export — run it yourself. Want to tweak a rule — open it in any editor. After the first run you don't even need to come back to Vecbase.
Asks before dropping anything ambiguous
Near-duplicates, low-confidence guesses, suspicious outliers — the Agent puts them in a "needs review" sheet with context instead of quietly deleting. You call the edge cases.
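One way to picture the "needs review" split for near-duplicates is the sketch below. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. The 0.85 threshold, the function names, and the choice to compare on a single name field are all assumptions for illustration, and the pairwise comparison is only practical for small lists.

```python
from difflib import SequenceMatcher

REVIEW_THRESHOLD = 0.85  # assumed cut-off; anything this similar goes to review


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences compare equal."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def split_duplicates(names):
    """Return (kept names, review pairs of (candidate, the kept name it resembles))."""
    kept = {}  # normalized key -> original spelling, in insertion order
    review = []
    for name in names:
        key = normalize(name)
        if key in kept:
            continue  # identical after normalization: an exact duplicate
        closest = max(kept, key=lambda k: similarity(key, k), default=None)
        if closest is not None and similarity(key, closest) >= REVIEW_THRESHOLD:
            review.append((name, kept[closest]))  # close but not identical
        else:
            kept[key] = name
    return list(kept.values()), review
```

Given "Acme Corp", "acme  corp", "Acme Corp." and "Globex", this keeps "Acme Corp" and "Globex", drops the exact repeat, and sends "Acme Corp." to review paired with the name it resembles.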
Handles million-row files without choking your browser
Open a 3-million-row export in your browser and it freezes. Upload it here and the work happens off your laptop, with real memory and real processing power. Drop the file, walk away, come back to a finished result.
Frequently asked
Uploads through the UI are limited to 200 MB. For anything larger, put the file in your workspace bucket and point the Agent at the path; it streams the file through the sandbox. Multi-million-row CSVs are routine.
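Streaming here means working through the file a row at a time instead of loading it all into memory. The sketch below shows the general idea with Python's `csv` module. It is not a description of how the sandbox does it, the function name `stream_clean` is hypothetical, and it assumes every row has the same number of fields as the header.

```python
import csv
import io


def stream_clean(src, dst):
    """Copy rows from src to dst one at a time, skipping fully blank rows.

    Memory use stays flat no matter how large the input is.
    Returns (rows read, rows written).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    read = written = 0
    for row in reader:
        read += 1
        if not any((value or "").strip() for value in row.values()):
            continue  # every field is empty
        writer.writerow(row)
        written += 1
    return read, written


# Usage with in-memory files; real use would pass open file handles instead.
src = io.StringIO("id,email\n1,a@x.com\n,\n2,b@x.com\n")
dst = io.StringIO()
print(stream_clean(src, dst))  # (3, 2)
```

With real files, open both with `newline=""` as the `csv` module documentation recommends.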
Get yours in under 90 seconds
Sign in, hand your file to the Agent, and the finished file lands in your Drive.