csvsql --or {ignore,replace}
Created by: shawnbot
I'm working on tools to import messy and sometimes redundant CSVs into a SQLite database, and csvsql mostly does what I want (and, FWIW, is just totally awesome). However, what I'd like to be able to do is create a unique key constraint on the table I'm inserting into and have my insert statements include an OR IGNORE clause to skip the duplicate rows. My initial thinking was that if csvsql could write SQL to stdout, I could just modify the output and pipe it back through sqlite, but #147 (closed) seems to suggest that this isn't possible.
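For illustration, here is a minimal sketch of the behavior I'm after, using Python's built-in sqlite3 module rather than csvsql. The table and column names are hypothetical. With a UNIQUE constraint in place, a plain INSERT of a duplicate row fails, while INSERT OR IGNORE silently skips it:

```python
import sqlite3

# In-memory database standing in for the archive; table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, name TEXT, UNIQUE (id))")

rows = [(1, "alpha"), (2, "beta"), (1, "alpha")]  # the third row duplicates the first

# OR IGNORE makes SQLite skip any row that would violate the UNIQUE constraint
# instead of aborting the statement with an IntegrityError.
conn.executemany("INSERT OR IGNORE INTO records (id, name) VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT id, name FROM records ORDER BY id").fetchall())
# → [(1, 'alpha'), (2, 'beta')]
```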
That's why I'm suggesting the introduction of a new flag to csvsql:
csvsql --or {ignore,replace}
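One possible way the flag could map onto SQL is a lookup from each choice to SQLite's INSERT conflict keyword. This is a hypothetical sketch: `build_insert` and `CONFLICT_VERBS` are illustrative names, not part of csvkit, and it only covers SQLite's `INSERT OR ...` syntax (other databases spell this differently).

```python
# Hypothetical mapping from the proposed --or values to SQLite's INSERT conflict syntax.
CONFLICT_VERBS = {
    None: "INSERT",  # flag omitted: plain INSERT, constraint violations raise errors
    "ignore": "INSERT OR IGNORE",
    "replace": "INSERT OR REPLACE",
}


def build_insert(table, columns, conflict=None):
    """Build a parameterized INSERT statement for the chosen conflict policy.

    Table and column names are interpolated unquoted, so this assumes trusted identifiers.
    """
    if conflict not in CONFLICT_VERBS:
        raise ValueError(f"--or must be 'ignore' or 'replace', got {conflict!r}")
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"{CONFLICT_VERBS[conflict]} INTO {table} ({column_list}) VALUES ({placeholders})"


print(build_insert("records", ["id", "name"], "ignore"))
# → INSERT OR IGNORE INTO records (id, name) VALUES (?, ?)
```

Keeping the values parameterized (`?`) means only the conflict keyword changes between modes; the row data still goes through the driver's normal binding.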
I'm not sure off the top of my head whether SQLAlchemy provides a place for the OR clause, but if it does (or if the variations across the different APIs are minor), then I think it would be worth implementing, because every single potential use I've found for csvsql has been stymied by unique key constraints.
For example, some open data is published daily in rolling intervals (e.g., one week), so importing data daily into a larger archive means knowing which swath of the data to ignore on import.
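To make the rolling-window case concrete, here is a small sketch with made-up data: two overlapping daily exports loaded into an archive keyed on the date. It assumes the later export may carry a revised value for a day that was already imported, which is where `ignore` and `replace` differ:

```python
import sqlite3

# Two overlapping daily exports (hypothetical data): Monday's covers Jan 1-3,
# Tuesday's covers Jan 2-4 and carries a revised value for Jan 3.
MONDAY = [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 30)]
TUESDAY = [("2024-01-02", 20), ("2024-01-03", 35), ("2024-01-04", 40)]


def archive(exports, conflict):
    """Load each export into a fresh archive using INSERT OR <conflict>."""
    if conflict not in ("IGNORE", "REPLACE"):  # the keyword is spliced into the SQL text
        raise ValueError(conflict)
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint on day is what collapses overlapping windows to one row per day.
    conn.execute("CREATE TABLE daily (day TEXT UNIQUE, value INTEGER)")
    for rows in exports:
        conn.executemany(f"INSERT OR {conflict} INTO daily (day, value) VALUES (?, ?)", rows)
    return conn.execute("SELECT day, value FROM daily ORDER BY day").fetchall()


print(archive([MONDAY, TUESDAY], "IGNORE"))   # Jan 3 keeps the first value, 30
print(archive([MONDAY, TUESDAY], "REPLACE"))  # Jan 3 takes the revised value, 35
```

Either way the archive ends up with one row per day; the choice is only about whether the first or the latest copy of an overlapping day wins.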
I've got some more specific examples if you're interested. Let me know if I can help!