Support date format hints
Created by: samcrawford
Excel outputs CSVs with dates in the user's locale, so a British user will see dates in DD/MM/YYYY format, whilst an American user will see MM/DD/YYYY.
Currently it appears that csvkit (although I've only really been using csvsql) always tries MM/DD/YYYY first and, if that fails to parse, falls back to DD/MM/YYYY.
The problem arises when you're working with a DD/MM/YYYY-formatted sheet in which some dates are ambiguous. For example:
02/01/2014
31/12/2012
This will produce dates in the database of:
2014-02-01 (incorrectly parsed as MM/DD/YYYY)
2012-12-31
So you silently end up with a mixture of correct and incorrect dates, which is not ideal!
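To make the failure concrete, here is a minimal sketch that mimics the "MM/DD/YYYY first, then DD/MM/YYYY" behaviour described above. It is not csvkit's actual code; `parse_mdy_first` is a hypothetical helper built only on the standard library's `datetime.strptime`. Both inputs come from a DD/MM/YYYY sheet, yet only the second is read correctly: the first should have been 2014-01-02.

```python
from datetime import datetime


def parse_mdy_first(text):
    """Hypothetical stand-in for the observed behaviour:
    try MM/DD/YYYY first, then fall back to DD/MM/YYYY."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This pattern didn't fit (e.g. month 31), so try the next one.
            continue
    raise ValueError(f"unrecognised date: {text!r}")


# Both values come from a DD/MM/YYYY sheet.
for raw in ["02/01/2014", "31/12/2012"]:
    print(raw, "->", parse_mdy_first(raw).isoformat())

# Expected output:
# 02/01/2014 -> 2014-02-01
# 31/12/2012 -> 2012-12-31
```

Nothing signals that the first value was misread: it simply fits MM/DD/YYYY, so the fallback never triggers.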
Ideally there'd be an option one could pass to csvkit programs to specify a preference list for parsing dates. It'd be overkill (I think) to spell out every format in full, so a few abbreviations would suffice. For example:
--date-formats "dmy,mdy"
If none of the formats in the list matched, parsing would then fall back to the default date parsing behaviour.
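A rough sketch of how the proposed hint list could be applied, assuming the abbreviations map onto `strptime` patterns. The `--date-formats` flag is only a proposal, and `FORMAT_CODES`, `default_parse` and `parse_with_hints` are hypothetical names; `default_parse` is merely a stand-in for whatever csvkit does today.

```python
from datetime import datetime

# Hypothetical mapping from the proposed abbreviations to strptime patterns.
FORMAT_CODES = {
    "dmy": "%d/%m/%Y",
    "mdy": "%m/%d/%Y",
    "ymd": "%Y/%m/%d",
}


def default_parse(text):
    """Stand-in for the existing behaviour: MM/DD/YYYY, then DD/MM/YYYY."""
    for fmt in ("%m/%d/%Y", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_with_hints(text, date_formats):
    """Try each hinted format in order (e.g. "dmy,mdy"), then fall back
    to the default parser if none of them match."""
    for code in date_formats.split(","):
        code = code.strip().lower()
        if code not in FORMAT_CODES:
            raise ValueError(f"unknown date format code: {code!r}")
        try:
            return datetime.strptime(text, FORMAT_CODES[code]).date()
        except ValueError:
            continue  # this hint didn't fit; try the next one
    return default_parse(text)


print(parse_with_hints("02/01/2014", "dmy,mdy"))  # 2014-01-02
print(parse_with_hints("31/12/2012", "dmy,mdy"))  # 2012-12-31
```

With "dmy" listed first, the ambiguous 02/01/2014 is read as 2 January, which is what a British user expects.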
An extension to this might be to infer the date format by examining every row in a column and selecting the format that successfully parses all of them (of course, this still can't guarantee success, since a column may contain only ambiguous dates).
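The column-wide inference idea might look something like this minimal sketch. `CANDIDATES` and `infer_column_format` are hypothetical names, and only two slash-separated formats are considered. It returns every candidate that parses all non-blank values, so a caller could adopt the format only when exactly one survives and otherwise fall back to the hints or the default.

```python
from datetime import datetime

# Hypothetical candidate formats to test against a whole column.
CANDIDATES = {"dmy": "%d/%m/%Y", "mdy": "%m/%d/%Y"}


def parses(text, fmt):
    """True if text matches the strptime pattern fmt."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def infer_column_format(values):
    """Return the codes of all candidate formats that parse every
    non-blank value. One code means unambiguous; several means the
    column only holds ambiguous dates; none means no format fits."""
    values = [v for v in values if v.strip()]  # ignore empty cells
    return [code for code, fmt in CANDIDATES.items()
            if all(parses(v, fmt) for v in values)]


print(infer_column_format(["02/01/2014", "31/12/2012"]))  # ['dmy']
print(infer_column_format(["02/01/2014", "05/06/2013"]))  # ['dmy', 'mdy']
```

The second call shows the caveat above: when every date is ambiguous, both formats fit and the column alone can't settle it.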