xlrd: in2csv fails to load codepage 21010 xls files
Created by: acook
This appears to be a common issue in spreadsheet (Excel) parsers:
- https://github.com/tafia/calamine/pull/86 "Treat codepage 21010 as codepage 1200"
- https://github.com/SheetJS/js-xls/issues/44 "xls: error parsing bad.xls: Error: Unrecognized CP: 21010"
- https://github.com/PHPOffice/PHPExcel/issues/396 "Unknown codepage: 21010"
These files seem to be generated by macOS versions of Excel.
When a reader encounters codepage 21010, it should interpret it as codepage 1200 (a.k.a. UTF-16LE).
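A minimal sketch of that mapping, assuming the XLS CODEPAGE value has already been read as an integer. The helper `codec_for_codepage` and both lookup tables are hypothetical and only illustrate the idea. They are not xlrd's actual internals. The only real API used is Python's `codecs.lookup`, which has no codec for `cp21010` and would otherwise raise `LookupError`.

```python
import codecs

# Hypothetical alias table: treat 21010 as 1200 (UTF-16LE), following the
# calamine fix linked above. Not taken from xlrd.
CODEPAGE_ALIASES = {21010: 1200}

# Codepages whose Python codec name does not follow the "cp<N>" pattern.
SPECIAL_CODEPAGES = {1200: "utf_16_le", 10000: "mac_roman"}


def codec_for_codepage(codepage: int) -> str:
    """Return a Python codec name for an XLS CODEPAGE value (hypothetical helper)."""
    codepage = CODEPAGE_ALIASES.get(codepage, codepage)
    name = SPECIAL_CODEPAGES.get(codepage, f"cp{codepage}")
    # codecs.lookup raises LookupError if Python has no codec by that name;
    # .name returns the canonical codec name, e.g. "utf-16-le".
    return codecs.lookup(name).name


if __name__ == "__main__":
    raw = "Ünïcode".encode("utf-16-le")
    print(raw.decode(codec_for_codepage(21010)))
```

With the alias in place, codepage 21010 decodes exactly like 1200. Without it, the lookup would fail on `cp21010`, which resembles the "Unknown codepage" errors reported by the other projects.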
Ideally this would be handled automatically. However, even passing --encoding utf-16le (or other variations) to in2csv seems to have no effect, so it might be ignoring the encoding argument?