
Pandas allows you to specify an encoding, but it does not allow you to ignore errors or to automatically replace the offending bytes.
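One possible workaround is to decode the bytes yourself and hand the parser a text buffer. The sketch below uses only the standard library. The helper name `decode_lenient` and the sample bytes are made up for illustration. Because `pd.read_csv` accepts file-like objects, the returned buffer could be passed to it in place of a path. As far as I know, pandas 1.3 and later also accept an `encoding_errors` argument in `read_csv`, which may make this unnecessary on newer installs.

```python
import io


def decode_lenient(raw: bytes, encoding: str = "utf-8") -> io.StringIO:
    """Decode raw bytes, replacing undecodable bytes with U+FFFD.

    Hypothetical helper, not part of pandas.
    """
    text = raw.decode(encoding, errors="replace")
    return io.StringIO(text)


# Made-up sample: 0xDA starts a two-byte UTF-8 sequence, but the next
# byte is a newline, so strict UTF-8 decoding would raise
# UnicodeDecodeError ("invalid continuation byte").
sample = b"id,name\n1,caf\xda\n"

buffer = decode_lenient(sample)
print(repr(buffer.getvalue()))

# If the files are really Latin-1 (an assumption worth checking),
# decoding with that codec keeps the character instead of replacing it.
latin = decode_lenient(sample, encoding="latin-1")
print(repr(latin.getvalue()))
```

Replacing bytes loses information, so it is worth checking first whether the files simply use a different encoding. Byte 0xDA is "Ú" in Latin-1, which hints that at least some of the files may not be UTF-8 at all.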

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 6: invalid continuation byte
File "C:\Importer\src\dfman\importer.py", line 26, in import_chr
data = pd.read_csv(filepath, names=fields)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 400, in parser_f
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 608, in read
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 1028, in read
File "parser.pyx", line 706, in (pandas\parser.c:6745)
File "parser.pyx", line 728, in ._read_low_memory (pandas\parser.c:6964)
File "parser.pyx", line 804, in ._read_rows (pandas\parser.c:7780)
File "parser.pyx", line 890, in ._convert_column_data (pandas\parser.c:8793)
File "parser.pyx", line 950, in ._convert_tokens (pandas\parser.c:9484)
File "parser.pyx", line 1026, in ._convert_with_dtype (pandas\parser.c:10642)
File "parser.pyx", line 1046, in ._string_convert (pandas\parser.c:10853)
File "parser.pyx", line 1278, in pandas.parser._string_box_utf8 (pandas\parser.c:15657)

These files all come from the same source. What's the best way to correct this so the import can proceed?

Using pandas to do the heavy lifting, and assuming this valid CSV file, this is one way of doing what you want:

import json
for key, grp in df.

Your data, converted to valid CSV, is saved in data.csv:

PrimaryId,FirstName,LastName,City,CarName,DogName

I'm trying to convert a flat structured CSV into a nested JSON structure. The CSV is generated from SQL, which creates multiple rows for each primary id. The CSV is structured as follows:

PrimaryId,FirstName,LastName,City,CarName,DogName

Both this post and this one have helped, but I have yet to create the correct structure.
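As a rough illustration of the grouping step, here is a minimal sketch that uses only the standard library (`csv` and `json`) rather than pandas. Only the header line comes from the question. The sample rows, the helper name `nest_rows`, and the output keys `Cars` and `Dogs` are assumptions, since the desired JSON layout is not shown.

```python
import csv
import io
import json

# Made-up sample rows; only the header matches the question.
SAMPLE = """PrimaryId,FirstName,LastName,City,CarName,DogName
100,John,Smith,NewYork,Toyota,Spot
100,John,Smith,NewYork,BMW,Buster
101,Ben,Swan,Sydney,Volkswagen,Buddy
"""


def nest_rows(rows):
    """Collapse repeated rows per PrimaryId into one nested record.

    Hypothetical helper: per-person fields are taken from the first row
    seen for each id; car and dog names are collected into lists.
    """
    people = {}
    for row in rows:
        person = people.setdefault(
            row["PrimaryId"],
            {
                "PrimaryId": row["PrimaryId"],
                "FirstName": row["FirstName"],
                "LastName": row["LastName"],
                "City": row["City"],
                "Cars": [],  # assumed output key
                "Dogs": [],  # assumed output key
            },
        )
        # Skip blanks and duplicates that a SQL join may introduce.
        if row["CarName"] and row["CarName"] not in person["Cars"]:
            person["Cars"].append(row["CarName"])
        if row["DogName"] and row["DogName"] not in person["Dogs"]:
            person["Dogs"].append(row["DogName"])
    return list(people.values())


nested = nest_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(json.dumps(nested, indent=2))
```

To read from a real file instead, the `io.StringIO(SAMPLE)` buffer could be swapped for `open("data.csv", newline="")`. The same grouping could also be written with a pandas `groupby` on `PrimaryId`.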
