Python 3, Pandas and Encoding Issues

It is not unusual to come across encoding problems when opening files in Python 3. Encoding is a large topic in its own right, so here I only offer some quick ways to deal with a typical issue you are likely to encounter.

Say you are interested in opening a CSV file to be loaded into a pandas dataframe. If the stars align and the generator of your CSV is magnanimous, they may have saved the file using UTF-8. If so, you may get away with reading the file (here called myfile.csv) with a plain call to read_csv and no encoding argument.
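A minimal sketch of that call, assuming myfile.csv sits in the working directory. The load_csv wrapper is a hypothetical name, used here only so the call is easy to reuse:

```python
import pandas as pd


def load_csv(path="myfile.csv"):
    # With no encoding argument, pandas decodes the file as UTF-8.
    return pd.read_csv(path)
```

Calling load_csv() with no arguments reads myfile.csv and returns the dataframe.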

You should in principle pass a parameter to pandas telling it what encoding the file has been saved with, so a more complete version of that call also spells out the encoding, for example encoding='utf-8'.
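As a sketch, again with a hypothetical wrapper name (load_csv_with_encoding) and UTF-8 assumed as the default value:

```python
import pandas as pd


def load_csv_with_encoding(path="myfile.csv", encoding="utf-8"):
    # Name the encoding explicitly instead of relying on the default.
    return pd.read_csv(path, encoding=encoding)
```

If the file was saved with something else, say Latin-1, the same call works with encoding="latin-1".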

Encoding conundrum

What happens when you don’t know what encoding was used to save the file? Well, you can ask, but it is very unlikely that whoever generated the file knows. What to do? Well, there are some libraries that can be helpful.

Install the chardet module from the terminal with pip install chardet.

And use the following snippet as a guide:
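A minimal sketch, assuming the file is small enough to sample from the start. The helper names (detect_encoding, load_csv_guessing_encoding) and the sample_size parameter are hypothetical; chardet.detect itself takes raw bytes and returns a dictionary that includes the guessed encoding and a confidence score:

```python
import chardet
import pandas as pd


def detect_encoding(path, sample_size=100_000):
    # chardet works on raw bytes, so open the file in binary mode.
    with open(path, "rb") as f:
        raw = f.read(sample_size)
    # The result is a dict with (at least) 'encoding' and 'confidence'.
    return chardet.detect(raw)


def load_csv_guessing_encoding(path="myfile.csv"):
    guess = detect_encoding(path)
    # The encoding can be None when chardet cannot decide;
    # falling back to UTF-8 is an assumption made for this sketch.
    encoding = guess["encoding"] or "utf-8"
    return pd.read_csv(path, encoding=encoding)
```

Bear in mind that the detection is a statistical guess, so it is worth looking at the reported confidence before trusting the result, especially for short files.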

Et voilà!