File Encoding with the Command Line – Determining and Converting

With the changes that Python 3 has brought to bear in terms of dealing with character encodings, I have written before some tips that I use on my day to day work. It is sometimes useful to determine the character encoding of a files at a much earlier stage. The command line is a perfect tool to help us with these issues. 

The basic syntax you need is the following one:

$ file -I filename

Furthermore, you can even use the command line to convert the encoding of a file into another one. The syntax is as follows:

$ iconv -f encoding_source -t encoding_target filename

For instance if you needed to convert an ISO88592 file called input.txt into UTF8 you can use the following line:

$ iconv -f iso-8859-1 -t utf-8 < input.txt > output.txt

If you want to check a list of know coded characters that you can handle with this command simply type:

$ iconv --list

Et voilà!

 

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.