.NET WebClient DownloadString screwed up my Unicode non-English characters

For the past several days, I've been trying to build a small utility to copy a Tumblr blog within the same account. Some of my posts contain Unicode characters, and instead of getting مركز ميركاتو I was getting a garbled mess of characters. Probably not much difference to you and me, but for Arabic readers the difference is huge :)

Originally I had something like the following:

// DownloadString decodes the response with client.Encoding, which may not be UTF-8
string data = client.DownloadString("[some url]");
var reader = new StringReader(data);

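Pretty straightforward, right? Except it doesn't work. I found out the hard way that client.DownloadString doesn't necessarily decode the response as UTF-8: it falls back on the client's Encoding property, which on .NET Framework defaults to Encoding.Default, the system's ANSI code page.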
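To see why this mangles Arabic text, here is a minimal C# sketch that reproduces the problem without any network access. It assumes ISO-8859-1 as a stand-in for a Western ANSI code page such as Windows-1252 (Windows-1252 itself needs CodePagesEncodingProvider registered on .NET Core), and `MojibakeDemo` is just a hypothetical name for illustration:

```csharp
using System.Text;

// Hypothetical demo class: reproduces the DownloadString symptom offline.
public static class MojibakeDemo
{
    // Decode bytes with whichever encoding the caller picks; this is roughly
    // what DownloadString does with client.Encoding when no BOM or charset wins.
    public static string Decode(byte[] bytes, Encoding encoding) => encoding.GetString(bytes);

    // Encode the text as UTF-8 (what the server sends), then read it back
    // with ISO-8859-1 (assumed stand-in for the ANSI code page). ISO-8859-1
    // maps every byte to one char, so each 2-byte Arabic letter turns into
    // two unrelated Latin-1 characters.
    public static string Garble(string text) =>
        Decode(Encoding.UTF8.GetBytes(text), Encoding.GetEncoding("iso-8859-1"));
}
```

`MojibakeDemo.Garble("مركز ميركاتو")` returns a string that no longer equals the original, while decoding the same bytes with `Encoding.UTF8` gives back مركز ميركاتو unchanged.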

To fix it, I changed the code to download the raw bytes and decode them as UTF-8 myself:

// Download the raw bytes and decode them as UTF-8 ourselves
var data = client.DownloadData("[some url]");
var text = Encoding.UTF8.GetString(data); // Encoding lives in System.Text
var reader = new StringReader(text);
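Hard-coding UTF-8 works for my Tumblr posts, but if you ever fetch pages that declare a different charset, you could read it from the response's Content-Type header instead. The sketch below is a hypothetical helper (`ResponseDecoder` is not part of .NET); it uses System.Net.Mime.ContentType to parse the header and falls back to UTF-8 when no usable charset is declared. If you'd rather keep DownloadString, setting `client.Encoding = Encoding.UTF8` before the call should have the same effect as the fix above.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper, not a .NET API: decode a response body using the
// charset from its Content-Type header, falling back to UTF-8.
public static class ResponseDecoder
{
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        Encoding encoding = Encoding.UTF8; // default when no usable charset is declared

        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            try
            {
                // e.g. "text/html; charset=utf-8" -> "utf-8"
                string charset = new ContentType(contentTypeHeader).CharSet;
                if (!string.IsNullOrEmpty(charset))
                {
                    encoding = Encoding.GetEncoding(charset.Trim('"'));
                }
            }
            catch (FormatException)
            {
                // Malformed Content-Type header: keep UTF-8
            }
            catch (ArgumentException)
            {
                // Unknown or unsupported charset name: keep UTF-8
            }
        }

        return encoding.GetString(body);
    }
}
```

With WebClient you would call `client.DownloadData(url)` first and then pass `client.ResponseHeaders[HttpResponseHeader.ContentType]` as the header, since ResponseHeaders is only filled in once the download has completed.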

