forum

Home / DeveloperSection / Forums / Convert Unicode string to UTF-8, and then to JSON

Convert Unicode string to UTF-8, and then to JSON

Anonymous User596920-Jun-2013
Hi Developers,

I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. In the Python REPL the following seems to work fine:

>>> unicode('©', 'utf-8').encode('utf-8')'\xc2\xa9'

Note that I’m using U+00A9 COPYRIGHT SIGN as an example here. The '\xC2\xA9' looks close to what I want — a string consisting of two separate code points: U+00C2 and

U+00A9. (When UTF-8-decoded, it gives back the original string, '\xA9'.)

Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. However, the following doesn’t seem to do what I want:

>>> import json; json.dumps('\xc2\xa9')'"\\u00a9"'

Note that it generates a string containing U+00A9 (the original symbol). Instead, I need the UTF-8-encoded string, which would look like "\u00C2\u00A9" in valid JSON.

TL;DR How can I turn '©' into "\u00C2\u00A9" in Python? I feel like I’m missing something obvious — is there no built-in way to do this?

Updated on 20-Jun-2013
I am a content writter !

Can you answer this question?


Answer

1 Answers

Liked By