Convert Unicode string to UTF-8, and then to JSON

Total Post:19


 5199  View(s)
Rate this:
Hi Developers,

I want to encode a string in UTF-8 and view the corresponding UTF-8 bytes individually. In the Python REPL the following seems to work fine:

>>> unicode('©', 'utf-8').encode('utf-8')'\xc2\xa9'

Note that I’m using U+00A9 COPYRIGHT SIGN as an example here. The '\xC2\xA9' looks close to what I want — a string consisting of two separate code points: U+00C2 and

U+00A9. (When UTF-8-decoded, it gives back the original string, '\xA9'.)

Then, I want the UTF-8-encoded string to be converted to a JSON-compatible string. However, the following doesn’t seem to do what I want:

>>> import json; json.dumps('\xc2\xa9')'"\\u00a9"'

Note that it generates a string containing U+00A9 (the original symbol). Instead, I need the UTF-8-encoded string, which would look like "\u00C2\u00A9" in valid JSON.

TL;DR How can I turn '©' into "\u00C2\u00A9" in Python? I feel like I’m missing something obvious — is there no built-in way to do this?
  1. Re: Convert Unicode string to UTF-8, and then to JSON

    Hi Babe,

    If you really want "\u00c2\u00a9" as the output, give json an Unicode string as input.

    >>> print json.dumps(u'\xc2\xa9')"\u00c2\u00a9"

    You can generate this Unicode string from the raw bytes:

    s = unicode('©', 'utf-8').encode('utf-8')
    s2 = u''.join(unichr(ord(c)) for c in s)

    I think what you really want is "\xc2\xa9" as the output, but I'm not sure how to generate that yet.


Please check, If you want to make this post sponsored

You are not a Sponsored Member. Click Here to Subscribe the Membership.