Unicode troubleshooting: strftime in Jinja2
Bit Focus Blog (http://iblog.bitfoc.us), http://iblog.bitfoc.us/p/5, Apr 24 2018 - 14:16:28 +0000

Say we are on Python 2 and need to format a datetime object with some non-ASCII characters in the format string. What should we do?

The following code looks perfectly reasonable:

Code Snippet 0-0

# encoding=utf-8

import jinja2
import datetime

now = datetime.datetime.now()

print(jinja2.Template(u'''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))

Except that it raises a UnicodeEncodeError.

Code Snippet 0-1

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print(jinja2.Template(u'''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "<template>", line 1, in top-level template code
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5e74' in position 3: ordinal not in range(128)

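So what's wrong with that? The reason is that many standard library functions in Python 2 handle unicode poorly. Here, datetime's strftime expects a byte string (str), so the unicode format is implicitly encoded with the default ASCII codec, and that fails on the first non-ASCII character. What a cruel fact.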
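To see where "position 3" in the traceback comes from, here is a minimal sketch that performs the ASCII encoding step explicitly. It assumes the implicit conversion behaves like a plain `.encode('ascii')` call; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
fmt = u'%Y 年 %m 月'

try:
    # Roughly what Python 2 attempts behind the scenes when a unicode
    # format is handed to strftime (assumption: default codec is ASCII).
    fmt.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The first character that cannot be encoded sits at index 3,
# which is the position reported in the traceback.
print(error.start)
```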

Well, since we have declared that the file is encoded in UTF-8, how about using a str directly instead of a unicode?

With only the standard library, that actually works.

Code Snippet 0-2

# encoding=utf-8

import datetime

now = datetime.datetime.now()

print(now.strftime('%Y 年 %m 月'))

This produces the desired output. So you might think that removing the u prefix would make the template render fine, right?

Unfortunately, doing so gets you a UnicodeDecodeError like this:

Code Snippet 0-3

Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print(jinja2.Template('''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 945, in __new__
    return env.from_string(source, template_class=cls)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 880, in from_string
    return cls.from_code(self, self.compile(source), globals, None)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 579, in compile
    source = self._parse(source, name, filename)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 497, in _parse
    return Parser(self, source, name, encode_filename(filename)).parse()
  File "/usr/local/lib/python2.7/dist-packages/jinja2/parser.py", line 40, in __init__
    self.stream = environment._tokenize(source, name, filename, state)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 528, in _tokenize
    source = self.preprocess(source, name, filename)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 522, in preprocess
    self.iter_extensions(), text_type(source))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 21: ordinal not in range(128)

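Well, this is because Jinja2 does not accept a str containing non-ASCII bytes as template source: it converts the source to unicode with the default ASCII codec, as the text_type(source) call at the bottom of the traceback shows. It's a sad paradox that using str makes Jinja2 unhappy while using unicode makes strftime unhappy.

So a workable solution is to adapt strftime to take unicode as input. To do this we need a custom strftime wrapper that encodes the format to UTF-8 before the call and decodes the result afterwards, like this: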
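The decode failure can be reproduced without Jinja2. The sketch below assumes that `text_type(source)` on Python 2 amounts to decoding the byte string with the ASCII codec; `source` here is built by hand to hold the same UTF-8 bytes that the str literal would contain.

```python
# -*- coding: utf-8 -*-
# The bytes a Python 2 str literal holds in a UTF-8 encoded source file.
source = u"{{ date.strftime('%Y 年 %m 月') }}".encode('utf-8')

try:
    # Assumed equivalent of the implicit str -> unicode conversion.
    source.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Byte 21 is 0xe5, the first byte of the UTF-8 encoding of the first
# non-ASCII character, matching the position reported in the traceback.
print(error.start)
```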

Code Snippet 0-4

# encoding=utf-8

import jinja2
import datetime

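def strftime(dt, fmt):
    # Python 2's strftime wants bytes: encode the unicode format to UTF-8,
    # format, then decode the result back to unicode for Jinja2.
    return dt.strftime(fmt.encode('utf-8')).decode('utf-8')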

now = datetime.datetime.now()

print(jinja2.Template(u'''{{ strftime(date, '%Y 年 %m 月') }}''')
                        .render(date=now, strftime=strftime))

Or to be more "jinjaic", declare a filter for it.

Code Snippet 0-5

# encoding=utf-8

import jinja2
import datetime

def strftime(dt, fmt):
    # Same UTF-8 round trip as in Code Snippet 0-4; dt arrives as the filtered value.
    return dt.strftime(fmt.encode('utf-8')).decode('utf-8')

env = jinja2.Environment(loader=jinja2.DictLoader(
            {'test': u'''{{ date|strftime('%Y 年 %m 月') }}'''}))
env.filters['strftime'] = strftime
t = env.get_template('test')
print(t.render(date=datetime.datetime.now()))

NOTE: This does NOT reproduce in Python 3, since all strings are unicode there.
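If the same helper has to run on both interpreters, one possible variant is sketched below. The `PY2` flag and the branching are assumptions for illustration, not part of the original snippets: the UTF-8 round trip is applied only on Python 2, and the format is passed straight through otherwise.

```python
# -*- coding: utf-8 -*-
import sys
import datetime

# Hypothetical flag, introduced only for this sketch.
PY2 = sys.version_info[0] == 2

def strftime(dt, fmt):
    # The name `unicode` is only looked up when PY2 is true.
    if PY2 and isinstance(fmt, unicode):
        # Python 2: bytes in, bytes out, so round-trip through UTF-8.
        return dt.strftime(fmt.encode('utf-8')).decode('utf-8')
    # Python 3 (or a plain str on Python 2): no conversion needed.
    return dt.strftime(fmt)

formatted = strftime(datetime.datetime(2018, 4, 24), u'%Y 年 %m 月')
```

It can be registered as a filter in the same way as in Code Snippet 0-5.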
