Say we are in Python 2 and need to format a datetime object with a format string that contains non-ASCII characters. What should we do?
The following code looks perfect:
# encoding=utf-8
import jinja2
import datetime
now = datetime.datetime.now()
print(jinja2.Template(u'''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))
Except that it raises a UnicodeEncodeError:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print(jinja2.Template(u'''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "<template>", line 1, in top-level template code
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5e74' in position 3: ordinal not in range(128)
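So what's wrong with that? The reason is that many standard library functions in Python 2, strftime among them, don't support unicode well: here the unicode format string is implicitly encoded with the ASCII codec before it is used, and that fails on the first non-ASCII character. What a cruel fact.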
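The failure can be reproduced without Jinja2 or datetime. As a minimal sketch, assuming that Python 2's strftime converts a unicode format with the default ASCII codec (which is what the traceback suggests), the explicit version of that conversion stops at the same character. The names fmt and failed_at are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: mimic the implicit unicode -> str conversion the traceback points to.
# Assumption: the format string is encoded with the default ASCII codec.
fmt = u'%Y 年 %m 月'

try:
    fmt.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that cannot be encoded.
    failed_at = exc.start

print(failed_at)  # 3, the same position the traceback reports
```

Position 3 is the character 年 (u'\u5e74'), which matches the error message above.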
Well, since we have declared that the file is encoded with UTF-8, how about directly using a str instead of a unicode?
If we use only the standard library, it actually works:
# encoding=utf-8
import datetime
now = datetime.datetime.now()
print(now.strftime('%Y 年 %m 月'))
This produces the desired output. So you may think that removing the u prefix will make the template rendering work as well, right?
Unfortunately, doing so gets you a UnicodeDecodeError like this:
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    print(jinja2.Template('''{{ date.strftime('%Y 年 %m 月') }}''').render(date=now))
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 945, in __new__
    return env.from_string(source, template_class=cls)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 880, in from_string
    return cls.from_code(self, self.compile(source), globals, None)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 579, in compile
    source = self._parse(source, name, filename)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 497, in _parse
    return Parser(self, source, name, encode_filename(filename)).parse()
  File "/usr/local/lib/python2.7/dist-packages/jinja2/parser.py", line 40, in __init__
    self.stream = environment._tokenize(source, name, filename, state)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 528, in _tokenize
    source = self.preprocess(source, name, filename)
  File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 522, in preprocess
    self.iter_extensions(), text_type(source))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 21: ordinal not in range(128)
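Well, this is because Jinja2 doesn't accept a str containing non-ASCII bytes as input: as the last frame shows, it calls text_type(source), which decodes the str with the default ASCII codec. It's a sad paradox that using str makes Jinja2 unhappy while using unicode makes strftime unhappy.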
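The Jinja2 half of the paradox can be looked at in isolation too. The sketch below assumes that text_type(source) in the traceback amounts to decoding the byte string with the ASCII codec. It builds the same bytes a UTF-8 source file would contain for the template and decodes them explicitly. The names source, bad_offset and bad_byte are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: the template as the UTF-8 bytes a str literal would hold in Python 2.
source = u"{{ date.strftime('%Y 年 %m 月') }}".encode('utf-8')

try:
    # Assumption: this is what the implicit str -> unicode conversion does.
    source.decode('ascii')
    bad_offset = None
    bad_byte = None
except UnicodeDecodeError as exc:
    bad_offset = exc.start                     # offset of the offending byte
    bad_byte = bytearray(source)[bad_offset]   # its integer value

print('%d 0x%x' % (bad_offset, bad_byte))  # 21 0xe5
```

Byte 0xe5 is the first UTF-8 byte of 年, so both errors are triggered by the same character, once on encoding and once on decoding.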
So the way out is to adapt strftime to take unicode as input. To do this we need a customized strftime implementation, like this:
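# encoding=utf-8
import jinja2
import datetime

def strftime(dt, fmt):
    # Encode the unicode format to UTF-8 bytes for strftime,
    # then decode the formatted result back to unicode.
    return dt.strftime(fmt.encode('utf-8')).decode('utf-8')

now = datetime.datetime.now()
# Pass the wrapper into the template context and call it as a function.
print(jinja2.Template(u'''{{ strftime(date, '%Y 年 %m 月') }}''')
      .render(date=now, strftime=strftime))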
Or, to be more "jinjaic", declare a filter for it:
# encoding=utf-8
import jinja2
import datetime

def strftime(dt, fmt):
    # Same wrapper as before: UTF-8 bytes in, unicode out.
    return dt.strftime(fmt.encode('utf-8')).decode('utf-8')

env = jinja2.Environment(loader=jinja2.DictLoader(
    {'test': u'''{{ date|strftime('%Y 年 %m 月') }}'''}))
# Register the wrapper as a filter; the piped value becomes dt.
env.filters['strftime'] = strftime
t = env.get_template('test')
print(t.render(date=datetime.datetime.now()))
NOTE: This problem does NOT reproduce in Python 3, since all strings are unicode there.
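If the same code has to run on both versions, the wrapper can check the interpreter first. This is only a sketch under that assumption: portable_strftime, PY2 and text_type are hypothetical names, not part of Jinja2 or the standard library, and the Python 3 branch relies on the platform's strftime handling non-ASCII text.

```python
# -*- coding: utf-8 -*-
import datetime
import sys

PY2 = sys.version_info[0] == 2
text_type = type(u'')  # unicode on Python 2, str on Python 3


def portable_strftime(dt, fmt):
    """Hypothetical helper: format dt with a text (unicode) format string."""
    if PY2 and isinstance(fmt, text_type):
        # Python 2: strftime wants bytes, so round-trip through UTF-8.
        return dt.strftime(fmt.encode('utf-8')).decode('utf-8')
    # Python 3 (or a byte str on Python 2): pass the format through as is.
    return dt.strftime(fmt)


stamp = portable_strftime(datetime.datetime(2020, 3, 5), u'%Y 年 %m 月')
print(stamp)
```

It can be registered as a Jinja2 filter in the same way as the strftime function above.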