Tips and tricks from my Telegram-channel @pythonetc, February 2019

    This is a new selection of tips and tricks about Python and programming from my Telegram channel @pythonetc.

    Previous publications.

    Comparing structures


    Sometimes you want to compare complex structures in tests while ignoring some of the values. Usually, this is done by checking particular values of the structure one by one:

    >>> d = dict(a=1, b=2, c=3)
    >>> assert d['a'] == 1
    >>> assert d['c'] == 3

    However, you can create a special value that reports being equal to any other value:

    >>> assert d == dict(a=1, b=ANY, c=3)

    That can be easily done by defining the __eq__ method:

    >>> class AnyClass:
    ...     def __eq__(self, another):
    ...         return True
    ...
    >>> ANY = AnyClass()
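
    The standard library already ships an object of this kind as unittest.mock.ANY, so the hand-written class is not always needed. Below is a minimal sketch; the response dictionary and its keys are invented for illustration and do not come from any real API:

```python
from unittest.mock import ANY

# Hypothetical payload: the id and the timestamp differ from run to run
response = {
    'id': 4821,
    'name': 'alice',
    'created_at': '2019-02-14T10:00:00',
}

# ANY compares equal to everything, so only 'name' is really checked
matches = response == {'id': ANY, 'name': 'alice', 'created_at': ANY}

# A mismatch in a regular value is still detected
differs = response == {'id': ANY, 'name': 'bob', 'created_at': ANY}

print(matches, differs)
```

    Keep in mind that such a value hides every difference in its slot, including a missing or None value, so it is best used only for fields that are truly unpredictable.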

    sys.stdout is a wrapper that allows you to write strings instead of raw bytes. The string is encoded automatically using sys.stdout.encoding:

    >>> import sys
    >>> _ = sys.stdout.write('Straße\n')
    Straße
    >>> sys.stdout.encoding
    'UTF-8'

    sys.stdout.encoding is read-only. Its default value is derived from the environment (usually the locale) and can be overridden by setting the PYTHONIOENCODING environment variable:

    $ PYTHONIOENCODING=cp1251 python3
    Python 3.6.6 (default, Aug 13 2018, 18:24:23)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.stdout.encoding
    'cp1251'

    If you want to write bytes to stdout, you can bypass the automatic encoding by accessing the wrapped buffer with sys.stdout.buffer:

    >>> sys.stdout
    <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1251'>
    >>> sys.stdout.buffer
    <_io.BufferedWriter name='<stdout>'>
    >>> _ = sys.stdout.buffer.write(b'Stra\xc3\x9fe\n')
    Straße

    sys.stdout.buffer is also a wrapper; it does the buffering for you. It can be bypassed by accessing the raw file object with sys.stdout.buffer.raw:

    >>> _ = sys.stdout.buffer.raw.write(b'Stra\xc3\x9fe')
    Straße
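
    The same layering can be reproduced without touching the real stdout. The sketch below assumes an in-memory io.BytesIO as a stand-in for the underlying byte stream; write_through_layers is a name made up for this example:

```python
import io


def write_through_layers(text, raw_bytes, encoding):
    """Write text via the text layer, then bytes via the wrapped buffer."""
    sink = io.BytesIO()  # stand-in for the real byte stream
    # newline='' disables newline translation so the bytes are predictable
    wrapper = io.TextIOWrapper(sink, encoding=encoding, newline='')

    wrapper.write(text)   # encoded automatically with `encoding`
    wrapper.flush()       # push pending text down before writing bytes

    wrapper.buffer.write(raw_bytes)  # bypasses the encoding step entirely

    result = sink.getvalue()
    wrapper.detach()      # release the sink without closing it
    return result


print(write_through_layers('Straße', b'\n', 'utf-8'))
print(write_through_layers('Straße', b'\n', 'latin-1'))
```

    Flushing the text layer before writing to the buffer matters: without it, the text and the raw bytes may reach the underlying stream in a different order than they were written.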

    Ellipsis constant


    Python has a very short list of built-in constants. One of them is Ellipsis, which can also be written as ... (three dots). This constant has no special meaning for the interpreter but is used in places where such syntax looks appropriate.

    numpy supports Ellipsis as a __getitem__ argument, e.g. x[...] returns all elements of x.

    PEP 484 defines an additional meaning: Callable[..., type] describes callables that return type, with no argument types specified.

    Finally, you can use ... to indicate that a function is not yet implemented. This is completely valid Python code:

    def x():
        ...

    However, in Python 2 Ellipsis can't be written as .... The only exception is a[...], which means a[Ellipsis].

    All of the following syntaxes are valid for Python 3, but only the first line is valid for Python 2:

    a[...]
    a[...:2:...]
    [..., ...]
    {...:...}
    a = ...
    ... is ...
    def a(x=...): ...
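
    Your own classes can give ... a meaning in __getitem__ as well, since the object simply arrives as the key. The Matrix class below is a hypothetical toy container written for this illustration; it only loosely imitates numpy and is not how numpy is implemented:

```python
class Matrix:
    """Tiny 2-D container; a hypothetical class used only for illustration."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __getitem__(self, key):
        # m[...] arrives here as the Ellipsis singleton
        if key is Ellipsis:
            return [list(row) for row in self.rows]
        # m[..., i] arrives as the tuple (Ellipsis, i)
        if isinstance(key, tuple) and len(key) == 2 and key[0] is Ellipsis:
            return [row[key[1]] for row in self.rows]
        # Anything else is treated as a plain row index
        return self.rows[key]


m = Matrix([[1, 2], [3, 4]])
everything = m[...]
last_column = m[..., -1]
first_row = m[0]
print(everything, last_column, first_row)
```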

    Reimporting modules


    Modules that are already imported are not loaded again: a repeated import foo just reuses the module cached in sys.modules. However, reloading modules proves to be useful while working in an interactive environment. The proper way to do this in Python 3.4+ is to use importlib:

    In [1]: import importlib
    In [2]: with open('foo.py', 'w') as f:
       ...:     f.write('a = 1')
       ...:
    
    In [3]: import foo
    In [4]: foo.a
    Out[4]: 1
    In [5]: with open('foo.py', 'w') as f:
       ...:     f.write('a = 2')
       ...:
    In [6]: foo.a
    Out[6]: 1
    In [7]: import foo
    In [8]: foo.a
    Out[8]: 1
    In [9]: importlib.reload(foo)
    Out[9]: <module 'foo' from '/home/v.pushtaev/foo.py'>
    In [10]: foo.a
    Out[10]: 2
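
    The same experiment can be written as a self-contained script. This is only a sketch: the module name reload_demo_module and the helper reload_demo are made up, and the module is written to a temporary directory. Bytecode caching is switched off for the duration because a .pyc file is validated by the source size and modification time, so two quick edits of the same length could otherwise go unnoticed:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    """Return `a` after the import, after a re-import, and after a reload."""
    old_flag = sys.dont_write_bytecode
    # Skip .pyc files so cached bytecode can never mask the edit
    sys.dont_write_bytecode = True
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / 'reload_demo_module.py'  # hypothetical module
        source.write_text('a = 1\n')
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module('reload_demo_module')
            first = module.a

            source.write_text('a = 22\n')
            # A second import returns the cached module from sys.modules
            module = importlib.import_module('reload_demo_module')
            stale = module.a

            # reload re-executes the source in the existing module object
            module = importlib.reload(module)
            fresh = module.a
        finally:
            sys.path.remove(tmp)
            sys.modules.pop('reload_demo_module', None)
            sys.dont_write_bytecode = old_flag
    return first, stale, fresh


print(reload_demo())
```

    Note that reload updates the existing module object in place, so names imported earlier with from foo import a keep pointing at the old values.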

    IPython also has the autoreload extension that automatically reimports modules when necessary:

    In [1]: %load_ext autoreload
    In [2]: %autoreload 2
    In [3]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=1')
       ...:
    In [4]: import foo
    LOADED
    In [5]: foo.a
    Out[5]: 1
    In [6]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=2')
       ...:
    In [7]: import foo
    LOADED
    In [8]: foo.a
    Out[8]: 2
    In [9]: with open('foo.py', 'w') as f:
       ...:     f.write('print("LOADED"); a=3')
       ...:
    In [10]: foo.a
    LOADED
    Out[10]: 3

    \G


    In some languages, you can use the \G assertion. It matches at the position where the previous match ended. That allows writing finite automata that walk through a string word by word (where a word is defined by the regex).

    However, there is no such thing in Python. The proper workaround is to manually track the position and pass the substring to regex functions:

    import re
    import json
    
    text = '<a><b>foo</b><c>bar</c></a><z>bar</z>'
    regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
    
    stack = []
    tree = []
    
    pos = 0
    while len(text) > pos:
        error = f'Error at {text[pos:]}'
        found = re.search(regex, text[pos:])
        assert found, error
        pos += len(found[0])
        start, stop, data = found.groups()
    
        if start:
            tree.append(dict(
                tag=start,
                children=[],
            ))
            stack.append(tree)
            tree = tree[-1]['children']
        elif stop:
            tree = stack.pop()
            assert tree[-1]['tag'] == stop, error
            if not tree[-1]['children']:
                tree[-1].pop('children')
        elif data:
            stack[-1][-1]['data'] = data
    
    
    print(json.dumps(tree, indent=4))

    In the previous example, we can save some time by not slicing the string again and again and asking the re module to search from a different position instead.

    That requires some changes. First, re.search doesn't support searching from a custom position, so we have to compile the regular expression manually. Second, ^ means the real start of the string, not the position where the search started, so we have to manually check that the match happened exactly at that position.

    import re
    import json
    
    
    text = '<a><b>foo</b><c>bar</c></a><z>bar</z>' * 10
    
    
    def print_tree(tree):
       print(json.dumps(tree, indent=4))
    
    
    def xml_to_tree_slow(text):
       regex = '^(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))'
    
       stack = []
       tree = []
    
       pos = 0
       while len(text) > pos:
           error = f'Error at {text[pos:]}'
           found = re.search(regex, text[pos:])
           assert found, error
           pos += len(found[0])
           start, stop, data = found.groups()
    
           if start:
               tree.append(dict(
                   tag=start,
                   children=[],
               ))
               stack.append(tree)
               tree = tree[-1]['children']
           elif stop:
               tree = stack.pop()
               assert tree[-1]['tag'] == stop, error
               if not tree[-1]['children']:
                   tree[-1].pop('children')
           elif data:
               stack[-1][-1]['data'] = data
    
       return tree
    
    _regex = re.compile('(?:<([a-z]+)>|</([a-z]+)>|([a-z]+))')
    
    
    def _error_message(text, pos):
       # Called only when an assert fails, so the slice costs nothing otherwise
       return f'Error at {text[pos:]}'
    
    
    def xml_to_tree_fast(text):
       stack = []
       tree = []
    
       pos = 0
       while len(text) > pos:
           found = _regex.search(text, pos=pos)
           # Check for a match first: found is None when nothing matches
           assert found, _error_message(text, pos)
           begin, end = found.span(0)
           # The match must start exactly where the previous one ended
           assert begin == pos, _error_message(text, pos)
           pos = end
           start, stop, data = found.groups()
    
           if start:
               tree.append(dict(
                   tag=start,
                   children=[],
               ))
               stack.append(tree)
               tree = tree[-1]['children']
           elif stop:
               tree = stack.pop()
               assert tree[-1]['tag'] == stop, _error_message(text, pos)
               if not tree[-1]['children']:
                   tree[-1].pop('children')
           elif data:
               stack[-1][-1]['data'] = data
    
       return tree
    
    print_tree(xml_to_tree_fast(text))

    Result:

    In [1]: from example import *
    
    In [2]: %timeit xml_to_tree_slow(text)
    356 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    In [3]: %timeit xml_to_tree_fast(text)
    294 µs ± 6.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
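
    As an alternative to checking the match position by hand, the match method of a compiled pattern accepts a pos argument and only succeeds if the match starts exactly at that index, which is close to what \G gives in other languages. The tokenizer below is a simplified sketch of that idea rather than a drop-in replacement for the functions above; TOKEN and tokenize are names chosen for this example:

```python
import re

# Same alternatives as in the article's regex, without the leading ^
TOKEN = re.compile(r'<([a-z]+)>|</([a-z]+)>|([a-z]+)')


def tokenize(text):
    """Split text into (kind, value) pairs, walking it match by match."""
    tokens = []
    pos = 0
    while pos < len(text):
        # match() with pos is anchored at pos, unlike search()
        found = TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f'Error at {text[pos:]}')
        start, stop, data = found.groups()
        if start:
            tokens.append(('start', start))
        elif stop:
            tokens.append(('stop', stop))
        else:
            tokens.append(('data', data))
        # Continue from the end of this match
        pos = found.end()
    return tokens


print(tokenize('<b>foo</b>'))
```

    Because match never scans ahead, bad input fails right at the offending position instead of being skipped silently.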

    Round function


    Today's post is written by orsinium, the author of @itgram_channel.

    The round function rounds a number to a given precision in decimal digits.

    >>> round(1.2)
    1
    >>> round(1.8)
    2
    >>> round(1.228, 1)
    1.2

    You can also pass a negative precision:

    >>> round(413.77, -1)
    410.0
    >>> round(413.77, -2)
    400.0

    When a precision is given, round returns a value of the same type as the input number:

    >>> from decimal import Decimal
    >>> from fractions import Fraction
    >>> type(round(2, 1))
    <class 'int'>
    
    >>> type(round(2.0, 1))
    <class 'float'>
    
    >>> type(round(Decimal(2), 1))
    <class 'decimal.Decimal'>
    
    >>> type(round(Fraction(2), 1))
    <class 'fractions.Fraction'>

    For your own classes, you can define the rounding behavior with the __round__ method:

    >>> class Number(int):
    ...   def __round__(self, p=-1000):
    ...     return p
    ...
    >>> round(Number(2))
    -1000
    >>> round(Number(2), -2)
    -2
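
    A slightly more realistic use of __round__ is a wrapper type that should survive rounding. The Celsius class here is a hypothetical example, not something from the standard library; it delegates to the built-in round and wraps the result again:

```python
class Celsius:
    """Temperature wrapper; a made-up class used only for illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __round__(self, ndigits=None):
        # round(x) calls __round__() with no argument, round(x, n) passes n
        return Celsius(round(self.degrees, ndigits))

    def __repr__(self):
        return f'Celsius({self.degrees!r})'


whole = round(Celsius(21.456))
tenths = round(Celsius(21.456), 1)
print(whole, tenths)
```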

    Values are rounded to the closest multiple of 10 ** (-precision). For example, with precision=1 the value is rounded to a multiple of 0.1: round(0.63, 1) returns 0.6. If two multiples are equally close, rounding goes toward the even choice:

    >>> round(0.5)
    0
    >>> round(1.5)
    2
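
    To see why ties go to the even choice, it helps to round a run of exact halves both ways. In this sketch the half-up variant is emulated with math.floor(x + 0.5), which is only an illustration for these small positive values and not a general-purpose rounding function:

```python
import math

# All of these are exactly representable as floats, so they are true ties
halves = [0.5, 1.5, 2.5, 3.5, 4.5]

half_even = [round(x) for x in halves]
half_up = [math.floor(x + 0.5) for x in halves]

print(half_even, sum(half_even))
print(half_up, sum(half_up))
print(sum(halves))
```

    The half-even results add up to 12, close to the exact sum of 12.5, while the half-up results add up to 15: always rounding ties upward shifts totals in one direction.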

    Sometimes rounding of floats can be a little bit surprising:

    >>> round(2.85, 1)
    2.9

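    The round-half-to-even rule suggests 2.8 here, but most decimal fractions can't be represented exactly as a float (https://docs.python.org/3.7/tutorial/floatingpoint.html). The stored value is slightly greater than 2.85, so it is not a tie at all and 2.9 is simply the closest multiple: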

    >>> format(2.85, '.64f')
    '2.8500000000000000888178419700125232338905334472656250000000000000'

    If you want to round half up, you can use decimal.Decimal:

    >>> from decimal import Decimal, ROUND_HALF_UP
    >>> Decimal(1.5).quantize(0, ROUND_HALF_UP)
    Decimal('2')
    >>> Decimal(2.85).quantize(Decimal('1.0'), ROUND_HALF_UP)
    Decimal('2.9')
    >>> Decimal(2.84).quantize(Decimal('1.0'), ROUND_HALF_UP)
    Decimal('2.8')
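
    A Decimal built from a float inherits the float's representation error, so for exact decimal ties it is safer to start from a string. A minimal sketch, where round_half_up is a helper name invented for this example:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, exponent='1'):
    """Round a decimal string half up to the precision of `exponent`."""
    # Decimal('2.675') is exactly 2.675, unlike the float 2.675
    return Decimal(value).quantize(Decimal(exponent), rounding=ROUND_HALF_UP)


print(round_half_up('2.5'))             # a true tie, rounded up
print(round_half_up('2.675', '0.01'))   # exact tie at two decimal places
print(round(2.675, 2))                  # the float is stored just below the tie
```

    ROUND_HALF_UP rounds ties away from zero, so round_half_up('-2.5') gives Decimal('-3').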