Why is code using intermediate variables faster than code without?

IT박스

itboxs 2020. 10. 17. 10:07

I have encountered this weird behaviour and failed to explain it. These are the benchmarks:

py -3 -m timeit "tuple(range(2000)) == tuple(range(2000))"
10000 loops, best of 3: 97.7 usec per loop
py -3 -m timeit "a = tuple(range(2000));  b = tuple(range(2000)); a==b"
10000 loops, best of 3: 70.7 usec per loop

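Why is the comparison with variable assignment more than 27% faster than the one-liner that compares temporaries directly?

According to the Python docs, garbage collection is disabled during timeit, so it can't be that. Is it some sort of optimization?

The results can also be reproduced in Python 2.x.

This was on Windows 7, CPython 3.5.1, Intel i7 3.40 GHz, with both the OS and Python being 64-bit. A different machine I tried, an Intel i7 3.60 GHz with Python 3.5.0, does not seem to reproduce the results.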

Running both statements in the same Python process with timeit.timeit() at 10000 loops produced 0.703 and 0.804 respectively. The difference still shows, although to a lesser extent (~12.5%).
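A minimal sketch of that in-process measurement, assuming the two statements from the command-line benchmark above. The helper name time_both and the use of timeit.repeat to take the best of several runs are illustrative choices, not the asker's exact script.

```python
import timeit

# The two statements from the benchmark above.
ONE_LINER = "tuple(range(2000)) == tuple(range(2000))"
WITH_VARS = "a = tuple(range(2000)); b = tuple(range(2000)); a == b"


def time_both(number=10000, repeat=3):
    """Return the best total time in seconds for each statement.

    Both statements run in the same interpreter process, so they share
    whatever allocator state that process happens to have.
    """
    return {
        "one_liner": min(timeit.repeat(ONE_LINER, number=number, repeat=repeat)),
        "with_vars": min(timeit.repeat(WITH_VARS, number=number, repeat=repeat)),
    }

# Usage: print(time_both())
```

Whether a gap shows up at all depends on the interpreter build and on how much memory the process already holds, so the numbers from one machine should not be expected to match another.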

My results were similar to yours: the code using intermediate variables was fairly consistently at least 10-20% faster in Python 3.4. However, when I used IPython on the very same Python 3.4 interpreter, I got these results:

In [1]: %timeit -n10000 -r20 tuple(range(2000)) == tuple(range(2000))
10000 loops, best of 20: 74.2 µs per loop

In [2]: %timeit -n10000 -r20 a = tuple(range(2000));  b = tuple(range(2000)); a==b
10000 loops, best of 20: 75.7 µs per loop

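Notably, I never got anywhere close to the 74.2 µs for the former when I used -mtimeit from the command line.

So this Heisenbug turned out to be quite interesting. I decided to run the commands under strace, and indeed there is something fishy going on: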

% strace -o withoutvars python3 -m timeit "tuple(range(2000)) == tuple(range(2000))"
10000 loops, best of 3: 134 usec per loop
% strace -o withvars python3 -mtimeit "a = tuple(range(2000));  b = tuple(range(2000)); a==b"
10000 loops, best of 3: 75.8 usec per loop
% grep mmap withvars|wc -l
46
% grep mmap withoutvars|wc -l
41149

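Now that is a good reason for the difference. The code that does not use variables causes the mmap system call to be invoked almost 1000 times more often than the code that uses intermediate variables.

The withoutvars log is full of mmap/munmap calls for a 256 KiB region; the same lines are repeated over and over: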

mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32e56de000
munmap(0x7f32e56de000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32e56de000
munmap(0x7f32e56de000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32e56de000
munmap(0x7f32e56de000, 262144)          = 0
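Note that grep mmap also matches munmap lines, so the counts above cover both calls. A small sketch that tallies each syscall name separately from a log in the format shown above; the function name count_syscalls is hypothetical, and it assumes every line starts with the syscall name, as in plain strace -o output without -f.

```python
import re
from collections import Counter

# A syscall line in the log looks like: name(args...) = result
SYSCALL_RE = re.compile(r"^(\w+)\(")


def count_syscalls(lines):
    """Count how often each syscall name opens a line of strace output."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Usage (file name taken from the strace commands above):
# with open("withoutvars") as log:
#     print(count_syscalls(log).most_common(5))
```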

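The mmap calls seem to come from the function _PyObject_ArenaMmap in Objects/obmalloc.c. obmalloc.c also contains the macro ARENA_SIZE, which is #defined as (256 << 10) (that is, 262144); similarly, the munmap calls match _PyObject_ArenaMunmap in obmalloc.c.

obmalloc.c says:

Prior to Python 2.5, arenas were never free()'ed. Starting with Python 2.5, we do try to free() arenas, and use some mild heuristic strategies to increase the likelihood that arenas eventually can be freed.

Thus these heuristics, together with the fact that the Python object allocator releases free arenas as soon as they are emptied, lead python3 -mtimeit 'tuple(range(2000)) == tuple(range(2000))' to trigger pathological behaviour in which one 256 KiB memory area is allocated and released over and over. This allocation happens with mmap/munmap, which is comparatively costly because they are system calls; furthermore, mmap with MAP_ANONYMOUS requires the newly mapped pages to be zeroed, even though Python wouldn't care.

The behaviour is not present in the code that uses intermediate variables, because it uses slightly more memory and no memory arena can be freed while some objects are still allocated in it. That is because timeit turns the statement into a loop not unlike: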

for n in range(10000):
    a = tuple(range(2000))
    b = tuple(range(2000))
    a == b

Now the behaviour is that both a and b stay bound until they're reassigned. So in the second iteration, tuple(range(2000)) allocates a third tuple, and the assignment a = tuple(...) decreases the reference count of the old tuple, causing it to be released, and increases the reference count of the new tuple; then the same happens to b. Therefore, after the first iteration there are always at least 2 of these tuples alive, if not 3, so the thrashing doesn't occur.

Most notably it cannot be guaranteed that the code using intermediate variables is always faster - indeed in some setups it might be that using intermediate variables will result in extra mmap calls, whereas the code that compares return values directly might be fine.
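The argument can be illustrated with a toy model. This is not obmalloc's actual algorithm: it assumes perfectly packed 32-byte blocks, a fixed cost of 2000 blocks per tuple, and a hypothetical baseline of blocks already in use by the process. It only counts how often the number of needed 256 KiB arenas grows, which stands in for an mmap call.

```python
ARENA_BLOCKS = (256 << 10) // 32  # blocks per 256 KiB arena (assumed block size)
TUPLE_BLOCKS = 2000               # assumed cost of one tuple(range(2000))


def arenas_needed(live_blocks):
    """Arenas required if live blocks were packed perfectly (ceiling division)."""
    return -(-live_blocks // ARENA_BLOCKS)


def one_liner_trace(iterations):
    """Block deltas for: tuple(...) == tuple(...); both temporaries die each loop."""
    for _ in range(iterations):
        yield +TUPLE_BLOCKS
        yield +TUPLE_BLOCKS
        yield -TUPLE_BLOCKS
        yield -TUPLE_BLOCKS


def with_vars_trace(iterations):
    """Block deltas for: a = tuple(...); b = tuple(...); old values die on rebinding."""
    bound = False
    for _ in range(iterations):
        for _name in ("a", "b"):
            yield +TUPLE_BLOCKS      # new tuple is built first
            if bound:
                yield -TUPLE_BLOCKS  # then the previously bound tuple is released
        bound = True


def simulate(trace, baseline):
    """Count arena acquisitions (stand-ins for mmap) over a trace of block deltas."""
    live = baseline
    arenas = arenas_needed(live)
    mmaps = 0
    for delta in trace:
        live += delta
        needed = arenas_needed(live)
        if needed > arenas:
            mmaps += needed - arenas
        arenas = needed
    return mmaps
```

With a baseline that sits exactly on an arena boundary (8192 blocks here), the one-liner acquires an arena on every iteration while the variable version acquires one only once. With a baseline of 4000 blocks the picture flips, which matches the caveat that intermediate variables are not guaranteed to be faster.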

Someone asked why this happens when timeit disables garbage collection. It is indeed true that timeit does so:

Note

By default, timeit() temporarily turns off garbage collection during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the setup string. For example:
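A minimal sketch of that setup-string approach, applied to the statement from the benchmark above rather than to the documentation's own example. The helper name time_with_gc is hypothetical.

```python
import timeit

STATEMENT = "tuple(range(2000)) == tuple(range(2000))"


def time_with_gc(number=10000):
    """Time STATEMENT with the cyclic garbage collector re-enabled.

    The setup string runs once before the timing loop, so enabling GC
    there overrides timeit's default of turning it off.
    """
    return timeit.timeit(STATEMENT, setup="import gc; gc.enable()", number=number)
```

Since the tuples here contain no reference cycles, re-enabling GC this way is not expected to remove the effect described above.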

However, the garbage collector of Python is only there to reclaim cyclic garbage, i.e. collections of objects whose references form cycles. It is not the case here; instead these objects are freed immediately when the reference count drops to zero.
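This can be checked with a short sketch, assuming CPython's reference counting. The class Payload and the function name are hypothetical; a weak reference is used only to observe whether the object still exists.

```python
import gc
import weakref


class Payload:
    """Plain object that supports weak references (tuples do not)."""


def freed_without_gc():
    """Return True if an object is reclaimed while the cyclic GC is disabled."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = Payload()
        ref = weakref.ref(obj)
        del obj               # reference count drops to zero here
        return ref() is None  # a dead weak reference returns None
    finally:
        if was_enabled:
            gc.enable()       # restore the previous collector state
```

On CPython the object is gone immediately after del, with no collector pass involved; other Python implementations may behave differently.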

The first question here has to be: is it reproducible? For some of us at least it definitely is, though other people say they aren't seeing the effect. This is on Fedora, with the equality test changed to is, as actually doing a comparison seems irrelevant to the result, and the range pushed up to 200,000, as that seems to maximise the effect:

$ python3 -m timeit "a = tuple(range(200000));  b = tuple(range(200000)); a is b"
100 loops, best of 3: 7.03 msec per loop
$ python3 -m timeit "a = tuple(range(200000)) is tuple(range(200000))"
100 loops, best of 3: 10.2 msec per loop
$ python3 -m timeit "tuple(range(200000)) is tuple(range(200000))"
100 loops, best of 3: 10.2 msec per loop
$ python3 -m timeit "a = b = tuple(range(200000)) is tuple(range(200000))"
100 loops, best of 3: 9.99 msec per loop
$ python3 -m timeit "a = b = tuple(range(200000)) is tuple(range(200000))"
100 loops, best of 3: 10.2 msec per loop
$ python3 -m timeit "tuple(range(200000)) is tuple(range(200000))"
100 loops, best of 3: 10.1 msec per loop
$ python3 -m timeit "a = tuple(range(200000));  b = tuple(range(200000)); a is b"
100 loops, best of 3: 7 msec per loop
$ python3 -m timeit "a = tuple(range(200000));  b = tuple(range(200000)); a is b"
100 loops, best of 3: 7.02 msec per loop

I note that variations between the runs, and the order in which the expressions are run make very little difference to the result.

Adding assignments to a and b into the slow version doesn't speed it up. In fact as we might expect assigning to local variables has negligible effect. The only thing that does speed it up is splitting the expression entirely in two. The only difference this should be making is that it reduces the maximum stack depth used by Python while evaluating the expression (from 4 to 3).
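One way to inspect the stack-depth claim is to compile both forms and read co_stacksize from the resulting code object. This is only a sketch: timeit actually compiles the statement inside a generated function, and the exact numbers vary between CPython versions, so only the relative order is of interest. The helper name stack_size is hypothetical.

```python
ONE_LINER = "tuple(range(200000)) is tuple(range(200000))"
SPLIT = "a = tuple(range(200000)); b = tuple(range(200000)); a is b"


def stack_size(source):
    """Return the maximum evaluation stack depth the compiler computed."""
    return compile(source, "<timeit-stmt>", "exec").co_stacksize

# Usage: print(stack_size(ONE_LINER), stack_size(SPLIT))
```

The one-liner has to keep the first tuple on the stack while the second one is being built, which is where the extra slot comes from.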

That gives us a clue that the effect is related to stack depth; perhaps the extra level pushes the stack across into another memory page. If so, we should see that other changes that affect the stack change the result (most likely killing the effect), and in fact that is what we see:

$ python3 -m timeit -s "def foo():
   tuple(range(200000)) is tuple(range(200000))" "foo()"
100 loops, best of 3: 10 msec per loop
$ python3 -m timeit -s "def foo():
   tuple(range(200000)) is tuple(range(200000))" "foo()"
100 loops, best of 3: 10 msec per loop
$ python3 -m timeit -s "def foo():
   a = tuple(range(200000));  b = tuple(range(200000)); a is b" "foo()"
100 loops, best of 3: 9.97 msec per loop
$ python3 -m timeit -s "def foo():
   a = tuple(range(200000));  b = tuple(range(200000)); a is b" "foo()"
100 loops, best of 3: 10 msec per loop

So, I think the effect is entirely due to how much Python stack is consumed during the timing process. It is still weird though.

Reference URL: https://stackoverflow.com/questions/36548518/why-is-code-using-intermediate-variables-faster-than-code-without
