Python 3, List vs Deque, When to use and why

Post date: Jun 25, 2017 6:54:55 AM

This post is continuation the the post covering Python 3.6 new features among other things.

PyClockPro test, 10000 lookups, with 1000 cache records. Using different distributions. But let's see the conclusions.

Full test runs using Py Clock-Pro Benchmark.

Using list:

Test set generation time: 1.83
M: R T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.15 PH:   0.0 DT: 2851%  DH: 100.0%
M: R T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 5.42 PH:  24.9 DT: 12061% DH:  99.8%
M: R T:  50 #: 1 LT: 0.04 LH:  49.9 PT: 2.54 PH:  50.0 DT: 6264%  DH: 100.1%
M: R T:  75 #: 1 LT: 0.03 LH:  75.1 PT: 0.94 PH:  75.4 DT: 2737%  DH: 100.4%
M: R T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.20 PH: 100.0 DT: 840%   DH: 100.0%
M: G T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.15 PH:   0.0 DT: 2836%  DH: 100.0%
M: G T:  25 #: 1 LT: 0.05 LH:  25.0 PT: 5.20 PH:  27.2 DT: 11284% DH: 108.5%
M: G T:  50 #: 1 LT: 0.04 LH:  50.0 PT: 2.49 PH:  52.9 DT: 5880%  DH: 105.7%
M: G T:  75 #: 1 LT: 0.03 LH:  75.1 PT: 0.86 PH:  77.6 DT: 2533%  DH: 103.3%
M: G T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.20 PH: 100.0 DT: 840%   DH: 100.0%
M: P T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.14 PH:   0.0 DT: 2852%  DH: 100.0%
M: P T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 3.42 PH:  33.1 DT: 7809%  DH: 132.5%
M: P T:  50 #: 1 LT: 0.04 LH:  50.0 PT: 2.55 PH:  55.4 DT: 6299%  DH: 110.8%
M: P T:  75 #: 1 LT: 0.04 LH:  75.0 PT: 0.87 PH:  77.0 DT: 2414%  DH: 102.6%
M: P T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.20 PH: 100.0 DT: 833%   DH: 100.0%
LRU time:  0.55   LRU hitr: 50.01 %
PCP time: 28.33   PCP hitr: 51.56 %
Dif time: 4555.7 % Dif hitr: 104.2 %

Using deque:

Test set generation time: 1.97
M: R T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.11 PH:   0.0 DT: 2767%  DH: 100.0%
M: R T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 5.29 PH:  25.3 DT: 12578% DH: 101.4%
M: R T:  50 #: 1 LT: 0.04 LH:  50.1 PT: 2.66 PH:  50.1 DT: 7080%  DH:  99.9%
M: R T:  75 #: 1 LT: 0.04 LH:  75.0 PT: 0.99 PH:  74.9 DT: 2827%  DH:  99.8%
M: R T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.19 PH: 100.0 DT: 804%   DH: 100.0%
M: G T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.13 PH:   0.0 DT: 2747%  DH: 100.0%
M: G T:  25 #: 1 LT: 0.04 LH:  25.1 PT: 5.25 PH:  27.2 DT: 11830% DH: 108.5%
M: G T:  50 #: 1 LT: 0.04 LH:  50.0 PT: 2.65 PH:  52.5 DT: 6685%  DH: 105.0%
M: G T:  75 #: 1 LT: 0.04 LH:  75.1 PT: 0.92 PH:  77.1 DT: 2475%  DH: 102.8%
M: G T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.19 PH: 100.0 DT: 800%   DH: 100.0%
M: P T:   0 #: 1 LT: 0.04 LH:   0.0 PT: 1.11 PH:   0.0 DT: 2711%  DH: 100.0%
M: P T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 3.32 PH:  33.1 DT: 7554%  DH: 132.6%
M: P T:  50 #: 1 LT: 0.04 LH:  50.1 PT: 2.66 PH:  55.3 DT: 6893%  DH: 110.3%
M: P T:  75 #: 1 LT: 0.03 LH:  75.1 PT: 0.83 PH:  77.5 DT: 2447%  DH: 103.1%
M: P T: 100 #: 1 LT: 0.02 LH: 100.0 PT: 0.19 PH: 100.0 DT: 797%   DH: 100.0%
LRU time:  0.55   LRU hitr: 50.03 %
PCP time: 28.48   PCP hitr: 51.53 %
Dif time: 4733.1 % Dif hitr: 104.2 %

Thoughts:

If we just compare the Pareto 25% (lru) hit ratio set, let's see what the differences are.

With deque:

M: P T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 3.42 PH:  33.1 DT: 7809% DH: 132.5%

With list:

M: P T:  25 #: 1 LT: 0.04 LH:  25.0 PT: 3.32 PH:  33.1 DT: 7554% DH: 132.6%

Using 1000 cache entries, it seems that list is still sightly faster than deque.

This once again shows how hard it's to make some choices. Because in some cases list is much faster and in some other cases deque is much faster. After a few tests I couldn't actually conclude if deque or list would be faster for cache. Because deque is faster for insert and pop, in random order. It's still much slower for generic access.

What's the root cause for this dilemma? It seems that deque access by index is very slow, because it's going through linked list. Instead of having direct memory access using pointer math in single array.

In this case list is much faster than deque, random index access:

list[512*1024]
deque[512*1024]

And in this case deque is much faster than list, basically same as popleft:

del list[0]
del qeque[0]

The differences just grow, if the data set size grows. It's got everything to do with the underlying data structure and implementation. And this is the root cause why deque isn't actually faster than list. Because most of PyClockPro access hits the list with random index number to be accessed.

Why? Deque is linked list, and 'walking the linked list', is slow. On traditional compact memory list you can basically just access index pointer at certain location to find Nth entry.