我一直在考虑如何将deque(即双端队列)实现为不可变数据结构.
似乎有不同的方法来做到这一点.AFAIK,不可变数据结构通常是分层的,因此在修改诸如插入或删除项之类的操作之后,可以重用它的主要部分.
Eric Lippert 在他的博客上有两篇 关于这个主题的文章,以及C#中的示例实现.
他的两个实现都让我觉得比实际需要更精细.难道deques只能被实现为二叉树,其中元素只能在树的"左"(前)和非"右"(后面)插入或删除?
o / \ … … / \ … … / \ / \ front --> L … … R <-- back
此外,树将与旋转保持合理平衡:
在插入前部或从背部移除时右旋转,和
从前面移开或在后面插入时左旋转.
在我看来,Eric Lippert是一个非常聪明的人,我非常尊重他,但他显然没有考虑这种方法.因此我想知道,这是有充分理由的吗?我建议的实施dequesnaïve的方法吗?
As Daniel noted, implementing immutable deques with well-known balanced search trees like AVL or red-black trees gives Θ(lg n) worst case complexity. Some of the implementations Lippert discusses may seem elaborate at first glance, but there are many immutable deques with o(lg n) worst or average or amortized complexity that are built from balanced trees along with two simple ideas:
Reverse the Spines
要在传统的平衡搜索树上执行双端队列操作,我们需要访问端点,但我们只能访问中心.为了到达左端,我们必须导航左子指针直到我们最终达到死胡同.最好有一个指向左端和右端的指针而没有所有的导航工作.实际上,我们确实不需要经常访问根节点.让我们存储一个平衡的搜索树,以便访问末尾是O(1).
以下是C中通常如何存储AVL树的示例:
struct AVLTree { const char * value; int height; struct AVLTree * leftChild; struct AVLTree * rightChild; };
To set up the tree so that we can start at the edges and move towards the root, we change the tree and store all of the pointers along the paths from the root to the left and rightmost children in reverse. (These paths are called the left and right spine, respectively). Just like reversing a singly-linked list, the last element becomes the first, so the leftmost child is now easily accessible.
This is a little tricky to understand. To help explain it, imagine that you only did this for the left spine:
struct LeftSpine { const char * value; int height; struct AVLTree * rightChild; struct LeftSpine * parent; };
In some sense, the leftmost child is now the "root" of the tree. If you drew the tree this way, it would look very strange, but if you simply take your normal drawing of a tree and reverse all of the arrows on the left spine, the meaning of the LeftSpine struct should become clearer. Access to the left side of the tree is now immediate. The same can be done for the right spine:
struct RightSpine { double value; int height; struct AVLTree * leftChild; struct RightSpine * parent; };
If you store both a left and a right spine as well as the center element, you have immediate access to both ends. Inserting and deleting may still be Ω(lg n), because rebalancing operations may require traversing the entire left or right spine, but simply viewing to the left and rightmost elements is now O(1).
此策略的一个示例用于使用SML和Java(更多文档)实现纯粹的功能性treap.这也是其他几个具有o(lg n)性能的不可变deques的关键思想.
限制了平衡工作
如上所述,在AVL树的左端或最右端插入可能需要Ω(lg n)时间来遍历脊柱.以下是演示此内容的AVL树的示例:
完整二叉树通过归纳定义为:
高度为0的完整二叉树是空节点.
高度为n + 1的完整二叉树具有根节点,其具有高度为n的完整二叉树作为子节点.
Pushing an element onto the left of a full binary tree will necessarily increase the maximum height of the tree. Since the AVL trees above store that information in each node, and since every tree along the left spine of a full binary tree is also a full binary tree, pushing an element onto the left of an AVL deque that happens to be a full binary tree will require incrementing Ω(lg n) height values along the left spine.
(Two notes on this: (a) You can store AVL trees without keeping the height in the node; instead you keep only balance information (left-taller, right-taller, or even). This doesn't change the performance of the above example. (b) In AVL trees, you might need to do not only Ω(lg n) balance or height information updates, but Ω(lg n) rebalancing operations. I don't recall the details of this, and it may be only on deletions, rather than insertions.)
In order to achieve o(lg n) deque operations, we need to limit this work. Immutable deques represented by balanced trees usually use at least one of the following strategies:
Anticipate where rebalancing will be needed. If you are using a tree that requires o(lg n) rebalancing but you know where that rebalancing will be needed and you can get there quickly enough, you can perform your deque operations in o(lg n) time. Deques that use this as a strategy will store not just two pointers into the deque (the ends of the left and right spines, as discussed above), but some small number of jump pointers to places higher along the spines. Deque operations can then access the roots of the trees pointed to by the jump pointers in O(1) time. If o(lg n) jump pointers are maintained for all of the places where rebalancing (or changing node information) will be needed, deque operations can take o(lg n) time.
(Of course, this makes the tree actually a dag, since the trees on the spines pointed to by jump pointers are also pointed to by their children on the spine. Immutable data structures don't usually get along with non-tree graphs, since replacing a node pointed to by more than one other node requires replacing all of the other nodes that point to it. I have seen this fixed by just eliminating the non-jump pointers, turning the dag back into a tree. One can then store a singly-linked list with jump pointers as a list of lists. Each subordinate list contains all of the nodes between the head of that list and its jump pointer. This requires some care to deal with partially overlapping jump pointers, and a full explanation is probably not appropriate for this aside.)
这是Tsakalidis在他的论文"用于本地化搜索的AVL树"中使用的技巧之一,允许在具有平衡条件的AVL树上进行O(1)deque操作.这也是Kaplan和Tarjan在他们的论文中使用的主要思想"纯粹的功能性,带有连锁的实时deques"以及Mihaesau和Tarjan稍后的改进.Munro等人的"确定性跳过列表"在这里也值得一提,尽管通过使用树将跳过列表转换为不可变设置有时会改变允许在末端附近进行这种有效修改的属性.有关翻译的示例,请参阅Messeguer的"跳过树,在并发方法中跳过列表的替代数据结构"Dean和Jones的"探索跳过列表和二叉搜索树之间的二元性",以及Lamoureux和Nickerson的"关于B树和确定性跳过列表的等价性".
做大量的工作.在上面的完整二叉树示例中,推送时不需要重新平衡,但Ω(lg n)节点需要更新其高度或平衡信息.您可以简单地将末端的脊椎标记为需要增量,而不是实际进行增量.
理解这个过程的一种方法是类比二进制数.(2 ^ n)-1由二进制表示为n 1的字符串.向此数字添加1时,您需要将所有1更改为0,然后在结尾添加1.以下Haskell将二进制数编码为非空的位串,最不重要的是第一位.
data Bit = Zero | One type Binary = (Bit,[Bit]) incr :: Binary -> Binary incr (Zero,x) = (One,x) incr (One,[]) = (Zero,[One]) incr (One,(x:xs)) = let (y,ys) = incr (x,xs) in (Zero,y:ys)
incr是一个递归函数,对于表单的数字(One,replicate k One)
,incr称自己为Ω(k)次.
相反,我们可以仅通过组中的位数来表示相等的位组.如果相邻比特或比特组相等(值,而不是数字),则将它们组合成一个组.我们可以在O(1)时间内增加:
data Bits = Zeros Int | Ones Int type SegmentedBinary = (Bits,[Bits]) segIncr :: SegmentedBinary -> SegmentedBinary segIncr (Zeros 1,[]) = (Ones 1,[]) segIncr (Zeros 1,(Ones n:rest)) = (Ones (n+1),rest) segIncr (Zeros n,rest) = (Ones 1,Zeros (n-1):rest) segIncr (Ones n,[]) = (Zeros n,[Ones 1]) segIncr (Ones n,(Zeros 1:Ones m:rest)) = (Zeros n,Ones (m+1):rest) segIncr (Ones n,(Zeros p:rest)) = (Zeros n,Ones 1:Zeros (p-1):rest)
由于segIncr不是递归的,并且不会在Ints上调用加号和减号以外的任何函数,因此您可以看到它需要O(1)时间.
Some of the deques mentioned in the section above entitled "Anticipate where rebalancing will be needed" actually use a different numerically-inspired technique called "redundant number systems" to limit the rebalancing work to O(1) and locate it quickly. Redundant numerical representations are fascinating, but possibly too far afield for this discussion. Elmasry et al.'s "Strictly-regular number system and data structures" is not a bad place to start reading about that topic. Hinze's "Bootstrapping one-sided flexible arrays" may also be useful.
In "Making data structures persistent", Driscoll et al. describe lazy recoloring, which they attribute to Tsakalidis. They apply it to red-black trees, which can be rebalanced after insertion or deletion with O(1) rotations (but Ω(lg n) recolorings) (see Tarjan's "Updataing a balanced tree in O(1) rotations"). The core of the idea is to mark a large path of nodes that need to be recolored but not rotated. A similar idea is used on AVL trees in the older versions of Brown & Tarjan's "A fast merging algorithm". (Newer versions of the same work use 2-3 trees; I have not read the newer ones and I do not know if they use any techniques like lazy recoloring.)
Randomize. Treaps, mentioned above, can be implemented in a functional setting so that they perform deque operations on O(1) time on average. Since deques do not need to inspect their elements, this average is not susceptible to malicious input degrading performance, unlike simple (no rebalancing) binary search trees, which are fast on average input. Treaps use an independent source of random bits instead of relying on randomness from the data.
In a persistent setting, treaps may be susceptible to degraded performance from malicious input with an adversary who can both (a) use old versions of a data structure and (b) measure the performance of operations. Because they do not have any worst-case balance guarantees, treaps can become quite unbalanced, though this should happen rarely. If an adversary waits for a deque operation that takes a long time, she can initiate that same operation repeatedly in order to measure and take advantage of a possibly unbalanced tree.
If this is not a concern, treaps are an attractively simple data structure. They are very close to the AVL spine tree described above.
Skip lists, mentioned above, might also be amenable to functional implementations with O(1) average-time deque operations.
The first two techniques for bounding the rebalancing work require complex modifications to data structures while usually affording a simple analysis of the complexity of deque operations. Randomization, along with the next technique, have simpler data structures but more complex analysis. The original analysis by Seidel and Aragon is not trivial, and there is some complex analysis of exact probabilities using more advanced mathematics than is present in the papers cited above -- see Flajolet et al.'s "Patterns in random binary search trees".
Amortize. There are several balanced trees that, when viewed from the roots up (as explained in "Reverse the Spines", above), offer O(1) amortized insertion and deletion time. Individual operations can take Ω(lg n) time, but they put the tree in such a nice state that a large number of operations following the expensive operation will be cheap.
Unfortunately, this kind of analysis does not work when old versions of the tree are still around. A user can perform operations on the old, nearly-out-of-balance tree many times without any intervening cheap operations.
One way to get amortized bounds in a persistent setting was invented by Chris Okasaki. It is not simple to explain how the amortization survives the ability to use arbitrary old versions of a data structure, but if I remember correctly, Okasaki's first (as far as I know) paper on the subject has a pretty clear explanation. For more comprehensive explanations, see his thesis or his book.
As I understand it, there are two essential ingredients. First, instead of just guaranteeing that a certain number of cheap operations occur before each expensive operation (the usual approach to amortization) you actually designate and set up that specific expensive operation before performing the cheap operations that will pay for it. In some cases, the operation is scheduled to be started (and finished) only after many intervening cheap steps. In other cases, the operation is actually scheduled only O(1) steps in the future, but cheap operations may do part of the expensive operation and then reschedule more of it for later. If an adversary looking to repeat an expensive operation over and over again is actually reusing the same scheduled operation each time. This sharing is where the second ingredient comes in.
The computation is set up using laziness. A lazy value is not computed immediately, but, once performed, its result is saved. The first time a client needs to inspect a lazy value, its value is computed. Later clients can use that cached value directly, without having to recompute it.
#includestruct lazy { int (*oper)(const char *); char * arg; int* ans; }; typedef struct lazy * lazyop; lazyop suspend(int (*oper)(const char *), char * arg) { lazyop ans = (lazyop)malloc(sizeof(struct lazy)); ans->oper = oper; ans->arg = arg; return ans; } void force(lazyop susp) { if (0 == susp) return; if (0 != susp->ans) return; susp->ans = (int*)malloc(sizeof(int)); *susp->ans = susp->oper(susp->arg); } int get(lazyop susp) { force(susp); return *susp->ans; }
Laziness constructs are included in some MLs, and Haskell is lazy by default. Under the hood, laziness is a mutation, which leads some authors to call it a "side effect". That might be considered bad if that kind of side effect doesn't play well with whatever the reasons were for selecting an immutable data structure in the first place, but, on the other hand, thinking of laziness as a side effect allows the application of traditional amortized analysis techniques to persistent data structures, as mentioned in a paper by Kaplan, Okasaki, and Tarjan entitled "Simple Confluently Persistent Catenable Lists".
Consider again the adversary from above who is attempting to repeatedly force the computation of an expensive operation. After the first force of the lazy value, every remaining force is cheap.
In his book, Okasaki explains how to build deques with O(1) amortized time required for each operation. It is essentially a B+-tree, which is a tree where all of the elements are stored at the leaves, nodes may vary in how many children they have, and every leaf is at the same depth. Okasaki uses the spine-reversal method discussed above, and he suspends (that is, stores as a lazy value) the spines above the leaf elements.
A structure by Hinze and Paterson called "Finger trees: a simple general-purpose data structure" is halfway between the deques designed by Okasaki and the "Purely functional representations of catenable sorted lists" of Kaplan and Tarjan. Hinze and Paterson's structure has become very popular.
As a evidence of how tricky the amortized analysis is to understand, Hinze and Paterson's finger trees are frequently implemented without laziness, making the time bounds not O(1) but still O(lg n). One implementation that seems to use laziness is the one in functional-dotnet. That project also includes an implementation of lazy values in C# which might help explain them if my explanation above is lacking.
Could deques be implemented as binary trees? Yes, and their worst-case complexity when used persistently would be no worse than those presented by Eric Lippert. However, Eric's trees are actually not complicated enough to get O(1) deque operations in a persistent setting, though only by a small complexity margin (making the center lazy) if you are willing to accept amortized performance. A different but also simple view of treaps can get O(1) expected performance in a functional setting, assuming an adversary who is not too tricky. Getting O(1) worst-case deque operations with a tree-like structure in a functional setting requires a good bit more complexity than Eric's implementations.
Two final notes (though this is a very interesting topic and I reserve the right to add more later) :-)
Nearly all of the deques mentioned above are finger search trees as well. In a functional setting this means they can be split at the ith element in O(lg(min(i,n-i))) time and two trees of size n and m can be concatenated in O(lg(min(n,m))) time.
I know of two ways of implementing deques that don't use trees. Okasaki presents one in his book and thesis and the paper I linked to above. The other uses a technique called "global rebuilding" and is presented in Chuang and Goldberg's "Real-time deques, multihead Turing machines, and purely functional programming".
其他答案都很棒.我将向他们添加我选择deque的finger树实现,因为它对泛型类型系统进行了不寻常和有趣的使用.大多数数据结构在自己的递归结构,但这种技术使递归也是在类型系统,我从来没有见过的; 我认为这可能是普遍感兴趣的.