我需要维护一个非常短暂且频繁上下的连接客户名单.由于潜在的客户端数量,我需要一个支持快速插入/删除的集合.建议?
我在C#和C++中找到的最佳实现是 - 对于C#/ CLI:
http://www.itu.dk/research/c5/Release1.1/ITU-TR-2006-76.pdf
http://www.itu.dk/research/c5/
它经过深入研究,具有可扩展的单元测试,自2月起,它们还在.Net中实现了通用接口,这使得使用集合更容易.它们在Channel9上有特色,他们对这些系列进行了广泛的性能测试.
如果你正在使用数据结构,那么这些研究人员在他们的库中有一个红黑树实现,类似于你发现Lütz反射器并查看System.Data的内部结构时所发现的:p.插入复杂度:O(log(n)).
然后,如果你可以允许一些C++互操作并且你绝对需要速度并且希望尽可能少的开销,那么来自Dmitriy V'jukov的这些无锁ADT可能是你在这个世界上可以获得的最好的,超过英特尔的并发库ADT.
http://groups.google.com/group/lock-free
我已经阅读了一些代码,而且这些代码非常精通这些代码.VC++可以在没有恼人边界的情况下进行原生C++互操作.http://www.swig.org/可以帮助你包装C++接口以便在.Net中使用,或者你可以通过P/Invoke自己完成.
他们编写了教程,这篇教程在C#中实现了一个相当粗糙的跳过列表,并讨论了其他类型的数据结构.(在CodeProject上有一个更好的SkipList,它非常完善并以良好的方式实现接口.)它们还有一些与.Net捆绑的数据结构,即HashTable/Dictionary <,>和HashSet.当然还有"ResizeArray"/ List类型以及堆栈和队列,但它们在搜索时都是"线性的".
如果您希望加快内存分配所需的时间,可以使用google的perf工具.它们在谷歌代码中可用,它们包含一个非常有趣的多线程malloc实现(TCMalloc),它显示比正常malloc更一致的时序.您可以将它与上面的无锁结构一起使用,以实现对性能的疯狂.
您还可以在函数上使用memoization来通过缓存来提高性能,如果您使用的是F#,则会感兴趣.F#还允许C++互操作,所以你就可以了.
使用在bloom-filters上进行的研究也可以自己做一些事情,它允许O(k)查找复杂度,其中k是一个常数,取决于你实现的哈希函数的数量.这就是谷歌的BigTable的实施方式.这些过滤器会为你提供元素,如果它在集合中或者可能具有非常低的可能性而不是你正在寻找的元素(参见维基百科的图表 - 它接近P(wrong_key) - > 0.01作为大小是大约10000个元素,但你可以通过实现更多的哈希函数/减少集合来解决这个问题.
我没有搜索过.Net的实现,但由于散列计算是独立的,因此您可以使用MS的性能团队的Tasks实现来加快速度.
碰巧我刚做了一个涉及数据结构的课程.在这种情况下,我们使用C++,但很容易转换为C#.我们构建了三种不同的数据结构; bloom-filter,skip-list和随机二叉搜索树.
请参阅最后一段后面的代码和分析.
最后,为了使我的答案"完整",如果你真的需要速度,你可以使用路由表或内容可寻址内存.这使您可以非常快速地在原则上对O(1)进行数据的"哈希"值查找.
如果您在代码中发现错误,或者只是指示我如何更好地(或更好地使用模板),我会非常感谢您的反馈.请注意,布隆过滤器不像现实生活中那样; 通常你不必从它中删除然后它比我允许删除测试的黑客空间效率更高.
DataStructure.h
#ifndef DATASTRUCTURE_H_ #define DATASTRUCTURE_H_ class DataStructure { public: DataStructure() {countAdd=0; countDelete=0;countFind=0;} virtual ~DataStructure() {} void resetCountAdd() {countAdd=0;} void resetCountFind() {countFind=0;} void resetCountDelete() {countDelete=0;} unsigned int getCountAdd(){return countAdd;} unsigned int getCountDelete(){return countDelete;} unsigned int getCountFind(){return countFind;} protected: unsigned int countAdd; unsigned int countDelete; unsigned int countFind; }; #endif /*DATASTRUCTURE_H_*/
Key.h
#ifndef KEY_H_ #define KEY_H_ #includeusing namespace std; const int keyLength = 128; class Key : public string { public: Key():string(keyLength, ' ') {} Key(const char in[]): string(in){} Key(const string& in): string(in){} bool operator<(const string& other); bool operator>(const string& other); bool operator==(const string& other); virtual ~Key() {} }; #endif /*KEY_H_*/
Key.cpp
#include "Key.h" bool Key::operator<(const string& other) { return compare(other) < 0; }; bool Key::operator>(const string& other) { return compare(other) > 0; }; bool Key::operator==(const string& other) { return compare(other) == 0; }
BloomFilter.h
#ifndef BLOOMFILTER_H_ #define BLOOMFILTER_H_ #include#include #include #include #include "Key.h" #include "DataStructure.h" #define LONG_BIT 32 #define bitmask(val) (unsigned long)(1 << (LONG_BIT - (val % LONG_BIT) - 1)) // TODO: Implement RW-locking on the reads/writes to the bitmap. class BloomFilter : public DataStructure { public: BloomFilter(){} BloomFilter(unsigned long length){init(length);} virtual ~BloomFilter(){} void init(unsigned long length); void dump(); void add(const Key& key); void del(const Key& key); /** * Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't. */ bool testExist(const Key& key, bool v = false); private: unsigned long hash1(const Key& key); unsigned long hash2(const Key& key); bool exist(const Key& key); void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key); void getCountIndicies(const int i1, const unsigned long h1, const int i2, const unsigned long h2, int& i1_c, int& i2_c); vector m_tickBook; vector m_useCounts; unsigned long m_length; // number of bits in the bloom filter unsigned long m_pockets; //the number of pockets static const unsigned long m_pocketSize; //bits in each pocket }; #endif /*BLOOMFILTER_H_*/
BloomFilter.cpp
#include "BloomFilter.h" const unsigned long BloomFilter::m_pocketSize = LONG_BIT; void BloomFilter::init(unsigned long length) { //m_length = length; m_length = (unsigned long)((2.0*length)/log(2))+1; m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize)); m_tickBook.resize(m_pockets); // my own (allocate nr bits possible to store in the other vector) m_useCounts.resize(m_pockets * m_pocketSize); unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0; for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own } unsigned long BloomFilter::hash1(const Key& key) { unsigned long hash = 5381; unsigned int i=0; for (i=0; i< key.length(); i++){ hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */ } double d_hash = (double) hash; d_hash *= (0.5*(sqrt(5)-1)); d_hash -= floor(d_hash); d_hash *= (double)m_length; return (unsigned long)floor(d_hash); } unsigned long BloomFilter::hash2(const Key& key) { unsigned long hash = 0; unsigned int i=0; for (i=0; i< key.length(); i++){ hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash; } double d_hash = (double) hash; d_hash *= (0.5*(sqrt(5)-1)); d_hash -= floor(d_hash); d_hash *= (double)m_length; return (unsigned long)floor(d_hash); } bool BloomFilter::testExist(const Key& key, bool v){ if(exist(key)) { if(v) cout<<"Key "<< key<<" is in the set"<0) && ((m_tickBook[i2] & bitmask(h2)) > 0); } /* * Gets the values of the indicies for two hashes and places them in * the passed parameters. The index is into m_tickBook. */ void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key) { h1 = hash1(key); h2 = hash2(key); i1 = (int) h1/m_pocketSize; i2 = (int) h2/m_pocketSize; } /* * Gets the values of the indicies into the count vector, which keeps * track of how many times a specific bit-position has been used. */ void BloomFilter::getCountIndicies(const int i1, const unsigned long h1, const int i2, const unsigned long h2, int& i1_c, int& i2_c) { i1_c = i1*m_pocketSize + h1%m_pocketSize; i2_c = i2*m_pocketSize + h2%m_pocketSize; }
**RBST.h**
#ifndef RBST_H_ #define RBST_H_ #include#include #include #include #include "Key.h" #include "DataStructure.h" #define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str); using namespace std; class RBSTNode; class RBSTNode: public Key { public: RBSTNode(const Key& key):Key(key) { m_left =NULL; m_right = NULL; m_size = 1U; // the size of one node is 1. } virtual ~RBSTNode(){} string setKey(const Key& key){return Key(key);} RBSTNode* left(){return m_left; } RBSTNode* right(){return m_right;} RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; } RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; } #ifdef DEBUG ostream& print(ostream& out) { out << "Key(" << *this << ", m_size: " << m_size << ")"; return out; } #endif unsigned int size() { return m_size; } void setSize(unsigned int val) { #ifdef DEBUG this->print(cout); cout << "::setSize(" << val << ") called." << endl; #endif if (val == 0) throw "Cannot set the size below 1, then just delete this node."; m_size = val; } void incSize() { #ifdef DEBUG this->print(cout); cout << "::incSize() called" << endl; #endif m_size++; } void decrSize() { #ifdef DEBUG this->print(cout); cout << "::decrSize() called" << endl; #endif if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node."; m_size--; } #ifdef DEBUG unsigned int size(RBSTNode* x); #endif private: RBSTNode(){} RBSTNode* m_left; RBSTNode* m_right; unsigned int m_size; }; class RBST : public DataStructure { public: RBST() { m_size = 0; m_head = NULL; srand(time(0)); }; virtual ~RBST() {}; /** * Tries to add key into the tree and will return * true for a new item added * false if the key already is in the tree. * * Will also have the side-effect of printing to the console if v=true. */ bool add(const Key& key, bool v=false); /** * Same semantics as other add function, but takes a string, * but diff name, because that'll cause an ambiguity because of inheritance. */ bool addString(const string& key); /** * Deletes a key from the tree if that key is in the tree. * Will return * true for success and * false for failure. * * Will also have the side-effect of printing to the console if v=true. */ bool del(const Key& key, bool v=false); /** * Tries to find the key in the tree and will return * true if the key is in the tree and * false if the key is not. * * Will also have the side-effect of printing to the console if v=true. */ bool find(const Key& key, bool v = false); unsigned int count() { return m_size; } #ifdef DEBUG int dump(char sep = ' '); int dump(RBSTNode* target, char sep); unsigned int size(RBSTNode* x); #endif private: RBSTNode* randomAdd(RBSTNode* target, const Key& key); RBSTNode* addRoot(RBSTNode* target, const Key& key); RBSTNode* rightRotate(RBSTNode* target); RBSTNode* leftRotate(RBSTNode* target); RBSTNode* del(RBSTNode* target, const Key& key); RBSTNode* join(RBSTNode* left, RBSTNode* right); RBSTNode* find(RBSTNode* target, const Key& key); RBSTNode* m_head; unsigned int m_size; }; #endif /*RBST_H_*/
**RBST.cpp**
#include "RBST.h" bool RBST::add(const Key& key, bool v){ unsigned int oldSize = m_size; m_head = randomAdd(m_head, key); if (m_size > oldSize){ if(v) cout<<"Node "<left(), sep); cout<< *target< right(), sep); return ret; }; #endif /** * Rotates the tree around target, so that target's left * is the new root of the tree/subtree and updates the subtree sizes. * *(target) b (l) a * / \ right / \ * a ? ----> ? b * / \ / \ * ? x x ? * */ RBSTNode* RBST::rightRotate(RBSTNode* target) // private { if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested. if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!"; #ifdef DEBUG cout <<"Right-rotating b-node "; target->print(cout); cout << " for a-node "; target->left()->print(cout); cout << "." << endl; #endif RBSTNode* l = target->left(); int as0 = l->size(); // re-order the sizes l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null. target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null. // swap b's left (for a) target->setLeft(l->right()); // and a's right (for b's left) l->setRight(target); #ifdef DEBUG cout << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl; #endif // return the new root, a. return l; }; /** * Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*) */ RBSTNode* RBST::leftRotate(RBSTNode* target) { if (target == NULL) throw "Invariant failure, target is null"; if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!"; #ifdef DEBUG cout <<"Left-rotating a-node "; target->print(cout); cout << " for b-node "; target->right()->print(cout); cout << "." << endl; #endif RBSTNode* r = target->right(); int bs0 = r->size(); // re-roder the sizes r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1); target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size())); // swap a's right (for b's left) target->setRight(r->left()); // swap b's left (for a) r->setLeft(target); #ifdef DEBUG cout << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl; #endif return r; }; // /** * Adds a key to the tree and returns the new root of the tree. * If the key already exists doesn't add anything. * Increments m_size if the key didn't already exist and hence was added. * * This function is not called from public methods, it's a helper function. */ RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key) { countAdd++; if (target == NULL) return new RBSTNode(key); #ifdef DEBUG cout << "addRoot("; cout.flush(); target->print(cout) << "," << key << ") called." << endl; #endif if (*target < key) { target->setRight( addRoot(target->right(), key) ); target->incSize(); // Should I? RBSTNode* res = leftRotate(target); #ifdef DEBUG if (target->size() != size(target)) BUG("in addRoot 1"); #endif return res; } target->setLeft( addRoot(target->left(), key) ); target->incSize(); // Should I? RBSTNode* res = rightRotate(target); #ifdef DEBUG if (target->size() != size(target)) BUG("in addRoot 2"); #endif return res; }; /** * This function is called from the public add(key) function, * and returns the new root node. */ RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key) { countAdd++; if (target == NULL) { m_size++; return new RBSTNode(key); } #ifdef DEBUG cout << "randomAdd("; target->print(cout) << ", \"" << key << "\") called." << endl; #endif int r = (rand() % target->size()) + 1; // here is where we add the target as root! if (r == 1) { m_size++; // TODO: Need to lock. return addRoot(target, key); } #ifdef DEBUG printf("randomAdd recursion part, "); #endif // otherwise, continue recursing! if (*target <= key) { #ifdef DEBUG printf("target <= key\n"); #endif target->setRight( randomAdd(target->right(), key) ); target->incSize(); // TODO: Need to lock. #ifdef DEBUG if (target->right()->size() != size(target->right())) BUG("in randomAdd 1"); #endif } else { #ifdef DEBUG printf("target > key\n"); #endif target->setLeft( randomAdd(target->left(), key) ); target->incSize(); // TODO: Need to lock. #ifdef DEBUG if (target->left()->size() != size(target->left())) BUG("in randomAdd 2"); #endif } #ifdef DEBUG printf("randomAdd return part\n"); #endif m_size++; // TODO: Need to lock. return target; }; ///////////////////////////////////////////////////////////// ///////////////////// DEL FUNCTIONS //////////////////////// ///////////////////////////////////////////////////////////// /** * Deletes a node with the passed key. * Returns the root node. * Decrements m_size if something was deleted. */ RBSTNode* RBST::del(RBSTNode* target, const Key& key) { countDelete++; if (target == NULL) return NULL; #ifdef DEBUG cout << "del("; target->print(cout) << ", \"" << key << "\") called." << endl; #endif RBSTNode* ret = NULL; // found the node to delete if (*target == key) { ret = join(target->left(), target->right()); m_size--; delete target; return ret; // return the newly built joined subtree! } // store a temporary size before recursive deletion. unsigned int size = m_size; if (*target < key) target->setRight( del(target->right(), key) ); else target->setLeft( del(target->left(), key) ); // if the previous recursion changed the size, we need to decrement the size of this target too. if (m_size < size) target->decrSize(); #ifdef DEBUG if (RBST::size(target) != target->size()) BUG("in del"); #endif return target; }; /** * Joins the two subtrees represented by left and right * by randomly choosing which to make the root, weighted on the * size of the sub-tree. */ RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right) { if (left == NULL) return right; if (right == NULL) return left; #ifdef DEBUG cout << "join("; left->print(cout); cout << ","; right->print(cout) << ") called." << endl; #endif // Find the chance that we use the left tree, based on its size over the total tree size. // 3 s.d. randomness :-p e.g. 60.3% chance. bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0)); RBSTNode* subtree = NULL; if (useLeft) { subtree = join(left->right(), right); left->setRight(subtree) ->setSize((left->left() == NULL ? 0 : left->left()->size()) + subtree->size() + 1 ); #ifdef DEBUG if (size(left) != left->size()) BUG("in join 1"); #endif return left; } subtree = join(right->left(), left); right->setLeft(subtree) ->setSize((right->right() == NULL ? 0 : right->right()->size()) + subtree->size() + 1); #ifdef DEBUG if (size(right) != right->size()) BUG("in join 2"); #endif return right; }; ///////////////////////////////////////////////////////////// ///////////////////// FIND FUNCTIONS /////////////////////// ///////////////////////////////////////////////////////////// /** * Tries to find the key in the tree starting * search from target. * * Returns NULL if it was not found. */ RBSTNode* RBST::find(RBSTNode* target, const Key& key) { countFind++; // Could use private method only counting the first call. if (target == NULL) return NULL; // not found. if (*target == key) return target; // found (does string override ==?) if (*target < key) return find(target->right(), key); // search for gt to the right. return find(target->left(), key); // search for lt to the left. }; #ifdef DEBUG unsigned int RBST::size(RBSTNode* x) { if (x == NULL) return 0; return 1 + size(x->left()) + size(x->right()); } #endif
I'll save the SkipList for another time since it's already possible to find good implementations of a SkipList from the links and my version wasn't much different.
The graphs generated from the test-file are as follows:
Graph showing time taken to add new items for BloomFilter, RBST and SkipList. graph http://haf.se/content/dl/addtimer.png
Graph showing time taken to find items for BloomFilter, RBST and SkipList graph http://haf.se/content/dl/findtimer.png
Graph showing time taken to delete items for BloomFilter, RBST and SkipList graph http://haf.se/content/dl/deltimer.png
So as you can see, the random binary search tree was rather a lot better than the SkipList. The bloom filter lives up to its O(k).
考虑基于散列的集合对于这一点,例如HashSet
,Dictionary
,HashTable
,其提供用于添加和删除元素恒定的时间性能.
.NET Framework开发人员指南中提供的更多信息:
Hashtable和字典集合类型
HashSet集合类型