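I want to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.
I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.
In my case, two collections are equal if they both contain the same items (regardless of order).
Example: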
    collection1 = {1, 2, 3, 4};
    collection2 = {2, 4, 1, 3};
    collection1 == collection2; // true
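What I usually do is loop through each item of one collection and check whether it exists in the other collection, then loop through each item of the other collection and check whether it exists in the first one. (I start by comparing the lengths.)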
    if (collection1.Count != collection2.Count)
        return false; // the collections are not equal

    foreach (Item item in collection1)
    {
        if (!collection2.Contains(item))
            return false; // the collections are not equal
    }

    foreach (Item item in collection2)
    {
        if (!collection1.Contains(item))
            return false; // the collections are not equal
    }

    return true; // the collections are equal
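However, this is not entirely correct, and it's probably not the most efficient way to compare two collections for equality.
An example I can think of where it gives the wrong answer: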
    collection1 = {1, 2, 3, 3, 4}
    collection2 = {1, 2, 2, 3, 4}
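These would be considered equal by my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?
The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you like; it doesn't matter.
Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).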
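As a minimal sketch of the problem described in the note: the `Point` class, the `PointComparer`, and the `ReferenceKeyDemo` helper below are hypothetical names made up for illustration. The sketch assumes a class that does not override `Equals`/`GetHashCode`, and shows that a `Dictionary` then treats two instances with identical content as different keys unless it is given a content-based `IEqualityComparer<T>`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference type that does NOT override Equals/GetHashCode.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }
}

// Hypothetical comparer that compares Point instances by content.
public sealed class PointComparer : IEqualityComparer<Point>
{
    public bool Equals(Point a, Point b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(Point p)
    {
        // Equal content must produce equal hash codes.
        return p is null ? 0 : unchecked((p.X * 397) ^ p.Y);
    }
}

public static class ReferenceKeyDemo
{
    // Returns how many distinct keys a dictionary ends up with for the given points.
    // A null comparer means "use the default comparer" (reference equality for Point).
    public static int CountKeys(IEnumerable<Point> points, IEqualityComparer<Point> comparer = null)
    {
        var counts = new Dictionary<Point, int>(comparer);
        foreach (Point p in points)
        {
            counts.TryGetValue(p, out int n);
            counts[p] = n + 1;
        }
        return counts.Count;
    }
}
```

With two separate `new Point(1, 2)` instances, `CountKeys` sees two keys under the default comparer and one key when a `PointComparer` is passed in.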
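It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent
Remarks
Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.
Using Reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than the existing answers, since it takes nulls into account, implements IEqualityComparer, and has some efficiency and edge-case checks. Plus, it's Microsoft :)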
    using System;
    using System.Collections.Generic;

    public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
    {
        private readonly IEqualityComparer<T> m_comparer;

        public MultiSetComparer(IEqualityComparer<T> comparer = null)
        {
            m_comparer = comparer ?? EqualityComparer<T>.Default;
        }

        public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
        {
            if (first == null)
                return second == null;

            if (second == null)
                return false;

            if (ReferenceEquals(first, second))
                return true;

            // Cheap checks when both sides expose a Count
            if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
            {
                if (firstCollection.Count != secondCollection.Count)
                    return false;

                if (firstCollection.Count == 0)
                    return true;
            }

            return !HaveMismatchedElement(first, second);
        }

        private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
        {
            int firstNullCount;
            int secondNullCount;

            var firstElementCounts = GetElementCounts(first, out firstNullCount);
            var secondElementCounts = GetElementCounts(second, out secondNullCount);

            if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
                return true;

            foreach (var kvp in firstElementCounts)
            {
                var firstElementCount = kvp.Value;
                int secondElementCount;
                // A missing key leaves secondElementCount at 0, which is a mismatch
                secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);

                if (firstElementCount != secondElementCount)
                    return true;
            }

            return false;
        }

        // Counts occurrences per element; nulls are counted separately
        // because a Dictionary cannot hold a null key.
        private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
        {
            var dictionary = new Dictionary<T, int>(m_comparer);
            nullCount = 0;

            foreach (T element in enumerable)
            {
                if (element == null)
                {
                    nullCount++;
                }
                else
                {
                    int num;
                    dictionary.TryGetValue(element, out num);
                    num++;
                    dictionary[element] = num;
                }
            }

            return dictionary;
        }

        public int GetHashCode(IEnumerable<T> enumerable)
        {
            if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));

            // Order-independent sum that goes through m_comparer, so items that are
            // equal under a custom comparer hash alike and T does not need to be sortable.
            int hash = 17;

            foreach (T val in enumerable)
                hash = unchecked(hash + (val == null ? 42 : m_comparer.GetHashCode(val)));

            return hash;
        }
    }
Sample usage:
    var set = new HashSet<IEnumerable<int>>(new[] { new[] { 1, 2, 3 } }, new MultiSetComparer<int>());
    Console.WriteLine(set.Contains(new[] { 3, 2, 1 })); // true
    Console.WriteLine(set.Contains(new[] { 1, 2, 3, 3 })); // false
Or if you just want to compare two collections directly:
    var comp = new MultiSetComparer<string>();
    Console.WriteLine(comp.Equals(new[] { "a", "b", "c" }, new[] { "a", "c", "b" })); // true
    Console.WriteLine(comp.Equals(new[] { "a", "b", "c" }, new[] { "a", "b" })); // false
Finally, you can use an equality comparer of your choice:
    var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
    Console.WriteLine(strcomp.Equals(new[] { "a", "b" }, new[] { "B", "A" })); // true
A simple and fairly efficient solution is to sort both collections and then compare them for equality:
    bool equal = collection1.OrderBy(i => i).SequenceEqual(collection2.OrderBy(i => i));
This algorithm is O(N*logN), while your solution above is O(N^2).
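The one-liner above relies on the default ordering of the items. As a hedged sketch of the same sort-then-compare idea for types without a natural ordering, the hypothetical helper `SortedEquals` below (not part of the answer or of any library) accepts an explicit `IComparer<T>` and uses it both for sorting and for the pairwise comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedComparison
{
    // Hypothetical helper: sorts both inputs with the same comparer,
    // then walks the two sorted lists in step. O(N log N) overall.
    public static bool SortedEquals<T>(IEnumerable<T> first, IEnumerable<T> second, IComparer<T> comparer = null)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        comparer = comparer ?? Comparer<T>.Default;

        List<T> a = first.OrderBy(x => x, comparer).ToList();
        List<T> b = second.OrderBy(x => x, comparer).ToList();

        if (a.Count != b.Count)
            return false;

        for (int i = 0; i < a.Count; i++)
        {
            // Items count as equal when the comparer ranks them the same.
            if (comparer.Compare(a[i], b[i]) != 0)
                return false;
        }

        return true;
    }
}
```

Because both sequences are sorted before comparing, duplicates are handled correctly: `{1, 2, 3, 3, 4}` and `{1, 2, 2, 3, 4}` differ at the third position after sorting.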
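If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains a given element is very fast. In that case an algorithm similar to yours would likely be the fastest.
Create a Dictionary "dict" and then, for each member of the first collection, do dict[member]++;
Then loop over the second collection in the same way, but for each member do dict[member]--.
At the end, loop over all of the members in the dictionary and check that every count is zero: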
    private bool SetEqual(List<int> left, List<int> right)
    {
        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left)
        {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right)
        {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict)
        {
            if (kvp.Value != 0)
                return false;
        }

        return true;
    }
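Edit: As far as I can tell, this is of the same order as the most efficient algorithm. The algorithm is O(N), assuming the Dictionary uses O(1) lookups.
This is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):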
    using System.Collections.Generic;

    /// <summary>
    /// Represents a service used to compare two collections for equality.
    /// </summary>
    /// <typeparam name="T">The type of the items in the collections.</typeparam>
    public class CollectionComparer<T>
    {
        /// <summary>
        /// Compares the content of two collections for equality.
        /// </summary>
        /// <param name="foo">The first collection.</param>
        /// <param name="bar">The second collection.</param>
        /// <returns>True if both collections have the same content, false otherwise.</returns>
        public bool Execute(ICollection<T> foo, ICollection<T> bar)
        {
            // Declare a dictionary to count the occurrence of the items in the collection
            Dictionary<T, int> itemCounts = new Dictionary<T, int>();

            // Increase the count for each occurrence of the item in the first collection
            foreach (T item in foo)
            {
                if (itemCounts.ContainsKey(item))
                {
                    itemCounts[item]++;
                }
                else
                {
                    itemCounts[item] = 1;
                }
            }

            // Wrap the keys in a searchable list
            List<T> keys = new List<T>(itemCounts.Keys);

            // Decrease the count for each occurrence of the item in the second collection
            foreach (T item in bar)
            {
                // Try to find the original key that is equivalent to "item".
                // You may want to override Equals (and GetHashCode) to define
                // what it means for two "T" objects to be equal.
                // FindIndex is used instead of Find so that a missing key is
                // detected for value types too, where Find returns default(T).
                int keyIndex = keys.FindIndex(listKey => listKey.Equals(item));

                // Check if a key was found
                if (keyIndex >= 0)
                {
                    itemCounts[keys[keyIndex]]--;
                }
                else
                {
                    // There was no occurrence of this item in the first collection,
                    // thus the collections are not equal
                    return false;
                }
            }

            // The count of each item should be 0 if the contents of the collections are equal
            foreach (int value in itemCounts.Values)
            {
                if (value != 0)
                {
                    return false;
                }
            }

            // The collections are equal
            return true;
        }
    }
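You could use a HashSet. Look at the SetEquals method.
Edit: I realized as soon as I posted that this really only works for sets; it will not properly deal with collections that have duplicate items. For example, { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.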
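A minimal sketch of the `HashSet<T>.SetEquals` approach, including the duplicate caveat from the edit above. The wrapper name `SameDistinctItems` and the optional comparer parameter are assumptions added for illustration; only `HashSet<T>` and `SetEquals` come from the framework.

```csharp
using System.Collections.Generic;

public static class SetEqualsDemo
{
    // Treats both inputs as sets: order and duplicate counts are ignored.
    // A null comparer means "use the default equality comparer for T".
    public static bool SameDistinctItems<T>(IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(first, comparer);
        return set.SetEquals(second);
    }
}
```

Under this check `{1, 2, 3, 4}` matches `{2, 4, 1, 3}`, but `{1, 1, 2}` also matches `{2, 2, 1}`, so it fits only when duplicates do not matter.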
The solution I use is:
return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;
Linq does the dictionary work under the covers, so this is also O(N). (Note that it is O(1) if the collections are not the same size.)
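One caveat worth checking before relying on this: `Enumerable.Intersect` has set semantics and yields each distinct element once. The sketch below wraps the expression above in a hypothetical method, `IntersectEquals`, to show what that means for collections that contain duplicates.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IntersectDemo
{
    // The expression from the answer above, wrapped in a method so it can be exercised.
    public static bool IntersectEquals<T>(ICollection<T> c1, ICollection<T> c2)
    {
        // Intersect returns distinct elements only, so its count can never
        // reach c1.Count when c1 contains a repeated item.
        return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;
    }
}
```

For duplicate-free inputs this behaves as intended. For `{1, 1, 2}` compared with an identical `{1, 1, 2}`, the intersection contains two distinct items while `Count` is three, so the check reports the collections as unequal.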
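I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor's and O(N) for mine and Daniel's.
I think the simplicity of the Linq intersect code makes it the preferable solution.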
    __Test Latency(ms)__
    N, SetEquals, OrderBy, Intersect
    1024, 0, 0, 0
    2048, 0, 0, 0
    4096, 31.2468, 0, 0
    8192, 62.4936, 0, 0
    16384, 156.234, 15.6234, 0
    32768, 312.468, 15.6234, 46.8702
    65536, 640.5594, 46.8702, 31.2468
    131072, 1312.3656, 93.7404, 203.1042
    262144, 3765.2394, 187.4808, 187.4808
    524288, 5718.1644, 374.9616, 406.2084
    1048576, 11420.7054, 734.2998, 718.6764
    2097152, 35090.1564, 1515.4698, 1484.223
In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:
    public class SetComparer<T> : IEqualityComparer<IEnumerable<T>>
        where T : IComparable<T>
    {
        public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
        {
            if (first == second)
                return true;
            if ((first == null) || (second == null))
                return false;
            return first.ToHashSet().SetEquals(second);
        }

        public int GetHashCode(IEnumerable<T> enumerable)
        {
            // Sorting first makes the hash independent of the original order
            int hash = 17;

            foreach (T val in enumerable.OrderBy(x => x))
                hash = hash * 23 + val.GetHashCode();

            return hash;
        }
    }
Here is the ToHashSet() implementation I used. The hash code algorithm comes from Effective Java (by way of Jon Skeet).
If you use Shouldly, you can use ShouldAllBe with Contains.
    collection1 = {1, 2, 3, 4};
    collection2 = {2, 4, 1, 3};
    collection1.ShouldAllBe(item => collection2.Contains(item)); // true
And finally, you can write an extension.
    public static class ShouldlyIEnumerableExtensions
    {
        public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
        {
            list.ShouldAllBe(l => equivalent.Contains(l));
        }
    }
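A note on what this check covers, sketched without the Shouldly dependency: the predicate only verifies that every item of the first collection occurs in the second. The code below uses LINQ's `Enumerable.All` as a stand-in for `ShouldAllBe` (an assumption made so the example runs on its own), and `AllContainedIn` and `ContainedBothWays` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ContainmentDemo
{
    // Stand-in for list.ShouldAllBe(l => equivalent.Contains(l)):
    // true when every item of list occurs somewhere in equivalent.
    public static bool AllContainedIn<T>(IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        return list.All(item => equivalent.Contains(item));
    }

    // Checking both directions rules out the subset case;
    // duplicate counts are still not compared.
    public static bool ContainedBothWays<T>(IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        return AllContainedIn(list, equivalent) && AllContainedIn(equivalent, list);
    }
}
```

With this stand-in, `{1, 2}` passes the one-directional check against `{1, 2, 3}` even though the collections differ, so a two-way check (or a count comparison) is needed when full equivalence is the goal.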
Update
An optional parameter exists on the ShouldBe method.
    collection1 = {1, 2, 3, 4};
    collection2 = {2, 4, 1, 3};
    collection1.ShouldAllBe(item => collection2.Contains(item)); // true