我需要一个快速算法从通用列表中选择5个随机元素.例如,我想从a获得5个随机元素List
.
使用linq:
YourList.OrderBy(x => rnd.Next()).Take(5)
迭代通过并为每个元素使选择的概率=(需要的数量)/(数字左)
因此,如果您有40个项目,那么第一个项目将有5/40的机会被选中.如果是,则下一次有4/39的机会,否则它有5/39的机会.当你到达目的地时,你会得到5件物品,而且在此之前你通常会拥有所有物品.
public static ListGetRandomElements (this IEnumerable list, int elementsCount) { return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList(); }
这实际上是一个比它听起来更难的问题,主要是因为许多数学上正确的解决方案实际上无法实现所有可能性(更多内容见下文).
首先,这里有一些易于实现,正确的,如果你有一个真正的随机数生成器:
(0)凯尔的答案,即O(n).
(1)生成n对[(0,rand),(1,rand),(2,rand),...]的列表,按第二个坐标对它们进行排序,并使用第一个k(对于你,k) = 5)获取随机子集的索引.我认为这很容易实现,虽然它是O(n log n)时间.
(2)初始化一个空列表s = [],它将成为k个随机元素的索引.随机选择{0,1,2,...,n-1}中的数字r,r = rand%n,并将其添加到s.接下来取r = rand%(n-1)并坚持s; 在s中添加少于#元素的#元素以避免冲突.接下来取r = rand%(n-2),并做同样的事情,等等,直到你在s中有k个不同的元素.这具有最坏情况的运行时间O(k ^ 2).所以对于k << n,这可以更快.如果你保持排序并跟踪它有哪些连续的间隔,你可以在O(k log k)中实现它,但它更有效.
@Kyle - 你是对的,第二个想我同意你的回答.我一开始匆匆读了它,并错误地认为你指示按顺序选择每个固定概率为k/n的元素,这本来是错误的 - 但你的自适应方法对我来说是正确的.对于那个很抱歉.
好的,现在对于踢球者:渐近(对于固定的k,n在增长),有n ^ k/k!n个元素中k元素子集的选择[这是(n选择k)的近似].如果n很大,而k不是很小,那么这些数字就很大了.在任何标准32位随机数发生器中,您可以期望的最佳周期长度是2 ^ 32 = 256 ^ 4.因此,如果我们有1000个元素的列表,并且我们想要随机选择5,那么标准随机数生成器就无法实现所有可能性.但是,只要您选择适用于较小的集合并且始终"看起来"随机,那么这些算法应该没问题.
附录:写完之后,我意识到正确实现构思(2)很棘手,所以我想澄清这个答案.要获得O(k log k)时间,您需要一个支持O(log m)搜索和插入的类似数组的结构 - 平衡二叉树可以执行此操作.使用这样的结构来构建一个名为s的数组,这里有一些伪随机:
# Returns a container s with k distinct random numbers from {0, 1, ..., n-1} def ChooseRandomSubset(n, k): for i in range(k): r = UniformRandom(0, n-i) # May be 0, must be < n-i q = s.FirstIndexSuchThat( s[q] - q > r ) # This is the search. s.InsertInOrder(q ? r + q : r + len(s)) # Inserts right before q. return s
我建议通过一些示例案例来了解这是如何有效地实现上述英语解释的.
我认为所选答案是正确的,非常可爱.我实现它的方式不同,因为我也希望结果是随机顺序的.
static IEnumerablePickSomeInRandomOrder ( IEnumerable someTypes, int maxCount) { Random random = new Random(DateTime.Now.Millisecond); Dictionary randomSortTable = new Dictionary (); foreach(SomeType someType in someTypes) randomSortTable[random.NextDouble()] = someType; return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value); }
我刚遇到这个问题,而且更多的谷歌搜索让我想到了随机洗牌的问题:http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
要完全随机地移动列表(就地),请执行以下操作:
要改组n个元素的数组(索引0..n-1):
for i from n ? 1 downto 1 do j ? random integer with 0 ? j ? i exchange a[j] and a[i]
如果你只需要前5个元素,那么你不需要从n-1到1运行i,而只需要将它运行到n-5(即:n-5)
让我们说你需要k项,
这变为:
for (i = n ? 1; i >= n-k; i--) { j = random integer with 0 ? j ? i exchange a[j] and a[i] }
选择的每个项目都交换到数组的末尾,因此选择的k个元素是数组的最后k个元素.
这需要时间O(k),其中k是您需要的随机选择元素的数量.
此外,如果您不想修改初始列表,可以在临时列表中记下所有交换,反转该列表,然后再次应用它们,从而执行相反的交换集并返回初始列表而不更改O(k)运行时间.
最后,对于真正的stickler,if(n == k),你应该停在1而不是nk,因为随机选择的整数总是0.
你可以使用它,但订购将在客户端进行
.AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);
从算法中的龙, C#中的解释:
int k = 10; // items to select var items = new List(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }); var selected = new List (); double needed = k; double available = items.Count; var rand = new Random(); while (selected.Count < k) { if( rand.NextDouble() < needed / available ) { selected.Add(items[(int)available-1]) needed--; } available--; }
该算法将选择项目列表的唯一标记.
正在考虑@JohnShedletsky 关于(释义)接受的答案的评论:
你应该能够在O(subset.Length),而不是O(originalList.Length)
基本上,您应该能够生成subset
随机索引,然后从原始列表中提取它们.
public static class EnumerableExtensions { public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable public static IEnumerableGetRandom (this IEnumerable list, int numItems) { return (list as T[] ?? list.ToArray()).GetRandom(numItems); // because ReSharper whined about duplicate enumeration... /* items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--; */ } // just because the parentheses were getting confusing public static IEnumerable GetRandom (this T[] list, int numItems) { var items = new HashSet (); // don't want to add the same item twice; otherwise use a list while (numItems > 0 ) // if we successfully added it, move on if( items.Add(list[randomizer.Next(list.Length)]) ) numItems--; return items; } // and because it's really fun; note -- you may get repetition public static IEnumerable PluckRandomly (this IEnumerable list) { while( true ) yield return list.ElementAt(randomizer.Next(list.Count())); } }
如果你想成为更有效,你可能会使用HashSet
的的指标,而不是实际列表中的元素(如果你有复杂类型或昂贵的比较);
并确保我们没有任何碰撞等
[TestClass] public class RandomizingTests : UnitTestBase { [TestMethod] public void GetRandomFromList() { this.testGetRandomFromList((list, num) => list.GetRandom(num)); } [TestMethod] public void PluckRandomly() { this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false); } private void testGetRandomFromList(Func, int, IEnumerable > methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) { var items = Enumerable.Range(0, 100); IEnumerable randomItems = null; while( repetitions-- > 0 ) { randomItems = methodToGetRandomItems(items, numToTake); Assert.AreEqual(numToTake, randomItems.Count(), "Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions); if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(), "Collisions (non-unique values) found, failed at {0} repetition--", repetitions); Assert.IsTrue(randomItems.All(o => items.Contains(o)), "Some unknown values found; failed at {0} repetition--", repetitions); } } }
从组中选择N个随机项不应该与订单有任何关系!随机性是关于不可预测性的,而不是关于组中的洗牌位置.处理某种有序排序的所有答案都必然效率低于不具备这种顺序的答案.由于效率是关键,我会发布一些不会过多改变项目顺序的东西.
1)如果您需要真正的随机值,这意味着对可供选择的元素没有限制(即,一旦选择的项目可以重新选择):
public static ListGetTrueRandom (this IList source, int count, bool throwArgumentOutOfRangeException = true) { if (throwArgumentOutOfRangeException && count > source.Count) throw new ArgumentOutOfRangeException(); var randoms = new List (count); randoms.AddRandomly(source, count); return randoms; }
如果关闭异常标志,则可以多次选择随机项.
如果您有{1,2,3,4},那么它可以为3个项目提供{1,4,4},{1,4,3}等,甚至{1,4,3,2,4} 5项!
这应该很快,因为它没有什么可检查的.
2)如果你需要小组中的个别成员没有重复,那么我会依赖一本字典(正如许多人已经指出的那样).
public static ListGetDistinctRandom (this IList source, int count) { if (count > source.Count) throw new ArgumentOutOfRangeException(); if (count == source.Count) return new List (source); var sourceDict = source.ToIndexedDictionary(); if (count > source.Count / 2) { while (sourceDict.Count > count) sourceDict.Remove(source.GetRandomIndex()); return sourceDict.Select(kvp => kvp.Value).ToList(); } var randomDict = new Dictionary (count); while (randomDict.Count < count) { int key = source.GetRandomIndex(); if (!randomDict.ContainsKey(key)) randomDict.Add(key, sourceDict[key]); } return randomDict.Select(kvp => kvp.Value).ToList(); }
代码比其他字典方法有点长,因为我不仅添加,而且还从列表中删除,所以它有点两个循环.你可以在这里看到,当我变得相等时,我没有重新订购任何东西.那是因为我认为随机性应该在返回的集合中作为一个整体.我的意思是,如果你想5名从随机的项目,如果它不应该的问题还是,但如果你需要4个从同一组项目,那么它应该产生不可预测的,,等.其次,当随机的项目数是返回的是原始组的一半以上,然后更容易删除count
source.Count
1, 2, 3, 4, 5
1, 3, 4, 2, 5
1, 2, 3, 4, 5
1, 2, 3, 4
1, 3, 5, 2
2, 3, 5, 4
source.Count - count
组中的count
项目比添加项目.出于性能原因,我使用source
而不是sourceDict
在remove方法中获取随机索引.
因此,如果您有{1,2,3,4},则最终可能会在{1,2,3},{3,4,1}等3个项目中结束.
3)如果您需要通过考虑原始组中的重复项来从您的组中获得真正不同的随机值,那么您可以使用与上面相同的方法,但是a HashSet
将比字典轻.
public static ListGetTrueDistinctRandom (this IList source, int count, bool throwArgumentOutOfRangeException = true) { if (count > source.Count) throw new ArgumentOutOfRangeException(); var set = new HashSet (source); if (throwArgumentOutOfRangeException && count > set.Count) throw new ArgumentOutOfRangeException(); List list = hash.ToList(); if (count >= set.Count) return list; if (count > set.Count / 2) { while (set.Count > count) set.Remove(list.GetRandom()); return set.ToList(); } var randoms = new HashSet (); randoms.AddRandomly(list, count); return randoms.ToList(); }
该randoms
变量是HashSet
为了避免在最罕见的情况下添加重复项,其中Random.Next
可以产生相同的值,尤其是当输入列表很小时.
所以{1,2,2,4} => 3个随机项=> {1,2,4}并且从不{1,2,2}
{1,2,2,4} => 4个随机项=>异常!! 或{1,2,4}取决于标志集.
我使用的一些扩展方法:
static Random rnd = new Random(); public static int GetRandomIndex(this ICollection source) { return rnd.Next(source.Count); } public static T GetRandom (this IList source) { return source[source.GetRandomIndex()]; } static void AddRandomly (this ICollection toCol, IList fromList, int count) { while (toCol.Count < count) toCol.Add(fromList.GetRandom()); } public static Dictionary ToIndexedDictionary (this IEnumerable lst) { return lst.ToIndexedDictionary(t => t); } public static Dictionary ToIndexedDictionary (this IEnumerablelst, FuncvalueSelector) { int index = -1; return lst.ToDictionary(t => ++index, valueSelector); }
如果它与几十列表中的项目1000个所有关于性能已经被重复10000次,那么你可能需要有更快的随机类比System.Random
,但我不认为这是什么大不了的考虑,后者极有可能是从来没有一个瓶颈,它足够快..
编辑:如果你需要重新安排退回物品的顺序,那么没有任何东西可以击败达金的Fisher-Yates方法 - 简短,甜美和简单..
我结合以上几个答案来创建一个Lazily评估的扩展方法。我的测试表明,凯尔(Order(N))的方法比drzaus使用集合提议随机索引以选择(Order(K))慢许多倍。前者对随机数生成器执行更多调用,并对这些项进行更多次迭代。
我实施的目标是:
1)如果给定的不是IList的IEnumerable,则不要实现完整列表。如果给我一系列不计其数的项目,则我不想耗尽内存。使用Kyle的方法获得在线解决方案。
2)如果我可以确定它是一个IList,请使用drzaus的方法。如果K大于N的一半,我会冒着重击的风险,因为我一次又一次选择许多随机索引而不得不跳过它们。因此,我编写了一个索引列表以不保存。
3)我保证将按照遇到的相同顺序退回这些物品。凯尔算法无需任何改动。drzaus的算法要求我不要按照选择随机索引的顺序发出项目。我将所有索引收集到SortedSet中,然后按已排序的索引顺序发出项目。
4)如果K比N大,并且我颠倒了集合的含义,那么我将枚举所有项目并测试索引是否不在集合中。这意味着我损失了Order(K)运行时间,但是由于在这种情况下K接近N,因此我的损失不大。
这是代码:
////// Takes k elements from the next n elements at random, preserving their order. /// /// If there are fewer than n elements in items, this may return fewer than k elements. /// ///Type of element in the items collection. /// Items to be randomly selected. /// Number of items to pick. /// Total number of items to choose from. /// If the items collection contains more than this number, the extra members will be skipped. /// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned. ///Enumerable over the retained items. /// /// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary. /// public static IEnumerableTakeRandom (this IEnumerable items, int k, int n) { var r = new FastRandom(); var itemsList = items as IList ; if (k >= n || (itemsList != null && k >= itemsList.Count)) foreach (var item in items) yield return item; else { // If we have a list, we can infer more information and choose a better algorithm. // When using an IList, this is about 7 times faster (on one benchmark)! if (itemsList != null && k < n/2) { // Since we have a List, we can use an algorithm suitable for Lists. // If there are fewer than n elements, reduce n. n = Math.Min(n, itemsList.Count); // This algorithm picks K index-values randomly and directly chooses those items to be selected. // If k is more than half of n, then we will spend a fair amount of time thrashing, picking // indices that we have already picked and having to try again. var invertSet = k >= n/2; var positions = invertSet ? (ISet ) new HashSet () : (ISet ) new SortedSet (); var numbersNeeded = invertSet ? n - k : k; while (numbersNeeded > 0) if (positions.Add(r.Next(0, n))) numbersNeeded--; if (invertSet) { // positions contains all the indices of elements to Skip. for (var itemIndex = 0; itemIndex < n; itemIndex++) { if (!positions.Contains(itemIndex)) yield return itemsList[itemIndex]; } } else { // positions contains all the indices of elements to Take. foreach (var itemIndex in positions) yield return itemsList[itemIndex]; } } else { // Since we do not have a list, we will use an online algorithm. // This permits is to skip the rest as soon as we have enough items. var found = 0; var scanned = 0; foreach (var item in items) { var rand = r.Next(0,n-scanned); if (rand < k - found) { yield return item; found++; } scanned++; if (found >= k || scanned >= n) break; } } } }
我使用了专门的随机数生成器,但是您可以根据需要使用C#的Random。(FastRandom由Colin Green编写,是SharpNEAT的一部分。周期为2 ^ 128-1,比许多RNG都要好。)
以下是单元测试:
[TestClass] public class TakeRandomTests { ////// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability. /// [TestMethod] public void TakeRandom_Array_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials/20; var timesChosen = new int[100]; var century = new int[100]; for (var i = 0; i < century.Length; i++) century[i] = i; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in century.TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount/100; AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average"); //AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min"); //AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max"); var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } ////// Ensure that when randomly choosing items from an IEnumerable that is not an IList, /// all items are chosen with roughly equal probability. /// [TestMethod] public void TakeRandom_IEnumerable_Uniformity() { const int numTrials = 2000000; const int expectedCount = numTrials / 20; var timesChosen = new int[100]; for (var trial = 0; trial < numTrials; trial++) { foreach (var i in Range(0,100).TakeRandom(5, 100)) timesChosen[i]++; } var avg = timesChosen.Average(); var max = timesChosen.Max(); var min = timesChosen.Min(); var allowedDifference = expectedCount / 100; var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference); Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange)); } private IEnumerableRange(int low, int count) { for (var i = low; i < low + count; i++) yield return i; } private static void AssertBetween(int x, int low, int high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } private static void AssertBetween(double x, double low, double high, String message) { Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message)); Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message)); } }