numpy.random.choice allows a weighted selection from a vector, i.e.
arr = numpy.array([1, 2, 3])
weights = numpy.array([0.2, 0.5, 0.3])
choice = numpy.random.choice(arr, p=weights)
will pick 1 with probability 0.2, 2 with probability 0.5, and 3 with probability 0.3.
What if we want to do this quickly, in a vectorized fashion, for a 2D array (matrix) in which each column is a probability vector? That is, we want a vector of choices drawn from a random matrix. Here is the super slow way to do it:
import numpy as np
m = 10
n = 100 # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
choices = np.zeros((n,))
# This is slow, because of the loop in Python
for i in range(n):
    choices[i] = np.random.choice(items, p=prob_matrix[:,i])
print(choices)
The output looks something like:
array([ 4., 7., 8., 1., 0., 4., 3., 7., 1., 5., 7., 5., 3., 1., 9., 1., 1., 5., 9., 8., 2., 3., 2., 6., 4., 3., 8., 4., 1., 1., 4., 0., 1., 8., 5., 3., 9., 9., 6., 5., 4., 8., 4., 2., 4., 0., 3., 1., 2., 5., 9., 3., 9., 9., 7., 9., 3., 9., 4., 8., 8., 7., 6., 4., 6., 7., 9., 5., 0., 6., 1., 3., 3., 2., 4., 7., 0., 6., 3., 5., 8., 0., 8., 3., 4., 5., 2., 2., 1., 1., 9., 9., 4., 3., 3., 2., 8., 0., 6., 1.])
This post suggests that cumsum and bisect could be a potential approach, and a fast one. But while numpy.cumsum(arr, axis=1) can do the cumulative sums along one axis of a numpy array, the bisect.bisect function only works on a single array at a time. Similarly, numpy.searchsorted only works on 1D arrays.
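For a single probability vector, the cumsum plus searchsorted idea looks roughly like this (a minimal 1D sketch added for illustration; the variable names are just examples):

import numpy as np

weights = np.array([0.2, 0.5, 0.3])
cdf = np.cumsum(weights)            # [0.2, 0.7, 1.0]
r = np.random.rand()                # uniform draw in [0, 1)
choice = np.searchsorted(cdf, r)    # index of the first cumulative value >= r

The open question is how to apply this to every column of the matrix at once.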
Is there a fast way to do this using only vectorized operations?
Here is a fully vectorized version that is pretty fast:
def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]
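To see why the counting trick in k = (s < r).sum(axis=0) picks the right index, here is a tiny single-column illustration (an added sketch, not part of the original answer; s_col and the numbers are just examples):

import numpy as np

# One column of s, i.e. the cumulative sums of the weights [0.2, 0.5, 0.3]:
s_col = np.array([0.2, 0.7, 1.0])

# A uniform draw r that lands in [0.7, 1.0) should select index 2.
# Counting how many cumulative values lie strictly below r gives exactly that index.
r = 0.85
k = (s_col < r).sum()   # 0.2 and 0.7 are below 0.85, so k == 2

Doing this comparison on the whole (m, n) matrix at once, with one random number per column broadcast against its column, is what makes the function fully vectorized.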
In theory, searchsorted is the right function to use for looking up the random value in the cumulative summed probabilities, but with m being relatively small, k = (s < r).sum(axis=0) ends up being faster. Its time complexity is O(m), while the searchsorted approach is O(log(m)), but that only matters for much bigger m. Also, cumsum is O(m), so both vectorized and @perimosocordiae's improved are O(m). (If your m is actually much bigger, you will have to run some tests to see how big m can be before this method is slower.)
Here are the timings I get with m = 10 and n = 10000 (using the functions original and improved from @perimosocordiae's answer):
In [115]: %timeit original(prob_matrix, items)
1 loops, best of 3: 270 ms per loop

In [116]: %timeit improved(prob_matrix, items)
10 loops, best of 3: 24.9 ms per loop

In [117]: %timeit vectorized(prob_matrix, items)
1000 loops, best of 3: 1 ms per loop
The full script that defines the functions is:
import numpy as np


def improved(prob_matrix, items):
    # transpose here for better data locality later
    cdf = np.cumsum(prob_matrix.T, axis=1)
    # random numbers are expensive, so we'll get all of them at once
    ridx = np.random.random(size=n)
    # the one loop we can't avoid, made as simple as possible
    idx = np.zeros(n, dtype=int)
    for i, r in enumerate(ridx):
        idx[i] = np.searchsorted(cdf[i], r)
    # fancy indexing all at once is faster than indexing in a loop
    return items[idx]


def original(prob_matrix, items):
    choices = np.zeros((n,))
    # This is slow, because of the loop in Python
    for i in range(n):
        choices[i] = np.random.choice(items, p=prob_matrix[:,i])
    return choices


def vectorized(prob_matrix, items):
    s = prob_matrix.cumsum(axis=0)
    r = np.random.rand(prob_matrix.shape[1])
    k = (s < r).sum(axis=0)
    return items[k]


m = 10
n = 10000  # Or some very large number
items = np.arange(m)
prob_weights = np.random.rand(m, n)
prob_matrix = prob_weights / prob_weights.sum(axis=0, keepdims=True)
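As a quick sanity check (an illustrative addition, not part of the original answer; p, tiled, samples and freq are just example names), you can tile a single probability vector across all n columns and compare the empirical frequencies produced by vectorized against it:

# Every column gets the same distribution p, so the n draws (one per column)
# should reproduce p approximately for large n.
p = prob_matrix[:, 0]
tiled = np.tile(p[:, None], (1, n))
samples = vectorized(tiled, items)
freq = np.bincount(samples, minlength=m) / n
print(np.round(p, 3))
print(np.round(freq, 3))   # should roughly match the line above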