
Converting an integer array to "vectors"




1> John Zwinck..:

First, encode the transformation as an array (with a dummy first element, since you don't map 0):

>>> import numpy as np
>>> mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

Then it's trivial, because indexing the table with the integer array picks one row per element:

>>> arr = np.array([1,1,2,3,3,3])
>>> mapping[arr]
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
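
The lookup table does not have to be typed by hand. As a minimal sketch, assuming the labels run from 1 to n and that the highest label maps to the first column (as in the table above), it could be built from a reversed identity matrix. The helper name `make_mapping` is hypothetical and not part of the original answer.

```python
import numpy as np

def make_mapping(n_labels):
    """Build the (n_labels + 1, n_labels) lookup table for labels 1..n_labels.

    Row 0 is an all-zero dummy row because label 0 is not mapped.
    """
    # Reversing the identity puts label k's 1 in column n_labels - k.
    one_hot_rows = np.eye(n_labels, dtype=int)[::-1]
    dummy_row = np.zeros((1, n_labels), dtype=int)
    return np.vstack([dummy_row, one_hot_rows])

mapping = make_mapping(3)
arr = np.array([1, 1, 2, 3, 3, 3])
result = mapping[arr]  # integer-array indexing selects one row per element
print(result)
```

For n = 3 this reproduces the hand-written table, so `mapping[arr]` behaves the same way as before.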



2> MSeifert..:

You can actually just compare the values and set the appropriate columns:

>>> # a bit shorter so it's easier to demonstrate
>>> arr = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
>>> arr2 = np.zeros([arr.size, 3], arr.dtype)
>>> arr2[:, 0] = arr == 3
>>> arr2[:, 1] = arr == 2
>>> arr2[:, 2] = arr == 1

>>> arr2
array([[0, 0, 1],
       [0, 0, 1],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0],
       [1, 0, 0]])
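
The three column assignments can also be collapsed into a single broadcasted comparison. The following is only a sketch, assuming the same labels 1 to 3 and the same column order; `one_hot_by_comparison` and its `labels` parameter are hypothetical names used for illustration.

```python
import numpy as np

def one_hot_by_comparison(arr, labels=(3, 2, 1)):
    """Return one row per element, with a 1 in the column whose label matches."""
    arr = np.asarray(arr)
    # arr[:, None] has shape (n, 1); comparing it with a length-3 array
    # broadcasts to shape (n, 3), one boolean column per label.
    return (arr[:, None] == np.asarray(labels)).astype(arr.dtype)

arr = np.array([1, 1, 2, 2, 2, 3])
out = one_hot_by_comparison(arr)
print(out)
```

Column 0 is compared against 3, column 1 against 2 and column 2 against 1, which matches the explicit assignments to `arr2[:, 0]`, `arr2[:, 1]` and `arr2[:, 2]`.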

You said you're interested in efficiency, so I did some timings:

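import numpy as np

# NOTE: mine_numba, used in the checks and timings below, is defined in the
# numba section at the end of this answer; define it before running this code.
my_dict = {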
    1:[0,0,1],
    2:[0,1,0],
    3:[1,0,0]
    }

mapping = np.array([[0,0,0],[0,0,1],[0,1,0],[1,0,0]])

def mine(arr):
    arr2 = np.zeros([arr.size, 3], arr.dtype)
    arr2[:, 0] = arr == 3
    arr2[:, 1] = arr == 2
    arr2[:, 2] = arr == 1
    return arr2

def JoaoAreias(arr):
    return [my_dict[i] for i in arr]

def JohnZwinck(arr):
    return mapping[arr]

def Divakar(arr):
    return (arr == np.arange(3,0,-1)[:,None]).T.astype(np.int8)

def Divakar2(arr):
    return np.take(mapping, arr,axis=0)

arr = np.random.randint(1, 4, (150))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 5. - 10000 loops, best of 3: 48.3 µs per loop
%timeit JoaoAreias(arr)  # 6. - 10000 loops, best of 3: 179 µs per loop
%timeit JohnZwinck(arr)  # 3. - 10000 loops, best of 3: 24.1 µs per loop
%timeit mine_numba(arr)  # 1. - 100000 loops, best of 3: 6.02 µs per loop
%timeit Divakar(arr)     # 4. - 10000 loops, best of 3: 34.2 µs per loop
%timeit Divakar2(arr)    # 2. - 100000 loops, best of 3: 13.5 µs per loop

arr = np.random.randint(1, 4, (10000))
np.testing.assert_array_equal(mine(arr), JohnZwinck(arr))
np.testing.assert_array_equal(mine(arr), mine_numba(arr))
np.testing.assert_array_equal(mine(arr), Divakar(arr))
np.testing.assert_array_equal(mine(arr), Divakar2(arr))
%timeit mine(arr)        # 4. - 1000 loops, best of 3: 201 µs per loop
%timeit JoaoAreias(arr)  # 6. - 100 loops, best of 3: 10.2 ms per loop
%timeit JohnZwinck(arr)  # 5. - 1000 loops, best of 3: 455 µs per loop
%timeit mine_numba(arr)  # 1. - 10000 loops, best of 3: 103 µs per loop
%timeit Divakar(arr)     # 3. - 10000 loops, best of 3: 155 µs per loop
%timeit Divakar2(arr)    # 2. - 10000 loops, best of 3: 146 µs per loop

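So which one you prefer depends on your data size: for small arrays @JohnZwinck's solution is faster than mine (24.1 µs vs. 48.3 µs for 150 elements), while for "bigger" datasets my approach wins (201 µs vs. 455 µs for 10000 elements). :)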
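
The `%timeit` lines above only work in IPython. As a rough sketch of how the same comparison could be run in plain Python with the standard `timeit` module: the names `by_lookup`, `by_comparison` and `best_time` are hypothetical, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np

mapping = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])

def by_lookup(arr):
    # Lookup-table approach: one table row per element.
    return mapping[arr]

def by_comparison(arr):
    # Comparison approach: fill each column from a boolean mask.
    out = np.zeros((arr.size, 3), arr.dtype)
    out[:, 0] = arr == 3
    out[:, 1] = arr == 2
    out[:, 2] = arr == 1
    return out

def best_time(func, arr, number=200, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeat, number=number)) / number

rng = np.random.default_rng(0)
results = {}
for size in (150, 10000):
    arr = rng.integers(1, 4, size)  # random labels in {1, 2, 3}
    np.testing.assert_array_equal(by_lookup(arr), by_comparison(arr))
    for func in (by_lookup, by_comparison):
        results[(func.__name__, size)] = best_time(func, arr)
        print(f"{func.__name__:>14} n={size:>5}: "
              f"{results[(func.__name__, size)] * 1e6:.1f} us per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `%timeit` reports above.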


In fact, if you're willing to use something like numba (or alternatively cython or similar), you can beat all the other approaches:

import numba as nb

@nb.njit
def mine_numba(arr):
    arr2 = np.zeros((arr.size, 3), arr.dtype)
    for idx in range(arr.size):
        item = arr[idx]
        if item == 1:
            arr2[idx, 2] = 1
        elif item == 2:
            arr2[idx, 1] = 1
        else:
            arr2[idx, 0] = 1
    return arr2
