Suppose I have two sparse vectors. For example:
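import org.apache.spark.ml.linalg.Vectors // or org.apache.spark.mllib.linalg.Vectors for the RDD-based API

val vec1 = Vectors.sparse(2, Array(0), Array(1.0)) // [1, 0]
val vec2 = Vectors.sparse(2, Array(1), Array(1.0)) // [0, 1]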
I would like to concatenate these two vectors so that the result is equivalent to:
val vec3 = Vectors.sparse(4, Array(0, 3), Array(1.0, 1.0)) // [1, 0, 0, 1]
Does Spark have a convenience method for this?
If your data is in a DataFrame, then VectorAssembler is the right tool to use. For example:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors

# `spark` is assumed to be an existing SparkSession.
dataset = spark.createDataFrame(
    [(0,
      Vectors.sparse(10, {0: 0.6931, 5: 0.0, 7: 0.5754, 9: 0.2877}),
      Vectors.sparse(10, {3: 0.2877, 4: 0.6931, 5: 0.0, 6: 0.6931, 8: 0.6931}))],
    ["label", "userFeatures1", "userFeatures2"])

# Concatenate the two vector columns into a single "features" column.
assembler = VectorAssembler(
    inputCols=["userFeatures1", "userFeatures2"],
    outputCol="features")

output = assembler.transform(dataset)
output.select("features", "label").show(truncate=False)
You will get the following output:
+---------------------------------------------------------------------------+-----+
|features                                                                   |label|
+---------------------------------------------------------------------------+-----+
|(20,[0,7,9,13,14,16,18],[0.6931,0.5754,0.2877,0.2877,0.6931,0.6931,0.6931])|0    |
+---------------------------------------------------------------------------+-----+
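To see where the indices in that output come from, here is a minimal plain-Python sketch of the concatenation logic. It does not use Spark: the `SparseVec` tuple and the `concat_sparse` helper are hypothetical names introduced only for illustration, and the sketch assumes each input's indices are already sorted. The second vector's indices are shifted by the size of the first, and explicitly stored zeros are dropped, which is what the output above suggests happens to the `5: 0.0` entries.

```python
from typing import List, Tuple

# Hypothetical plain-Python stand-in for a sparse vector: (size, indices, values).
SparseVec = Tuple[int, List[int], List[float]]


def concat_sparse(a: SparseVec, b: SparseVec) -> SparseVec:
    """Concatenate two sparse vectors, dropping explicitly stored zeros."""
    size_a, idx_a, val_a = a
    size_b, idx_b, val_b = b
    # Indices of the first vector stay as they are.
    pairs = list(zip(idx_a, val_a))
    # Indices of the second vector are shifted by the size of the first.
    pairs += [(i + size_a, v) for i, v in zip(idx_b, val_b)]
    # Keep only non-zero entries, matching the assembled output shown above.
    pairs = [(i, v) for i, v in pairs if v != 0.0]
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return (size_a + size_b, indices, values)


vec1 = (2, [0], [1.0])  # [1, 0]
vec2 = (2, [1], [1.0])  # [0, 1]
vec3 = concat_sparse(vec1, vec2)  # (4, [0, 3], [1.0, 1.0]), i.e. [1, 0, 0, 1]
```

Applied to the two `userFeatures` vectors from the example, the same shift turns indices 3, 4, 6, 8 of the second vector into 13, 14, 16, 18 in the 20-element result.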