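I'm trying to check whether any of several column values meets a condition (equal to 0). Our Spark DataFrame has columns named 1 through 11 whose values need to be checked. Currently my code looks like this: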
df3 = df3.withColumn('Status', when(
    (col("1") == 0) | (col("2") == 0) | (col("3") == 0) | (col("4") == 0) |
    (col("5") == 0) | (col("6") == 0) | (col("7") == 0) | (col("8") == 0) |
    (col("9") == 0) | (col("10") == 0) | (col("11") == 0),
    'Incomplete').otherwise('Complete'))
How can I achieve this with a for loop instead of writing out all those OR conditions?
I came up with a more Pythonic solution: use functools.reduce together with operator.or_.
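To see what functools.reduce(operator.or_, ...) does, here is a Spark-free sketch. Expr is a hypothetical stand-in for a Spark Column whose | operator only records the expression text. reduce folds the list from the left, so it produces the same left-nested | chain you would write by hand:

```python
import functools
import operator


class Expr:
    """Hypothetical stand-in for a Spark Column: `|` just builds expression text."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        # operator.or_(a, b) calls a | b, which dispatches here
        return Expr(f"({self.text} | {other.text})")


terms = [Expr(f"c{i}") for i in (1, 2, 3)]
combined = functools.reduce(operator.or_, terms)
print(combined.text)  # ((c1 | c2) | c3)

# The same fold over plain booleans is an "any" check
print(functools.reduce(operator.or_, [False, True, False]))  # True
```

Note that reduce without an initial value raises TypeError on an empty list, so the column list must not be empty.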
import operator
import functools
from pyspark.sql import functions as f

colnames = [str(i + 1) for i in range(11)]
df1 = spark.sparkContext.parallelize([
    [it for it in range(11)],
    [it for it in range(1, 12)]]
).toDF(colnames)
df1.show()

+---+---+---+---+---+---+---+---+---+---+---+
|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11|
+---+---+---+---+---+---+---+---+---+---+---+
|  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10|
|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11|
+---+---+---+---+---+---+---+---+---+---+---+

# OR together one (column == 0) condition per column
cond_expr = functools.reduce(operator.or_, [(f.col(c) == 0) for c in df1.columns])
df1.withColumn('test', f.when(cond_expr, f.lit('Incomplete')).otherwise('Complete')).show()

+---+---+---+---+---+---+---+---+---+---+---+----------+
|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11|      test|
+---+---+---+---+---+---+---+---+---+---+---+----------+
|  0|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10|Incomplete|
|  1|  2|  3|  4|  5|  6|  7|  8|  9| 10| 11|  Complete|
+---+---+---+---+---+---+---+---+---+---+---+----------+
This way you don't need to define any functions, evaluate string expressions, or use Python lambdas. Hope this helps.
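For comparison, the explicit for loop the question asks about builds the same chained OR condition. This is a minimal sketch; any_equals_zero_loop is a hypothetical helper that takes the column names and a column factory (f.col in Spark). To keep it runnable without Spark, it is exercised below with a plain dict, whose lookups return ints, so the result is an ordinary bool:

```python
def any_equals_zero_loop(col_names, col):
    """Hypothetical helper: OR together (col(name) == 0) for every name, using a loop."""
    cond = None
    for name in col_names:
        term = col(name) == 0
        # First term starts the chain; later terms are OR-ed on with `|`
        cond = term if cond is None else cond | term
    if cond is None:
        raise ValueError("col_names must not be empty")
    return cond


# Spark usage (assumes `f` is pyspark.sql.functions and df1 is the DataFrame above):
#   cond_expr = any_equals_zero_loop(df1.columns, f.col)
row = {"1": 0, "2": 5, "3": 7}
print(any_equals_zero_loop(row, row.__getitem__))  # True
```

The loop and the reduce version produce the same expression; reduce just hides the accumulator variable.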