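This should be simple. I want to group by the result of a function, the same way SQL lets you group by an expression:

    SELECT substr(name, 1, 1) AS letter, COUNT(*) AS count
    FROM table
    GROUP BY substr(name, 1, 1)

This counts, for each letter of the alphabet, the rows whose name column starts with that letter.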
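I want to do the same thing in Python, so I assumed I could pass a function to groupby. However, the function is only given the index values (the leftmost column), e.g. 0, 1 or 2. What I want is the name column: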

    import pandas

    # Return the first letter
    def first_letter(row):
        # row is 0, then 1, then 2 etc.
        return row.name[0]

    # Generate a data set of words
    test = pandas.DataFrame({'name': ["benevolent", "hidden", "absurdity", "anonymous", "furious", "antidemocratic", "honeydew"]})
    #              name
    # 0      benevolent
    # 1          hidden
    # 2       absurdity
    # 3       anonymous
    # 4         furious
    # 5  antidemocratic
    # 6        honeydew

    test.groupby(first_letter)

What am I doing wrong here? How can I group by something other than the row index?
Create a new column for the first letter:

    def first_letter(word):
        # word is a single value from the 'name' column (a string)
        return word[0]

    test['first'] = test['name'].apply(first_letter)

And group by it:

    group = test.groupby('first')

Then use it:

    >>> group.count()
           name
    first
    a         3
    b         1
    f         1
    h         2
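
If the extra column is not wanted, the same grouping can probably be done without it. The sketch below is not part of the answer above; it assumes a reasonably recent pandas and shows two variants: passing a derived Series as the grouping key, and moving the names into the index so that a function passed to groupby receives them. The variable names `counts` and `by_function` are only illustrative.

```python
import pandas as pd

test = pd.DataFrame({"name": ["benevolent", "hidden", "absurdity", "anonymous",
                              "furious", "antidemocratic", "honeydew"]})

# Variant 1: group by a derived Series (the first character of each name).
# The key Series shares the frame's index, so no helper column is needed.
first = test["name"].str[0].rename("first")
counts = test.groupby(first).size()

# Variant 2: a function passed to groupby is called on each index label,
# so putting the names in the index lets the function see them.
# drop=False keeps 'name' as a regular column as well.
def first_letter(label):
    return label[0]

by_function = test.set_index("name", drop=False).groupby(first_letter).size()
```

Both variants should give the same per-letter counts as the helper-column approach; which one reads better depends on whether the first letter is needed again later in the analysis.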