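I have a .csv file that contains multiple tables.
Using pandas, what would be the best strategy to get two DataFrames, inventory and HPBladeSystemRack, from this file?
The input .csv looks like this: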
    Inventory
    System Name,IP Address,System Status
    dg-enc05,,Normal
    dg-enc05_vc_domain,,Unknown
    dg-enc05-oa1,172.20.0.213,Normal
    HP BladeSystem Rack
    System Name,Rack Name,Enclosure Name
    dg-enc05,BU40,
    dg-enc05-oa1,BU40,dg-enc05
    dg-enc05-oa2,BU40,dg-enc05
The best approach I have come up with so far is to convert this .csv file into an Excel workbook (xlsx), split the tables into separate sheets, and use:
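    import pandas as pd

    # The workbook is the converted file, so the path points to the .xlsx, not the .csv.
    # The keyword is `skiprows` (plural); `skiprow` is not a valid argument.
    inventory = pd.read_excel('path_to_file.xlsx', sheet_name='sheet1', skiprows=1)
    HPBladeSystemRack = pd.read_excel('path_to_file.xlsx', sheet_name='sheet2', skiprows=2)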
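However:
This approach requires the xlrd module.
These log files have to be analyzed in real time, so it would be much better to find a way to analyze them as they come out of the logs.
The real logs have many more tables than these two.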
DSM.. 11
If you know the table names in advance, then something like this:
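    import pandas as pd

    # Read everything as three unnamed columns; title rows fill only column 0.
    df = pd.read_csv("jahmyst2.csv", header=None, names=list(range(3)))
    table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
    # Each title row starts a new group; cumsum turns the boolean flags into group ids.
    groups = df[0].isin(table_names).cumsum()
    # Key: the table title (first cell of the group). Value: the rows below the title.
    tables = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}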
should produce a dictionary with the table names as keys and the sub-tables as values.
    >>> list(tables)
    ['HP BladeSystem Rack', 'Inventory']
    >>> for k,v in tables.items():
    ...     print("table:", k)
    ...     print(v)
    ...     print()
    ...
    table: HP BladeSystem Rack
                  0          1               2
    6   System Name  Rack Name  Enclosure Name
    7      dg-enc05       BU40             NaN
    8  dg-enc05-oa1       BU40        dg-enc05
    9  dg-enc05-oa2       BU40        dg-enc05

    table: Inventory
                        0             1              2
    1         System Name    IP Address  System Status
    2            dg-enc05           NaN         Normal
    3  dg-enc05_vc_domain           NaN        Unknown
    4        dg-enc05-oa1  172.20.0.213         Normal
Once you have that, you can set the column names from the first row of each sub-table, and so on.
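For that last step, here is a minimal sketch of one way to do it. It assumes the file is comma-separated with empty cells where the output shows NaN, so the sample text is a reconstruction and is read from a string instead of from disk. `promote_header` is a hypothetical helper name, not something pandas provides.

```python
import io

import pandas as pd

# Sample data reconstructed from the question (assumed comma-separated).
# A real log would be read from a file or a stream instead.
raw = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal
HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40,
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""

# Same splitting approach as above: title rows mark the start of each group.
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(3)))
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}


def promote_header(sub):
    """Hypothetical helper: use the first row as column names and drop it."""
    out = sub.iloc[1:].reset_index(drop=True)
    out.columns = sub.iloc[0].tolist()
    return out


inventory = promote_header(tables["Inventory"])
hp_blade_system_rack = promote_header(tables["HP BladeSystem Rack"])
```

Every column comes back as strings (object dtype) because the header text was read as data, so you may still want to convert types afterwards.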