>>> df = pd.DataFrame({'col_1':['a','b','a','b'], 'col_2':['c','c','d','d'], 'col_3':[1, 2, 3, 4]})
>>> df
  col_1 col_2  col_3
0     a     c      1
1     b     c      2
2     a     d      3
3     b     d      4
>>> df.sort_values(by=['col_1','col_3'])
  col_1 col_2  col_3
0     a     c      1
2     a     d      3
1     b     c      2
3     b     d      4
>>> df = pd.DataFrame({'col_1':['a','b','a','b'], 'col_2':['c','c','d','d'], 'col_3':[1, 2, 3, 4]})
>>> df
  col_1 col_2  col_3
0     a     c      1
1     b     c      2
2     a     d      3
3     b     d      4
# Mean for each distinct value in col_1; numeric_only=True skips the string column col_2
>>> df.groupby('col_1').mean(numeric_only=True)
       col_3
col_1
a        2.0
b        3.0
# Pass np.mean without parentheses; a user-defined function works too
>>> df.groupby('col_1')['col_3'].apply(np.mean)
col_1
a    2.0
b    3.0
Name: col_3, dtype: float64
# Counting: count excludes NaN values, whereas size includes them
>>> df.groupby(['col_1','col_2']).count()
             col_3
col_1 col_2
a     c          1
      d          1
b     c          1
      d          1
>>> df.groupby('col_1').size()
col_1
a    2
b    2
dtype: int64
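The example data above contains no missing values, so count and size return the same numbers. A minimal sketch of the difference follows, assuming a hypothetical frame `df_nan` (not part of the original example) in which one `col_3` entry is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: same idea as the example above, but one col_3 value is missing.
df_nan = pd.DataFrame({
    'col_1': ['a', 'b', 'a', 'b'],
    'col_3': [1.0, np.nan, 3.0, 4.0],
})

grouped = df_nan.groupby('col_1')

# count() reports the number of non-NaN values in each group.
counts = grouped['col_3'].count()

# size() reports the number of rows in each group, whether or not they hold NaN.
sizes = grouped.size()

print(counts)
print(sizes)
```

Group `b` has two rows but only one non-missing `col_3` value, so the two results differ only for that group.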
DataFrame indexing and slicing
loc: Access a group of rows and columns by label(s) or a boolean array.
iloc: Purely integer-location based indexing for selection by position.
at: Access a single value for a row/column label pair.
iat: Access a single value for a row/column pair by integer position.
ix: A primarily label-location based indexer, with integer position fallback. (Removed in pandas 1.0; replaced by loc and iloc.)
>>> df
  col_1 col_2  col_3
0     a     c      1
1     b     c      2
2     a     d      3
3     b     d      4

>>> df.loc[1]
col_1    b
col_2    c
col_3    2
Name: 1, dtype: object

>>> df.loc[1, 'col_1']
'b'

>>> df.loc[[3,1,0]]
  col_1 col_2  col_3
3     b     d      4
1     b     c      2
0     a     c      1

>>> df.iloc[1,1]
'c'

>>> df.at[1, 'col_1']
'b'

>>> df.iat[1,1]
'c'
# Select a column directly by its label
>>> df['col_1']
0    a
1    b
2    a
3    b
Name: col_1, dtype: object

# Select rows by condition
>>> df[df['col_3'] == 3]
  col_1 col_2  col_3
2     a     d      3
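The loc description above also mentions boolean arrays. As a rough sketch, a boolean mask can be passed to loc together with a list of column labels, which filters rows and picks columns in one step. The names `mask` and `selected` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'col_1': ['a', 'b', 'a', 'b'],
    'col_2': ['c', 'c', 'd', 'd'],
    'col_3': [1, 2, 3, 4],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
mask = (df['col_1'] == 'a') & (df['col_3'] > 1)

# Rows where the mask is True, restricted to two columns.
selected = df.loc[mask, ['col_2', 'col_3']]

print(selected)
```

Only the row with index label 2 satisfies both conditions, so `selected` holds that single row.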
Note that when slicing with iloc, the right endpoint of the slice is excluded, whereas with loc the right endpoint is included.
>>> df = pd.DataFrame({'col_1':['a','b','a','b'], 'col_2':['c','c','d','d'], 'col_3':[1, 2, 3, 4]})
>>> df
  col_1 col_2  col_3
0     a     c      1
1     b     c      2
2     a     d      3
3     b     d      4
>>> df.iloc[:1]
  col_1 col_2  col_3
0     a     c      1
>>> df.loc[:1]
  col_1 col_2  col_3
0     a     c      1
1     b     c      2
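With the default integer index, labels and positions coincide, which can hide why the two slices differ. The sketch below assumes a hypothetical frame `df_labeled` with string labels (not part of the original example) to separate position-based from label-based slicing:

```python
import pandas as pd

# Hypothetical frame with string labels, so labels and positions cannot be confused.
df_labeled = pd.DataFrame({'col_3': [1, 2, 3, 4]}, index=['w', 'x', 'y', 'z'])

# iloc slices by position and excludes the right endpoint: positions 0 and 1.
by_position = df_labeled.iloc[0:2]

# loc slices by label and includes the right endpoint: labels 'w' through 'y'.
by_label = df_labeled.loc['w':'y']

print(by_position)
print(by_label)
```

Here `by_position` contains rows `w` and `x`, while `by_label` contains rows `w`, `x`, and `y`.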