Pandas: How to Find and Count Duplicated Values in a Column
pandas
The following code uses Series.duplicated to flag repeated values in the Bonus column:
import pandas as pd
import numpy as np

salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
})
print(salaries)

# keep='first': the first occurrence of each Bonus value is False, later repeats are True
print(salaries['Bonus'].duplicated(keep='first'))
# Index labels of the flagged rows
print(salaries[salaries['Bonus'].duplicated(keep='first')].index)
# The flagged rows themselves
print(salaries[salaries['Bonus'].duplicated(keep='first')])

# keep='last': the last occurrence is False, earlier repeats are True
print(salaries['Bonus'].duplicated(keep='last'))
print(salaries[salaries['Bonus'].duplicated(keep='last')].index)
print(salaries[salaries['Bonus'].duplicated(keep='last')])
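Output (produced by an older pandas version that sorted dict columns alphabetically. Recent versions keep the insertion order name, Year, Salary, Bonus, and pandas 2.x prints Index instead of Int64Index):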
   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han
4      3       5  2017   BOSS
5      4       6  2017   BOSS
6      5       7  2017    Han
7      6       8  2017   BOSS
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool
Int64Index([1, 2, 3], dtype='int64')
   Bonus  Salary  Year   name
1      2       2  2016  Lilei
2      2       3  2016  Lilei
3      2       4  2016    Han
0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
Name: Bonus, dtype: bool
Int64Index([0, 1, 2], dtype='int64')
   Bonus  Salary  Year   name
0      2       1  2016   BOSS
1      2       2  2016  Lilei
2      2       3  2016  Lilei
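duplicated() only flags repeats. It does not count them. The sketch below covers the counting side. It assumes the same salaries frame as above and does three things:

- flags every occurrence of a repeated value with keep=False;
- counts each value with value_counts();
- as a small extension beyond the original post, checks for rows that repeat across the name and Year columns together, using subset=.

The variable names (all_dups, dup_counts, dup_values, pair_dups) are illustrative only.

```python
import pandas as pd

# Same sample data as above
salaries = pd.DataFrame({
    'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
    'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
    'Bonus': [2, 2, 2, 2, 3, 4, 5, 6],
})

# keep=False marks every row whose Bonus value occurs more than once
all_dups = salaries['Bonus'].duplicated(keep=False)
print(salaries[all_dups])

# Occurrences of each Bonus value, restricted to values seen more than once
counts = salaries['Bonus'].value_counts()
dup_counts = counts[counts > 1]
print(dup_counts)

# The distinct duplicated values themselves
dup_values = salaries.loc[all_dups, 'Bonus'].unique()
print(dup_values)

# Rows whose (name, Year) pair appears more than once
pair_dups = salaries.duplicated(subset=['name', 'Year'], keep=False)
print(salaries[pair_dups])
```

Here Bonus = 2 is the only repeated value, and it appears 4 times.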
Without pandas
For plain NumPy arrays, the common approaches are as follows.
Suppose we have the array
a=np.array([1,2,1,3,3,3,0])
and want to find the values that occur more than once, i.e. [1 3].
Possible approaches:
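Method 1 (mark the first occurrence of each value; everything else is a repeat):
m = np.zeros_like(a, dtype=bool)
m[np.unique(a, return_index=True)[1]] = True
a[~m]    # array([1, 3, 3])
This returns every repeated occurrence, so 3 appears twice; wrap the result in np.unique(...) to get [1 3].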
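Method 2 (same idea as method 1, expressed with np.in1d; in NumPy 2.0 and later np.in1d is deprecated in favor of np.isin):
a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]    # array([1, 3, 3])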
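Method 3:
np.setxor1d(a, np.unique(a), assume_unique=True)
Caution: assume_unique=True tells setxor1d that both inputs are already unique, which a is not. The result therefore depends on NumPy's internal implementation, and it may be wrong on some versions (for example, an empty array). Do not rely on it.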
Method 4 (count the occurrences of each unique value):
u, i = np.unique(a, return_inverse=True)
u[np.bincount(i) > 1]    # array([1, 3])
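Because the goal is counting, a variant of method 4 may be handier. np.unique with return_counts=True returns the count of each unique value directly, so the duplicated values and how often each occurs come out together. This is a minimal sketch that replaces the return_inverse plus np.bincount pair with return_counts. The names dups and dup_counts are illustrative.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# u: sorted unique values, c: how many times each one occurs in a
u, c = np.unique(a, return_counts=True)

dups = u[c > 1]          # distinct values that occur more than once
dup_counts = c[c > 1]    # their occurrence counts, aligned with dups
print(dups, dup_counts)
```

For this input, 1 occurs twice and 3 occurs three times.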
Method 5 (sort, then compare neighboring elements):
s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]    # array([1, 3, 3])
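The methods do not all return the same shape of result, and the sketch below makes that easy to check. It wraps methods 1, 2, 4 and 5 as functions; the names method1, method2, method4 and method5 are hypothetical. For NumPy 2.0 compatibility, method 2 uses np.isin in place of np.in1d. Method 3 is omitted because it only works when the assume_unique precondition happens to be harmless. Passing each result through np.unique shows that all four agree on the distinct duplicated values.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

def method1(a):
    # Mark the first occurrence of each value; the rest are repeats
    m = np.zeros_like(a, dtype=bool)
    m[np.unique(a, return_index=True)[1]] = True
    return a[~m]

def method2(a):
    # Positions that are not a first occurrence (np.isin replaces np.in1d)
    first = np.unique(a, return_index=True)[1]
    return a[~np.isin(np.arange(len(a)), first, assume_unique=True)]

def method4(a):
    # Unique values whose occurrence count exceeds 1
    u, i = np.unique(a, return_inverse=True)
    return u[np.bincount(i) > 1]

def method5(a):
    # After sorting, equal neighbors indicate repeats
    s = np.sort(a, axis=None)
    return s[:-1][s[1:] == s[:-1]]

for f in (method1, method2, method4, method5):
    raw = f(a)
    # np.unique collapses repeated occurrences into distinct values
    print(f.__name__, raw, np.unique(raw))
```

Methods 1, 2 and 5 return one entry per extra occurrence, giving [1 3 3]. Method 4 returns each duplicated value once, giving [1 3].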
Reference: https://stackoverflow.com/questions/11528078/determining-duplicate-values-in-an-array
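That is all for this article on finding duplicated values in a Pandas column. We hope it serves as a useful reference, and thank you for supporting 毛票票 (Maopiaopiao).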