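Sampling instances of one class from training data with pandas in Python
No long introduction here. The code below downsamples the majority class of a binary-labelled training set so that both classes end up with the same number of rows.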
# -*- coding: utf-8 -*-
# Most of these imports are not used by test(); they are kept from the original script.
# sklearn.cross_validation was removed from scikit-learn, so model_selection is used instead.
import numpy
import scipy as sp
import pandas as pd
from sklearn import metrics
from sklearn import linear_model
from sklearn import model_selection
from sklearn import preprocessing
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, chi2
# import iris_data

'''
Columns in the data file:
creativeID,userID,positionID,clickTime,conversionTime,connectionType,
telecomsOperator,appPlatform,sitesetID,positionType,age,gender,
education,marriageStatus,haveBaby,hometown,residence,appID,appCategory,label
'''

def test():
    df = pd.read_csv("/var/lib/mysql-files/data1.csv")
    # Keep the feature columns plus the label
    df1 = df[["connectionType", "telecomsOperator", "appPlatform", "sitesetID",
              "positionType", "age", "gender", "education", "marriageStatus",
              "haveBaby", "hometown", "residence", "appCategory", "label"]]
    print(df1["label"].value_counts())

    # Split the rows by class: negatives (label 0) and positives (label 1)
    N_data = df1[df1["label"] == 0]
    P_data = df1[df1["label"] == 1]

    # Draw as many negatives as there are positives, without replacement.
    # This assumes the negative class is the larger one; otherwise sample() raises an error.
    N_data = N_data.sample(n=P_data.shape[0], replace=False, random_state=2)
    # print(df1.loc[:, "label"] == 0)
    print(P_data.shape)
    print(N_data.shape)

    # Combine both classes, then shuffle the rows and rebuild the index
    data = pd.concat([N_data, P_data])
    print(data.shape)
    data = data.sample(frac=1).reset_index(drop=True)
    print(data[["label"]])
    return data
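The script above depends on a local CSV file, so it cannot be run as is. The following is a minimal sketch of the same undersampling idea on synthetic data. The helper name `undersample_majority`, the toy column `age`, and the 90/10 class split are assumptions made for illustration and are not part of the original script. Unlike the original, the helper shrinks every class to the size of the smallest one, so it does not need to know in advance which class is the majority.

```python
import pandas as pd


def undersample_majority(df, label_col="label", random_state=2):
    """Hypothetical helper: shrink every class to the smallest class size, then shuffle."""
    counts = df[label_col].value_counts()
    n_min = counts.min()
    # Sample n_min rows from each class without replacement
    parts = [
        df[df[label_col] == cls].sample(n=n_min, replace=False, random_state=random_state)
        for cls in counts.index
    ]
    balanced = pd.concat(parts)
    # Shuffle the rows and rebuild a clean 0..n-1 index
    return balanced.sample(frac=1, random_state=random_state).reset_index(drop=True)


# Toy data: 90 negatives and 10 positives
toy = pd.DataFrame({"age": range(100), "label": [0] * 90 + [1] * 10})
balanced = undersample_majority(toy)
print(balanced["label"].value_counts())
```

Fixing `random_state` makes the draw reproducible. Leaving it out gives a different subset of the majority class on each run.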
Further notes: sampling a DataFrame with pandas
Random sampling
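import pandas as pd

# Randomly draw 2000 rows from the DataFrame df.
# sample() is a DataFrame method; there is no top-level pd.sample function.
df.sample(n=2000)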
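As a small runnable illustration of `DataFrame.sample`, the sketch below draws a fixed number of rows, a fraction of the rows, and repeats a draw with the same seed. The toy frame with a single column `x` and the seed value are assumptions made for the example.

```python
import pandas as pd

# Toy frame with 10000 rows
df = pd.DataFrame({"x": range(10000)})

fixed_n = df.sample(n=2000, random_state=0)     # a fixed number of rows
fraction = df.sample(frac=0.1, random_state=0)  # a fraction of all rows
again = df.sample(n=2000, random_state=0)       # same seed as fixed_n

print(len(fixed_n), len(fraction))
print(fixed_n.index.equals(again.index))
```

`n` and `frac` are mutually exclusive. Passing `replace=True` allows the same row to be drawn more than once, which is needed when `n` exceeds the number of rows.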
Stratified sampling
A function from sklearn makes stratified sampling flexible and easy:
from sklearn.model_selection import train_test_split

# y is a column of the data set; stratify=y keeps its class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
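To see what `stratify=y` does, here is a minimal sketch on synthetic data. The toy column `age`, the 900/100 class split, and `random_state=0` are assumptions chosen for the example. The class ratio of the label is carried over to both the training and the test part.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 900 negatives and 100 positives (10% positive rate)
df = pd.DataFrame({"age": range(1000), "label": [0] * 900 + [1] * 100})
X = df[["age"]]
y = df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Share of positives in each split
print(y_train.mean(), y_test.mean())
```

Without `stratify`, the share of positives in the test set would vary from one random split to the next, which matters most when one class is rare.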