How to collapse factor levels in an R data frame?
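Sometimes the levels of a factor are not recorded consistently. For example, male may be recorded as M in some places and as Male in others, so a single category ends up with two levels. Inconsistent recording therefore inflates the number of levels, and it has to be fixed because any analysis that uses those levels will be wrong. To convert the incorrect levels into the proper ones, we can assign a named list to levels(): each list name is a target level, and its character vector holds the recorded variants to merge into it. The column must be a factor for this to work (since R 4.0.0, data.frame() keeps strings as character vectors unless stringsAsFactors=TRUE is given).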
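As a rough illustration of the problem and of the list-based fix described above, here is a minimal sketch. The vector `gender` and its values are hypothetical toy data, not taken from the examples that follow.

```r
# Hypothetical toy data: one category recorded under two spellings.
gender <- factor(c("M", "Male", "Female", "M", "Female"))

# "M" and "Male" count as separate levels, so there are 3 instead of 2.
nlevels(gender)

# Named list: each name is a target level, each vector lists the
# recorded variants that should be merged into that level.
levels(gender) <- list(Male = c("M", "Male"), Female = "Female")

nlevels(gender)   # now 2
table(gender)     # Male: 3, Female: 2
```

The order of the names in the list also becomes the order of the resulting levels.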
Example 1
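# stringsAsFactors=TRUE is needed in R >= 4.0.0 so that F is stored as a factor
F<-c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female",
     "M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female")
Rate<-rep(c(25,30,37,56),times=5)
df1<-data.frame(F,Rate,stringsAsFactors=TRUE)
df1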
Output
        F Rate
1    Male   25
2      Ma   30
3     Fem   37
4  Female   56
5       M   25
6    Male   30
7     Mal   37
8    Male   56
9      Fe   25
10 Female   30
11      M   37
12   Fema   56
13     Ma   25
14  Femal   30
15      F   37
16    Fem   56
17   Male   25
18     Ma   30
19   Male   37
20 Female   56

levels(df1$F)<-list("Male"=c("Male","Ma","Mal","M"),"Female"=c("Female","Fe","Fem","Fema","Femal","F"))
df1

        F Rate
1    Male   25
2    Male   30
3  Female   37
4  Female   56
5    Male   25
6    Male   30
7    Male   37
8    Male   56
9  Female   25
10 Female   30
11   Male   37
12 Female   56
13   Male   25
14 Female   30
15 Female   37
16 Female   56
17   Male   25
18   Male   30
19   Male   37
20 Female   56
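One thing to watch with the list form: any existing level that is not listed under some name is turned into NA. The sketch below checks for that before applying the mapping. The names `g`, `mapping`, and `unmapped` are hypothetical and used only for illustration.

```r
# Hypothetical data: "Unknown" is deliberately left out of the mapping.
g <- factor(c("M", "Male", "Fem", "Unknown"))
mapping <- list(Male = c("M", "Male"), Female = c("Fem", "Female"))

# Levels present in the data that the mapping does not cover.
unmapped <- setdiff(levels(g), unlist(mapping))
unmapped          # "Unknown"

# Applying the mapping anyway silently converts uncovered values to NA.
levels(g) <- mapping
sum(is.na(g))     # 1
```

Running the setdiff() check first makes it easy to extend the list until nothing is left uncovered.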
Example 2
# stringsAsFactors=TRUE is needed in R >= 4.0.0 so that MotorCycleTypes is stored as a factor
MotorCycleTypes<-c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise",
                   "Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")
Frequency<-sample(1:30,20,replace=TRUE)
df2<-data.frame(MotorCycleTypes,Frequency,stringsAsFactors=TRUE)
df2
Output (the Frequency values come from sample() without a seed, so they will differ on each run)
   MotorCycleTypes Frequency
1              Cru         5
2               Sp        15
3            Sport        10
4             Tour         2
5             Endu        25
6          Cruiser         6
7            Touri        17
8           Enduro         5
9              Spo        15
10          Cruise        25
11         Touring        12
12              To        11
13              Sp        20
14             End         6
15           Cruis         1
16         Cruiser        12
17           Sport        21
18             End         5
19            Tour        23
20          Enduro         2

levels(df2$MotorCycleTypes)<-list("Cruise"=c("Cruiser","Cru","Cruis","Cruise"),"Sport"=c("Sport","Sp","Spo"),"Enduro"=c("Enduro","Endu","End"),"Touring"=c("Touring","Tour","To","Touri"))
df2

   MotorCycleTypes Frequency
1           Cruise         5
2            Sport        15
3            Sport        10
4          Touring         2
5           Enduro        25
6           Cruise         6
7          Touring        17
8           Enduro         5
9            Sport        15
10          Cruise        25
11         Touring        12
12         Touring        11
13           Sport        20
14          Enduro         6
15          Cruise         1
16          Cruise        12
17           Sport        21
18          Enduro         5
19         Touring        23
20          Enduro         2
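To see why collapsing matters for analysis, the following sketch sums a numeric column per level after the merge. It uses a small hypothetical data frame `d` with fixed values (not the random Frequency column above), so the result is reproducible.

```r
# Hypothetical, fixed data so the totals are reproducible.
d <- data.frame(
  types = c("Cru", "Cruiser", "Sp", "Sport", "Tour", "Touring"),
  freq  = c(5, 6, 15, 10, 2, 12),
  stringsAsFactors = TRUE
)

# Before collapsing, each spelling would form its own group.
levels(d$types) <- list(
  Cruise  = c("Cru", "Cruiser"),
  Sport   = c("Sp", "Sport"),
  Touring = c("Tour", "Touring")
)

# One row per collapsed level, with the frequencies summed.
totals <- aggregate(freq ~ types, data = d, FUN = sum)
totals   # Cruise 11, Sport 25, Touring 14
```

Without the collapse step, the same aggregate() call would return six groups, one per spelling.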