A Brief Look at DataFrame Query Methods in pandas ([], loc, iloc, at, iat, ix)

2023-09-14 08:06:05

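pandas offers several ways to slice a DataFrame, and if you don't understand them well, they are easy to mix up. The examples below walk through each of them.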

The data

First, generate a set of random data:

In [5]: import random
   ...: import datetime as dt
   ...: import pandas as pd
   ...:
   ...: rnd_1 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_2 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_3 = [random.randrange(1, 20) for x in range(1000)]
   ...: fecha = pd.date_range('2012-4-10', '2015-1-4')
   ...:
   ...: data = pd.DataFrame({'fecha': fecha, 'rnd_1': rnd_1, 'rnd_2': rnd_2, 'rnd_3': rnd_3})
In [6]: data.describe()
Out[6]:
             rnd_1        rnd_2        rnd_3
count  1000.000000  1000.000000  1000.000000
mean      9.946000     9.825000     9.894000
std       5.553911     5.559432     5.423484
min       1.000000     1.000000     1.000000
25%       5.000000     5.000000     5.000000
50%      10.000000    10.000000    10.000000
75%      15.000000    15.000000    14.000000
max      19.000000    19.000000    19.000000

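Slicing with []

Square brackets can be used to slice a DataFrame, somewhat like slicing a Python list. Depending on what you pass, they select rows, columns, or a block: an integer slice selects rows by position, and a list of column names selects columns.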

# Row selection
In [7]: data[1:5]
Out[7]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7
# Column selection
In [10]: data[['rnd_1', 'rnd_3']]
Out[10]:
     rnd_1  rnd_3
0        8     12
1        1      3
2        7      1
3        2      7
4        4      7
..     ...    ...
995     14     13
996     12     11
997     11     15
998     17     14
999      3      8

[1000 rows x 2 columns]
# Block selection
In [11]: data[:7][['rnd_1', 'rnd_2']]
Out[11]:
   rnd_1  rnd_2
0      8     17
1      1     16
2      7      6
3      2     16
4      4     17
5     12     19
6      2      7
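The three selections above follow one rule: what [] does depends on the type of its argument. A minimal sketch on a hypothetical three-row frame named df (not the article's data), also showing a boolean mask, which [] uses to filter rows:

```python
import pandas as pd

# Hypothetical toy frame, used only to show how [] interprets its argument
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

rows = df[0:2]              # integer slice: rows at positions 0 and 1
cols = df[["a", "c"]]       # list of names: a DataFrame with columns a and c
col_a = df["a"]             # single name: one column as a Series
filtered = df[df["b"] > 4]  # boolean Series: the rows where b > 4

print(rows.shape, cols.shape, type(col_a).__name__, filtered.index.tolist())
```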

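For selecting multiple columns, however, you cannot use a range the way 1:5 works for rows: a label range such as 'rnd_1':'rnd_3' written inside the column list is not even valid Python syntax.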

In [12]: data[['rnd_1':'rnd_3']]
  File "", line 1
    data[['rnd_1':'rnd_3']]
                 ^
SyntaxError: invalid syntax
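A range of columns can be taken with loc (by label) or iloc (by position) instead. A minimal sketch, using a small frame built from the first three rows of the data above:

```python
import pandas as pd

# First three rows of the article's data, same column order
df = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=3),
    "rnd_1": [8, 1, 7],
    "rnd_2": [17, 16, 6],
    "rnd_3": [12, 3, 1],
})

# Label range: loc includes both ends, so this returns rnd_1, rnd_2 and rnd_3
by_label = df.loc[:, "rnd_1":"rnd_3"]

# Position range: iloc excludes the end, so 1:4 means positions 1, 2 and 3
by_position = df.iloc[:, 1:4]

print(list(by_label.columns))
print(list(by_position.columns))
```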

loc

loc lets you select rows and columns by index label.

In [13]: data.loc[1:5]
Out[13]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7
5 2012-04-15     12     19      8

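Note the difference from []: loc also includes row 5, whereas [] stops at row 4. loc slices by label and includes both endpoints, while an integer slice in [] works by position and, like a Python list slice, excludes the end.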
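The endpoint difference can be checked directly. A minimal sketch on a hypothetical ten-row frame with the default integer index 0 to 9:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # default RangeIndex 0..9

positional = df[1:5]      # like a list slice: rows 1, 2, 3, 4
labelled = df.loc[1:5]    # label slice, both ends included: rows 1 to 5

print(positional.index.tolist())
print(labelled.index.tolist())
```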

In [14]: data.loc[2:4, ['rnd_2', 'fecha']]
Out[14]:
   rnd_2      fecha
2      6 2012-04-12
3     16 2012-04-13
4     17 2012-04-14

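loc can also select the data between two specific dates. If the index is sorted, as a date index usually is, the two bounds do not have to appear in the index; on an unsorted index, both dates must be present.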

In [15]: data_fecha = data.set_index('fecha')
    ...: data_fecha.head()
Out[15]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-10      8     17     12
2012-04-11      1     16      3
2012-04-12      7      6      1
2012-04-13      2     16      7
2012-04-14      4     17      7
In [16]: # create two specific dates
    ...: fecha_1 = dt.datetime(2013, 4, 14)
    ...: fecha_2 = dt.datetime(2013, 4, 18)
    ...:
    ...: # slice the data between them
    ...: data_fecha.loc[fecha_1:fecha_2]
Out[16]:
            rnd_1  rnd_2  rnd_3
fecha
2013-04-14     17     10      5
2013-04-15     14      4      9
2013-04-16    ...    ...    ...
2013-04-17      9     15      1
2013-04-18     16      7     17
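On a DatetimeIndex, loc also accepts date strings as slice bounds, so the datetime objects are optional. A minimal sketch, assuming a hypothetical eleven-day frame standing in for data_fecha; it also shows that a single-label lookup, unlike a slice, needs a label that exists:

```python
import datetime as dt
import pandas as pd

# Hypothetical daily frame with a sorted DatetimeIndex, standing in for data_fecha
frame = pd.DataFrame(
    {"rnd_1": range(11)},
    index=pd.date_range("2013-04-10", "2013-04-20", name="fecha"),
)

# datetime bounds, as in the example above: both ends are included
by_datetime = frame.loc[dt.datetime(2013, 4, 14):dt.datetime(2013, 4, 18)]

# Date strings work as bounds too and select the same rows
by_string = frame.loc["2013-04-14":"2013-04-18"]

# A single-label lookup needs a label that is actually in the index
try:
    frame.loc[dt.datetime(2013, 4, 14, 12, 0)]
    found = True
except KeyError:
    found = False
```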

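Update: unless you have a specific reason not to, use loc rather than [] wherever you can. When you assign values into a DataFrame, loc avoids the chained-indexing problem, whereas with [] pandas is likely to emit a SettingWithCopyWarning.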

For details, see the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
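A minimal sketch of the recommended pattern on a hypothetical three-row frame: one loc call selects the rows and the column together and writes into the original frame. The chained form is shown only in a comment, since whether it modifies df and which warning it emits depend on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({"rnd_1": [8, 1, 7], "rnd_2": [17, 16, 6]})

# Chained indexing such as  df[df["rnd_1"] > 5]["rnd_2"] = 0  first builds an
# intermediate DataFrame and then assigns into that, so df itself may not
# change, and pandas may warn about it. It is deliberately not run here.

# A single loc call selects rows and column together and writes into df
df.loc[df["rnd_1"] > 5, "rnd_2"] = 0

print(df)
```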

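iloc

Whereas loc selects by the values (labels) of the index, iloc selects by position. iloc does not care what the index values are, only where the rows and columns sit, so only integer positions can go inside its brackets.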

# Row selection
In [17]: data_fecha.iloc[10:15]
Out[17]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-20     14      6     14
2012-04-21     19     14     16
2012-04-22      2      6     12
2012-04-23     15      8     18
2012-04-24     13      8     18
# Column selection
In [18]: data_fecha.iloc[:, [1, 2]].head()
Out[18]:
            rnd_2  rnd_3
fecha
2012-04-10     17     12
2012-04-11     16      3
2012-04-12      6      1
2012-04-13     16      7
2012-04-14     17      7
# Selecting specific rows and columns
In [19]: data_fecha.iloc[[1, 12, 34], [0, 2]]
Out[19]:
            rnd_1  rnd_3
fecha
2012-04-11      1      3
2012-04-22      2     12
2012-05-14     17     10
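The label/position distinction only becomes visible when index labels differ from positions. A minimal sketch on a hypothetical frame whose index is 10, 20, 30, 40:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
df = pd.DataFrame({"v": ["a", "b", "c", "d"]}, index=[10, 20, 30, 40])

by_label = df.loc[20, "v"]       # row whose label is 20
by_position = df.iloc[1, 0]      # second row, first column: the same element
label_slice = df.loc[10:30]      # labels 10, 20, 30 (end included)
position_slice = df.iloc[0:2]    # positions 0 and 1 (end excluded)

print(by_label, by_position)
print(label_slice.index.tolist(), position_slice.index.tolist())
```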

at

at is used like loc but accesses data faster; the trade-off is that it can only access a single element, not several.

In [20]: %timeit data_fecha.at[fecha_1, 'rnd_1']
The slowest run took 3783.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop
In [21]: %timeit data_fecha.loc[fecha_1, 'rnd_1']
The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 192 µs per loop
In [22]: data_fecha.at[fecha_1, 'rnd_1']
Out[22]: 17
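A minimal sketch outside IPython, assuming a hypothetical three-day frame in place of data_fecha: at and loc return the same scalar for one row label and one column label, at can also overwrite that single element, and the standard timeit module gives a rough speed comparison whose numbers will vary by machine and pandas version:

```python
import timeit
import pandas as pd

# Hypothetical values on a DatetimeIndex, standing in for data_fecha
df = pd.DataFrame(
    {"rnd_1": [5, 17, 14]},
    index=pd.date_range("2013-04-13", periods=3, name="fecha"),
)
key = pd.Timestamp("2013-04-14")

# Same scalar through both accessors
via_at = df.at[key, "rnd_1"]
via_loc = df.loc[key, "rnd_1"]

# at can also overwrite that one element in place
df.at[key, "rnd_1"] = 99

# Rough timing; absolute numbers depend on the machine and pandas version
t_at = timeit.timeit(lambda: df.at[key, "rnd_1"], number=2_000)
t_loc = timeit.timeit(lambda: df.loc[key, "rnd_1"], number=2_000)
print(f"at: {t_at:.4f} s, loc: {t_loc:.4f} s for 2,000 lookups")
```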

iat

iat is to iloc what at is to loc: a faster, position-based way to select data, and like at it can only access a single element.

In [23]: data_fecha.iat[1, 0]
Out[23]: 1
In [24]: %timeit data_fecha.iat[1, 0]
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.77 µs per loop
In [25]: %timeit data_fecha.iloc[1, 0]
10000 loops, best of 3: 158 µs per loop
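A minimal sketch of iat on a hypothetical frame holding the first three rnd_1 and rnd_2 values of the data above: it takes exactly one row position and one column position, returns the same element as iloc, and can write a single element:

```python
import pandas as pd

df = pd.DataFrame({"rnd_1": [8, 1, 7], "rnd_2": [17, 16, 6]})

# One row position and one column position
value = df.iat[1, 0]
same = value == df.iloc[1, 0]

# Like at, iat can write a single element in place
df.iat[2, 1] = 0

print(value, same)
```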

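ix

The examples above all query labels that exist in the index, or positions within the length of the DataFrame. ix can also take labels that are not in the DataFrame's index. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the following example only runs on older versions.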

In [28]: date_1 = dt.datetime(2013, 1, 10, 8, 30)
    ...: date_2 = dt.datetime(2013, 1, 13, 4, 20)
    ...:
    ...: # slice the data
    ...: data_fecha.ix[date_1:date_2]
Out[28]:
            rnd_1  rnd_2  rnd_3
fecha
2013-01-11     19     17     19
2013-01-12     10      9     17
2013-01-13     15      3     10

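As the example shows, 10 January 2013 is not included: that index entry stands for midnight (00:00) on that day, which is earlier than 08:30. For the same reason, 13 January (00:00) is included, because it falls before 04:20.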
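Because ix no longer exists in current pandas, here is a minimal sketch of the same slice written with loc, assuming pandas 1.0 or later and a sorted DatetimeIndex; the frame is a hypothetical five-day stand-in for data_fecha:

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for data_fecha: one row per day, sorted DatetimeIndex
data_fecha = pd.DataFrame(
    {"rnd_1": range(5)},
    index=pd.date_range("2013-01-09", periods=5, name="fecha"),
)

date_1 = dt.datetime(2013, 1, 10, 8, 30)
date_2 = dt.datetime(2013, 1, 13, 4, 20)

# On a sorted DatetimeIndex, loc accepts bounds that are not index labels:
# the first row kept is the first date >= date_1, the last is the last <= date_2
subset = data_fecha.loc[date_1:date_2]
print(subset.index.strftime("%Y-%m-%d").tolist())
```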
