Geohash是一种地址编码,它能把二维的经纬度编码成一维的字符串。比如,北海公园的编码是wx4g0ec1。
Geohash的原理、算法
下面以(39.92324,116.3906)为例,介绍一下geohash的编码算法。
首先将纬度范围(-90,90)平分成两个区间(-90,0)、(0,90),如果目标纬度位于前一个区间,则编码为0,否则编码为1。由于39.92324属于(0,90),所以取编码为1。然后再将(0,90)分成(0,45),(45,90)两个区间,而39.92324位于(0,45),所以编码为0。以此类推,直到精度符合要求为止,得到纬度编码为10111000110001111001。
纬度范围
划分区间0
划分区间1
39.92324所属区间
(-90,90) |
(-90,0.0) |
(0.0,90) |
1 |
(0.0,90) |
(0.0,45.0) |
(45.0,90) |
0 |
(0.0,45.0) |
(0.0,22.5) |
(22.5,45.0) |
1 |
(22.5,45.0) |
(22.5,33.75) |
(33.75,45.0) |
1 |
(33.75,45.0) |
(33.75,39.375) |
(39.375,45.0) |
1 |
(39.375,45.0) |
(39.375,42.1875) |
(42.1875,45.0) |
0 |
(39.375,42.1875) |
(39.375,40.7812) |
(40.7812,42.1875) |
0 |
(39.375,40.7812) |
(39.375,40.0781) |
(40.0781,40.7812) |
0 |
(39.375,40.0781) |
(39.375,39.7265) |
(39.7265,40.0781) |
1 |
(39.7265,40.0781) |
(39.7265,39.9023) |
(39.9023,40.0781) |
1 |
(39.9023,40.0781) |
(39.9023,39.9902) |
(39.9902,40.0781) |
0 |
(39.9023,39.9902) |
(39.9023,39.9462) |
(39.9462,39.9902) |
0 |
(39.9023,39.9462) |
(39.9023,39.9243) |
(39.9243,39.9462) |
0 |
(39.9023,39.9243) |
(39.9023,39.9133) |
(39.9133,39.9243) |
1 |
(39.9133,39.9243) |
(39.9133,39.9188) |
(39.9188,39.9243) |
1 |
(39.9188,39.9243) |
(39.9188,39.9215) |
(39.9215,39.9243) |
1 |
经度也用同样的算法,对(-180,180)依次细分,得到116.3906的编码为11010010110001000100。
经度范围
划分区间0
划分区间1
116.3906所属区间
(-180,180) |
(-180,0.0) |
(0.0,180) |
1 |
(0.0,180) |
(0.0,90.0) |
(90.0,180) |
1 |
(90.0,180) |
(90.0,135.0) |
(135.0,180) |
0 |
(90.0,135.0) |
(90.0,112.5) |
(112.5,135.0) |
1 |
(112.5,135.0) |
(112.5,123.75) |
(123.75,135.0) |
0 |
(112.5,123.75) |
(112.5,118.125) |
(118.125,123.75) |
0 |
(112.5,118.125) |
(112.5,115.312) |
(115.312,118.125) |
1 |
(115.312,118.125) |
(115.312,116.718) |
(116.718,118.125) |
0 |
(115.312,116.718) |
(115.312,116.015) |
(116.015,116.718) |
1 |
(116.015,116.718) |
(116.015,116.367) |
(116.367,116.718) |
1 |
(116.367,116.718) |
(116.367,116.542) |
(116.542,116.718) |
0 |
(116.367,116.542) |
(116.367,116.455) |
(116.455,116.542) |
0 |
(116.367,116.455) |
(116.367,116.411) |
(116.411,116.455) |
0 |
(116.367,116.411) |
(116.367,116.389) |
(116.389,116.411) |
1 |
(116.389,116.411) |
(116.389,116.400) |
(116.400,116.411) |
0 |
(116.389,116.400) |
(116.389,116.394) |
(116.394,116.400) |
0 |
接下来将经度和纬度的编码合并,奇数位是纬度,偶数位是经度,得到编码1110011101001000111100000011010101100001。
最后,用0-9、b-z(去掉a,i,l,o)这32个字母进行base32编码,得到(39.92324,116.3906)的编码为wx4g0ec1。
十进制 |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
base32 |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
b |
c |
d |
e |
f |
g |
十进制 |
16 |
17 |
18 |
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 |
base32 |
h |
j |
k |
m |
n |
p |
q |
r |
s |
t |
u |
v |
w |
x |
y |
z |
解码算法与编码算法相反,先进行base32解码,然后分离出经纬度,最后根据二进制编码对经纬度范围进行细分即可,这里不再赘述。不过由于geohash表示的是区间,编码越长越精确,但不可能解码出完全一致的地址。
Geohash的应用:附近地址搜索
geohash的最大用途就是附近地址搜索了。不过,从geohash的编码算法中可以看出它的一个缺点:位于格子边界两侧的两点,虽然十分接近,但编码会完全不同。实际应用中,可以同时搜索当前格子周围的8个格子,即可解决这个问题。
最后,我们来看看本文开头提出的两个问题:速度慢,缓存命中率低。使用geohash查询附近地点,用的是字符串前缀匹配:
SELECT * FROM place WHERE geohash LIKE 'wx4g0%';
而前缀匹配可以利用geohash列上的索引,因此查询速度不会太慢。另外,即使用户坐标发生微小的变化,也能编码成相同的geohash,这就保证了每次执行相同的SQL语句,使得缓存命中率大大提高。