How to Implement a Weighted Random Sampling Function Without Replacement in Python

2023-08-18 17:16:04

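Requirements

We have a lottery application that draws K winners (K = the number of prizes) from all participating users. The number of lottery codes each user holds is that user's weight.

For example, suppose there are three users with weights A(1), B(1), C(2). On a single draw we want A to be picked with probability 25%, B with 25%, and C with 50%.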

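Analysis

The most intuitive approach is to put C into the list twice and draw from it, e.g. [A, B, C, C], using Python's built-in random.choice(['A', 'B', 'C', 'C']). C is then drawn with probability 50%.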
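A minimal sketch of this list-expansion approach, assuming integer weights. `expand_by_weight` is a hypothetical helper name used only for illustration. The expanded list has as many entries as the total weight.

```python
import random

# Hypothetical helper for illustration: repeat each name `weight` times.
def expand_by_weight(users):
    """Return a list where each name appears as many times as its weight."""
    expanded = []
    for name, weight in users:
        expanded.extend([name] * weight)
    return expanded

users = [('A', 1), ('B', 1), ('C', 2)]
pool = expand_by_weight(users)  # ['A', 'B', 'C', 'C']
winner = random.choice(pool)    # C has 2 of 4 slots, so it is picked 50% of the time
print(pool, winner)
```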

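The problem with this approach is that the list has as many entries as the total weight, so large weights waste a lot of memory.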

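A more general approach is to add up all the weights (here 4), pick a random value in the interval [0, 4), and give A, B and C sub-intervals of different lengths: [0, 1) belongs to A, [1, 2) to B, and [2, 4) to C.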
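Stated generally, as a short derivation (the symbols w_i, C_i and W are introduced here for illustration): with weights w_1, ..., w_n, let C_i be the running total of the weights up to user i. User i owns an interval of length w_i, so a value drawn uniformly from [0, W) lands in it with probability w_i / W. The same holds for an integer R drawn uniformly from 0 to W - 1, since exactly w_i integers satisfy C_{i-1} <= R < C_i.

```latex
\begin{aligned}
C_0 &= 0, \qquad C_i = \sum_{j=1}^{i} w_j, \qquad W = C_n,\\
\text{user } i &\text{ owns } [C_{i-1},\, C_i),\\
P(\text{pick } i) &= \frac{C_i - C_{i-1}}{W} = \frac{w_i}{W},\\
\text{e.g. } (C_1, C_2, C_3) = (1, 2, 4):\quad P(\text{pick user C}) &= \frac{4 - 2}{4} = \frac{1}{2}.
\end{aligned}
```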

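Either Python's random.randint(0, 3) or int(random.random() * 4) produces a random integer R between 0 and 3 inclusive. The interval R falls into determines which user is selected.

Next, we need a way to find which interval the random number falls into.

One way is to walk through the list in order, keeping a running total S of the weights seen so far. As soon as S exceeds R, return the current element.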

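import random
from operator import itemgetter

def weighted_choice_linear(users):
    total = sum(map(itemgetter(1), users))
    rnd = int(random.random() * total)  # 0~3 for the example below
    s = 0  # running total S of the weights seen so far
    for u, w in users:
        s += w
        if s > rnd:  # R falls in this user's interval
            return u

users = [('A', 1), ('B', 1), ('C', 2)]
print(weighted_choice_linear(users))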

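However, this method is O(N), because in the worst case it has to walk through all of the users.

Another approach is to first build a list of the cumulative weights in order and then binary-search it. Binary search brings each lookup down to O(log N). Building the cumulative list is still O(N), but it only has to be done once and can be reused across draws.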

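import bisect
import itertools
import random
from operator import itemgetter

def weighted_choice_bisect(users):
    cum_weights = list(itertools.accumulate(map(itemgetter(1), users)))  # [1, 2, 4]
    total = cum_weights[-1]
    rnd = int(random.random() * total)  # 0~3
    hi = len(cum_weights) - 1  # search cum_weights[0:hi], so the index never exceeds the last user
    index = bisect.bisect(cum_weights, rnd, 0, hi)
    return users[index][0]

users = [('A', 1), ('B', 1), ('C', 2)]
print(weighted_choice_bisect(users))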
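Why bisect.bisect (an alias of bisect_right) rather than bisect_left? With an integer R, a value equal to a cumulative weight, such as R = 1, belongs to the next user's interval [1, 2). The small check below, using only the standard library, maps every R from 0 to 3 through both functions for cum_weights = [1, 2, 4].

```python
import bisect

cum_weights = [1, 2, 4]  # cumulative weights of A, B, C
names = ['A', 'B', 'C']

# Map every possible integer R in [0, 4) to a user with each bisect variant.
right_picks = [names[bisect.bisect_right(cum_weights, r)] for r in range(4)]
left_picks = [names[bisect.bisect_left(cum_weights, r)] for r in range(4)]

print(right_picks)  # ['A', 'B', 'C', 'C']  -> A 25%, B 25%, C 50%
print(left_picks)   # ['A', 'A', 'B', 'C']  -> A 50%, B 25%, C 25% (wrong)
```

With a float value such as random.random() * total, the two functions only differ when the value lands exactly on a boundary. With an integer R the choice matters.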

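Python's built-in random.choices function (available since Python 3.6) is implemented exactly this way. Its signature is random.choices(population, weights=None, *, cum_weights=None, k=1). Here population is the sequence to choose from, weights are the relative weights of its elements, cum_weights are optional precomputed cumulative weights (give one or the other, not both), and k is the number of selections, made with replacement. The source (from an earlier CPython release; later versions differ in some details) is as follows: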

def choices(self, population, weights=None, *, cum_weights=None, k=1):
    """Return a k sized list of population elements chosen with replacement.

    If the relative weights or cumulative weights are not specified,
    the selections are made with equal probability.
    """
    random = self.random
    if cum_weights is None:
        if weights is None:
            _int = int
            total = len(population)
            return [population[_int(random() * total)] for i in range(k)]
        cum_weights = list(_itertools.accumulate(weights))
    elif weights is not None:
        raise TypeError('Cannot specify both weights and cumulative weights')
    if len(cum_weights) != len(population):
        raise ValueError('The number of weights does not match the population')
    bisect = _bisect.bisect
    total = cum_weights[-1]
    hi = len(cum_weights) - 1
    return [population[bisect(cum_weights, random() * total, 0, hi)]
            for i in range(k)]
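For the A/B/C example, random.choices can be called directly. Below is a minimal usage sketch; the seed value is arbitrary. Because it samples with replacement, k may exceed the number of users and the same user can appear several times. That is why it does not suit a lottery that needs distinct winners.

```python
import random

users = ['A', 'B', 'C']
weights = [1, 1, 2]

rng = random.Random(2023)  # seeded instance so the run is reproducible

# Relative weights; k may exceed len(users) because draws are with replacement.
draws = rng.choices(users, weights=weights, k=10)

# Same distribution, passing precomputed cumulative weights instead.
draws_cum = rng.choices(users, cum_weights=[1, 2, 4], k=10)
print(draws, draws_cum)

# Passing both weights and cum_weights is rejected with TypeError.
try:
    rng.choices(users, weights=weights, cum_weights=[1, 2, 4])
    both_rejected = False
except TypeError:
    both_rejected = True
print(both_rejected)  # True
```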

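Going Further

Python's built-in random.choices samples with replacement. The standard function for sampling without replacement is random.sample(population, k), but it cannot take weights into account.

The built-in random.sample can draw multiple elements without modifying the original list. Internally it uses two algorithms and chooses between them based on n and k, so that it performs well in all cases (source: random.sample).

The first is a partial shuffle that stops as soon as k elements have been picked. Its time complexity is O(N): the loop itself only runs k times, but the original sequence has to be copied first, which also increases memory usage.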

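import random

def sample_by_partial_shuffle(population, k):
    n = len(population)
    result = [None] * k
    pool = list(population)  # copy, so the original sequence is not modified
    for i in range(k):
        j = int(random.random() * (n - i))  # pick among the n - i unselected slots
        result[i] = pool[j]
        pool[j] = pool[n - i - 1]  # move the last unselected element into the chosen slot
    return result

print(sample_by_partial_shuffle(['A', 'B', 'C', 'D'], 2))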

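The second keeps a set of indices that have already been selected and draws repeatedly. If the drawn index is already in the set, it simply draws again. No copy of the sequence is needed. random.sample uses this algorithm when k is small relative to n, because repeated picks are then unlikely.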

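import random

def sample_by_rejection(population, k):
    n = len(population)
    result = [None] * k
    selected = set()
    selected_add = selected.add  # bind the method once to speed up attribute lookup
    for i in range(k):
        j = int(random.random() * n)
        while j in selected:  # already picked: draw again
            j = int(random.random() * n)
        selected_add(j)
        result[i] = population[j]
    return result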
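The text does not say exactly when random.sample switches between the two methods. Below is a sketch of the size heuristic, modeled on CPython's random.sample; the constants follow CPython's source as far as I know and may differ between versions, and prefers_set_method is a hypothetical name. It compares the memory of an n-element list copy against an estimate of the memory of a k-element set.

```python
import math

def prefers_set_method(n, k):
    """Hypothetical helper: return True if the set-based (second) algorithm
    would be chosen, mirroring the size heuristic in CPython's random.sample."""
    setsize = 21  # approximate memory of a small set minus an empty list
    if k > 5:
        # grow the estimate to the hash-table size needed for k entries
        setsize += 4 ** math.ceil(math.log(k * 3, 4))
    # if an n-length list is no bigger than the set, copying is preferred
    return n > setsize

print(prefers_set_method(10, 3))       # False: tiny population, copy it
print(prefers_set_method(1000, 3))     # True: few picks from many
print(prefers_set_method(1000, 100))   # False: setsize is 21 + 4**5 = 1045
print(prefers_set_method(10**6, 100))  # True
```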

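The lottery application needs weighted sampling without replacement, so we combine the implementations of random.choices and random.sample into a function, weighted_sample.

In a typical lottery there are far more participants than prizes, so the second method of random.sample (redrawing whenever an already-selected index comes up) works well for sampling without replacement, and it can of course be optimized further. One caveat: with weights, how often a redraw happens depends on how much of the total weight the winners drawn so far hold, not just on k relative to n. A few users with very large weights can therefore cause many redraws.

The code is as follows: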

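import bisect
import itertools
import random

def weighted_sample(population, weights, k=1):
    """Like random.sample, but with weights."""
    n = len(population)
    if n == 0:
        return []
    if not 0 <= k <= n:
        raise ValueError("Sample larger than population or is negative")
    if len(weights) != n:
        raise ValueError('The number of weights does not match the population')
    if any(w < 0 for w in weights):
        raise ValueError('Weights must be non-negative')

    cum_weights = list(itertools.accumulate(weights))
    total = cum_weights[-1]
    if total <= 0:  # all weights are zero: fall back to unweighted sampling
        return random.sample(population, k=k)
    if sum(1 for w in weights if w > 0) < k:
        # zero-weight items are never drawn, so the loop below could never finish
        raise ValueError('Fewer than k items have a positive weight')
    hi = len(cum_weights) - 1

    selected = set()
    _bisect = bisect.bisect
    _random = random.random
    selected_add = selected.add
    result = [None] * k
    for i in range(k):
        j = _bisect(cum_weights, _random() * total, 0, hi)
        while j in selected:  # already a winner: draw again
            j = _bisect(cum_weights, _random() * total, 0, hi)
        selected_add(j)
        result[i] = population[j]
    return result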
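As a sanity check on what weighted_sample produces: a redraw just repeats the draw, so each new winner is effectively picked from the remaining users in proportion to their weights. The sketch below (standard library only; winner_set_probabilities is a hypothetical helper, not part of the original code) computes that distribution exactly with fractions for the A(1), B(1), C(2) example with k = 2. Frequencies from many weighted_sample runs should approach these values.

```python
from fractions import Fraction
from itertools import permutations

def winner_set_probabilities(weights, k):
    """Exact probability of each winner set when k items are drawn one by one,
    each draw proportional to the weights of the items not yet drawn."""
    probs = {}
    for order in permutations(weights, k):  # every ordered sequence of k distinct names
        p = Fraction(1)
        remaining = sum(weights.values())
        for name in order:
            p *= Fraction(weights[name], remaining)
            remaining -= weights[name]
        key = frozenset(order)
        probs[key] = probs.get(key, 0) + p
    return probs

probs = winner_set_probabilities({'A': 1, 'B': 1, 'C': 2}, k=2)
for winners, p in sorted(probs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(winners), p)
# ['A', 'B'] 1/6
# ['A', 'C'] 5/12
# ['B', 'C'] 5/12
```

With k = 2, C wins with probability 5/12 + 5/12 = 5/6 rather than 50%: the 25%/25%/50% split in the requirements describes a single draw.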
