Is there a way to speed up this simple PyMC model? On 20-40 data points, it takes about 5-11 seconds to fit.

    import pymc
    import time
    import numpy as np
    from collections import OrderedDict

    # prior probability of rain
    p_rain = 0.5
    variables = OrderedDict()
    # rain observations
    data = [True, True, True, True, True,
            False, False, False, False, False]*4
    num_steps = len(data)
    p_rain_given_rain = 0.9
    p_rain_given_norain = 0.2
    p_umbrella_given_rain = 0.8
    p_umbrella_given_norain = 0.3
    for n in range(num_steps):
        if n == 0:
            # Rain node at time t = 0
            rain = pymc.Bernoulli("rain_%d" % n, p_rain)
        else:
            # P(rain at t) depends on whether it rained at t-1
            rain_trans = pymc.Lambda(
                "rain_trans",
                lambda prev_rain=variables["rain_%d" % (n-1)]:
                    prev_rain*p_rain_given_rain + (1-prev_rain)*p_rain_given_norain)
            rain = pymc.Bernoulli("rain_%d" % n, p=rain_trans)
        # P(umbrella at t) depends on whether it rains at t
        umbrella_obs = pymc.Lambda(
            "umbrella_obs",
            lambda rain=rain:
                rain*p_umbrella_given_rain + (1-rain)*p_umbrella_given_norain)
        umbrella = pymc.Bernoulli("umbrella_%d" % n, p=umbrella_obs,
                                  observed=True, value=data[n])
        variables["rain_%d" % n] = rain
        variables["umbrella_%d" % n] = umbrella

    print("running on %d points" % len(data))
    all_vars = variables.values()
    t_start = time.time()
    model = pymc.Model(all_vars)
    m = pymc.MCMC(model)
    m.sample(iter=2000)
    t_end = time.time()
    print("\n%.2f secs to run" % (t_end - t_start))

With only 40 data points, it takes 11 seconds to run:

    running on 40 points
     [-----------------100%-----------------] 2000 of 2000 complete in 11.5 sec
    11.54 secs to run

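(80 points takes 20 seconds.) This is a toy example. The expressions inside the Lambda()s that determine the transitions are more complex in practice. This basic code structure is flexible (whereas encoding the model with transition matrices is less flexible). Is there a way to keep a similar code structure but get better performance? Happy to switch to PyMC3 if necessary. Thanks.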
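
For reference, the transition-matrix encoding I have in mind would look roughly like the sketch below: exact forward filtering over the two rain states, using the same probabilities as the toy model. It is plain Python rather than PyMC, and the names `transition`, `p_umbrella` and `forward_filter` are only illustrative assumptions, not part of any library.

```python
# Illustrative sketch only: exact forward filtering for the two-state rain
# model, in plain Python (no PyMC). State 0 = no rain, state 1 = rain.

p_rain = 0.5
p_rain_given_rain = 0.9
p_rain_given_norain = 0.2
p_umbrella_given_rain = 0.8
p_umbrella_given_norain = 0.3

# transition[i][j] = P(state_t = j | state_{t-1} = i)
transition = [
    [1 - p_rain_given_norain, p_rain_given_norain],
    [1 - p_rain_given_rain, p_rain_given_rain],
]
# p_umbrella[j] = P(umbrella = True | state = j)
p_umbrella = [p_umbrella_given_norain, p_umbrella_given_rain]


def forward_filter(observations):
    """Return P(rain_t = True | umbrella_0..t) for every time step t."""
    belief = [1 - p_rain, p_rain]  # prior over the state at t = 0
    filtered = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition matrix.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(2))
                for j in range(2)
            ]
        # Update: weight each state by the likelihood of the observation.
        likelihood = [p if obs else 1 - p for p in p_umbrella]
        unnorm = [belief[j] * likelihood[j] for j in range(2)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        filtered.append(belief[1])
    return filtered


# Same umbrella observations as in the toy model above.
data = ([True] * 5 + [False] * 5) * 4
posteriors = forward_filter(data)
```

This runs essentially instantly on 40 points, but it hard-codes the transition structure into a matrix, which is exactly the flexibility I would like to keep from the Lambda()-based version.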