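The following questions

- Evaluating a mathematical expression in a string
- Equation parsing in Python
- Safe way to parse user-supplied mathematical formula in Python
- Evaluate math equations from unsafe user input in Python

and their respective answers made me think about how I could efficiently parse a single mathematical expression (in general terms, along the lines of this answer: /sf/ask/17360801/) given by a (more or less trusted) user, for 20k to 30k input values coming from a database. I implemented a quick and dirty benchmark so I could compare different solutions.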

    # Runs with Python 3(.4)
    import pprint
    import time

    # This is what I have
    userinput_function = '5*(1-(x*0.1))'  # String - numbers should be handled as floats
    demo_len = 20000  # Parameter for benchmark (20k to 30k in real life)
    print_results = False

    # Some database, represented by an array of dicts (simplified for this example)
    database_xy = []
    for a in range(1, demo_len, 1):
        database_xy.append({
            'x': float(a),
            'y_eval': 0,
            'y_sympya': 0,
            'y_sympyb': 0,
            'y_sympyc': 0,
            'y_aevala': 0,
            'y_aevalb': 0,
            'y_aevalc': 0,
            'y_numexpr': 0,
            'y_simpleeval': 0
        })

# Solution #1: eval [yes, totally unsafe]

    time_start = time.time()
    func = eval("lambda x: " + userinput_function)
    for item in database_xy:
        item['y_eval'] = func(item['x'])
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('1 eval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2a: sympy - evalf (http://www.sympy.org)

    import sympy

    time_start = time.time()
    x = sympy.symbols('x')
    sympy_function = sympy.sympify(userinput_function)
    for item in database_xy:
        item['y_sympya'] = float(sympy_function.evalf(subs={x: item['x']}))
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('2a sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2b: sympy - lambdify (http://www.sympy.org)

    from sympy.utilities.lambdify import lambdify
    import sympy
    import numpy

    time_start = time.time()
    sympy_functionb = sympy.sympify(userinput_function)
    func = lambdify(x, sympy_functionb, 'numpy')  # returns a numpy-ready function
    xx = numpy.zeros(len(database_xy))
    for index, item in enumerate(database_xy):
        xx[index] = item['x']
    yy = func(xx)
    for index, item in enumerate(database_xy):
        item['y_sympyb'] = yy[index]
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('2b sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2c: sympy - lambdify with numexpr [and numpy] (http://www.sympy.org)

    from sympy.utilities.lambdify import lambdify
    import sympy
    import numpy
    import numexpr

    time_start = time.time()
    sympy_functionb = sympy.sympify(userinput_function)
    func = lambdify(x, sympy_functionb, 'numexpr')  # returns a numpy-ready function
    xx = numpy.zeros(len(database_xy))
    for index, item in enumerate(database_xy):
        xx[index] = item['x']
    yy = func(xx)
    for index, item in enumerate(database_xy):
        item['y_sympyc'] = yy[index]
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('2c sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3a: asteval [based on ast] - with string magic (http://newville.github.io/asteval/index.html)

    from asteval import Interpreter

    aevala = Interpreter()
    time_start = time.time()
    aevala('def func(x):\n\treturn ' + userinput_function)
    for item in database_xy:
        item['y_aevala'] = aevala('func(' + str(item['x']) + ')')
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('3a aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3b (M Newville): asteval [based on ast] - parse & run (http://newville.github.io/asteval/index.html)

    from asteval import Interpreter

    aevalb = Interpreter()
    time_start = time.time()
    exprb = aevalb.parse(userinput_function)
    for item in database_xy:
        aevalb.symtable['x'] = item['x']
        item['y_aevalb'] = aevalb.run(exprb)
    time_end = time.time()
    print('3b aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3c (M Newville): asteval [based on ast] - parse & run with numpy (http://newville.github.io/asteval/index.html)

    from asteval import Interpreter
    import numpy

    aevalc = Interpreter()
    time_start = time.time()
    exprc = aevalc.parse(userinput_function)
    x = numpy.array([item['x'] for item in database_xy])
    aevalc.symtable['x'] = x
    y = aevalc.run(exprc)
    for index, item in enumerate(database_xy):
        item['y_aevalc'] = y[index]
    time_end = time.time()
    print('3c aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #4: simpleeval [based on ast] (https://github.com/danthedeckie/simpleeval)

    from simpleeval import simple_eval

    time_start = time.time()
    for item in database_xy:
        item['y_simpleeval'] = simple_eval(userinput_function, names={'x': item['x']})
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('4 simpleeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #5: numexpr [and numpy] (https://github.com/pydata/numexpr)

    import numpy
    import numexpr

    time_start = time.time()
    x = numpy.zeros(len(database_xy))
    for index, item in enumerate(database_xy):
        x[index] = item['x']
    # numexpr.evaluate looks up the variable x in the calling frame
    y = numexpr.evaluate(userinput_function)
    for index, item in enumerate(database_xy):
        item['y_numexpr'] = y[index]
    time_end = time.time()
    if print_results:
        pprint.pprint(database_xy)
    print('5 numexpr: ' + str(round(time_end - time_start, 4)) + ' seconds')

On my old test machine (Python 3.4, Linux 3.11 x86_64, two cores, 1.8 GHz) I get the following results:

    1 eval: 0.0185 seconds
    2a sympy: 10.671 seconds
    2b sympy: 0.0315 seconds
    2c sympy: 0.0348 seconds
    3a aeval: 2.8368 seconds
    3b aeval: 0.5827 seconds
    3c aeval: 0.0246 seconds
    4 simpleeval: 1.2363 seconds
    5 numexpr: 0.0312 seconds

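What stands out is the incredible speed of eval, though I do not want to use it in real life. The second-best solution seems to be numexpr, which depends on numpy, a dependency I would like to avoid, although this is not a hard requirement. The next best thing is simpleeval, which is built around ast. aeval, another ast-based solution, suffers from the fact that I have to convert every single float input value into a string first, and I could not find a way around that. sympy was initially my favourite because it offers the most flexible and apparently safest solution, but it ended up last, a long way behind the second-to-last solution.

Update 1: There is a much faster approach using sympy. See solution 2b. It is almost as good as numexpr, though I am not sure whether sympy actually uses it internally.

Update 2: The sympy implementations now use sympify instead of simplify (as recommended by its lead developer, asmeurer, thanks). It does not use numexpr unless it is explicitly asked to do so (see solution 2c). I also added two significantly faster solutions based on asteval (thanks to M Newville).

What options do I have to speed up any of the relatively safe solutions even further? Are there other safe(-ish) approaches, for example using ast directly?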
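
To make the "ast directly" idea concrete, here is a minimal sketch of one possible approach, not a vetted sandbox. It parses the expression once, rejects every node type that is not on a small whitelist, compiles the validated tree, and then evaluates the compiled code with empty builtins. The helper name `compile_expression` and the whitelist are my own assumptions for illustration, and the code assumes Python 3.8+ (where numeric literals are `ast.Constant`). The power operator is deliberately left out, since something like `9**9**9` could tie up the interpreter.

```python
import ast

# Illustrative whitelist (an assumption, not exhaustive): plain arithmetic only.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)


def compile_expression(source, variables=("x",)):
    """Hypothetical helper: validate `source`, then return a callable of `variables`."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError("disallowed syntax: " + type(node).__name__)
        if isinstance(node, ast.Name) and node.id not in variables:
            raise ValueError("unknown name: " + node.id)
        if isinstance(node, ast.Constant):
            # Accept only int and float literals (bool is a subclass of int, so exclude it).
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError("disallowed literal: " + repr(node.value))
    # Compile once; each later call only runs the already validated code object.
    code = compile(tree, "<user expression>", "eval")

    def func(*args):
        return eval(code, {"__builtins__": {}}, dict(zip(variables, args)))

    return func


func = compile_expression('5*(1-(x*0.1))')
results = [func(float(a)) for a in range(1, 20000)]
```

Because validation and compilation happen once, the per-value cost should be close to that of the plain eval lambda in solution #1, though I have not benchmarked it against the numbers above.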