
Natural language parser for parsing sports play-by-play data


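I'm trying to write a parser for (American) football play-by-play data. I use the term "natural language" very loosely, so please bear with me, as I know next to nothing about this field.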

Here are some examples of what I'm working with (format: TIME | DOWN&DIST | OFF_TEAM | DESCRIPTION):

04:39|4th and 20@NYJ46|Dal|Mat McBriar punts for 32 yards to NYJ14. Jeremy Kerley - no return. FUMBLE, recovered by NYJ.|
04:31|1st and 10@NYJ16|NYJ|Shonn Greene rush up the middle for 5 yards to the NYJ21. Tackled by Keith Brooking.|
03:53|2nd and 5@NYJ21|NYJ|Mark Sanchez rush to the right for 3 yards to the NYJ24. Tackled by Anthony Spencer. FUMBLE, recovered by NYJ (Matthew Mulligan).|
03:20|1st and 10@NYJ33|NYJ|Shonn Greene rush to the left for 4 yards to the NYJ37. Tackled by Jason Hatcher.|
02:43|2nd and 6@NYJ37|NYJ|Mark Sanchez pass to the left to Shonn Greene for 7 yards to the NYJ44. Tackled by Mike Jenkins.|
02:02|1st and 10@NYJ44|NYJ|Shonn Greene rush to the right for 1 yard to the NYJ45. Tackled by Anthony Spencer.|
01:23|2nd and 9@NYJ45|NYJ|Mark Sanchez pass to the left to LaDainian Tomlinson for 5 yards to the 50. Tackled by Sean Lee.|

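So far I've written a dumb parser that handles all the easy stuff (playID, quarter, time, down and distance, offensive team), along with some scripts that fetch this data and clean it up into the format shown above. Each line becomes a "Play" object to be stored in a database.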
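For reference, the easy part can be sketched roughly as follows. This is a minimal illustration, not the actual parser: the `Play` fields, the `DOWN_DIST` pattern and the `parse_line` name are assumptions based only on the sample lines above, and playID and quarter are left out because they do not appear in the cleaned format.

```python
import re
from dataclasses import dataclass

# Assumed shape of the DOWN&DIST field, based only on the samples above,
# e.g. "4th and 20@NYJ46". The team prefix is optional to allow a bare "50".
DOWN_DIST = re.compile(r"(\d)(?:st|nd|rd|th) and (\d+)@([A-Za-z]*)(\d+)")


@dataclass
class Play:
    time: str
    down: int
    distance: int
    spot_team: str
    spot_yard: int
    offense: str
    description: str


def parse_line(line: str) -> Play:
    # Drop the trailing "|" and split into exactly four fields.
    time, down_dist, offense, description = line.strip().rstrip("|").split("|", 3)
    match = DOWN_DIST.fullmatch(down_dist)
    if match is None:
        raise ValueError(f"unrecognised down and distance: {down_dist!r}")
    down, distance, spot_team, spot_yard = match.groups()
    return Play(time, int(down), int(distance), spot_team, int(spot_yard),
                offense, description)
```

The description is passed through untouched here; everything below is about what to do with it.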

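The hard part here (at least for me) is parsing the description of the play. Here is some of the information I'd like to extract from that string: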

Example string:

"Mark Sanchez pass to the left to Shonn Greene for 7 yards to the NYJ44. Tackled by Mike Jenkins."

Result:

turnover = False
interception = False
fumble = False
to_on_downs = False
passing = True
rushing = False
direction = 'left'
loss = False
penalty = False
scored = False
TD = False
PA = False
FG = False
TPC = False
SFTY = False
punt = False
kickoff = False
ret_yardage = 0
yardage_diff = 7
playmakers = ['Mark Sanchez', 'Shonn Greene', 'Mike Jenkins']
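To make the target concrete, here is a rough regex-based sketch that recovers a handful of these fields for the simple pass and rush phrasings seen in the samples. The pattern names and `parse_description` are hypothetical, only gains of the form "for N yards" are handled, and player names containing a period would break the tackle pattern.

```python
import re

# Assumed phrasing, taken only from the sample descriptions:
#   "<player> pass|rush [to the left|right | up the middle] [to <target>] for N yard(s)"
PLAY = re.compile(
    r"^(?P<player>.+?) (?P<kind>pass|rush)"
    r"(?: (?:to|up) the (?P<direction>left|right|middle))?"
    r"(?: to (?P<target>.+?))?"
    r" for (?P<yards>\d+) yards?"
)
TACKLE = re.compile(r"Tackled by (?P<tackler>[^.]+)\.")


def parse_description(description: str) -> dict:
    result = {
        "passing": False,
        "rushing": False,
        "direction": None,
        "yardage_diff": 0,
        "fumble": "FUMBLE" in description,
        "playmakers": [],
    }
    play = PLAY.match(description)
    if play:
        result["passing"] = play["kind"] == "pass"
        result["rushing"] = play["kind"] == "rush"
        result["direction"] = play["direction"]
        result["yardage_diff"] = int(play["yards"])
        result["playmakers"].append(play["player"])
        if play["target"]:
            result["playmakers"].append(play["target"])
    tackle = TACKLE.search(description)
    if tackle:
        result["playmakers"].append(tackle["tackler"])
    return result


example = ("Mark Sanchez pass to the left to Shonn Greene for 7 yards "
           "to the NYJ44. Tackled by Mike Jenkins.")
print(parse_description(example))
```

Scoring, penalties, returns and turnovers other than the literal word FUMBLE are not covered, which is exactly where this approach starts to strain.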

My logic for the initial parser goes something like this:

# pass, rush or kick
# gain or loss of yards
# scoring play
    # Who scored? off or def?
    # TD, PA, FG, TPC, SFTY?
# first down gained
# punt?
# kick?
    # return yards?
# penalty?
    # def or off?
# turnover?
    # INT, fumble, to on downs?
# off play makers
# def play makers

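The descriptions can get pretty hairy (multiple fumbles and recoveries combined with penalties, etc.), and I was wondering whether I could take advantage of some NLP modules. I'll probably end up spending a few days on a dumb/static state-machine-like parser, but if anyone has suggestions on how to approach it using NLP techniques, I'd like to hear them.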
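As an illustration of the hairy case, one way to frame it is to split a description into sentences and tag each one as an event before any state machine or NLP step looks at it. This is only a sketch under assumptions: the rule names, the patterns and `parse_events` are hypothetical, the patterns cover just the phrasings in the samples above, and splitting on periods would mis-handle names such as initials.

```python
import re

# Hypothetical event rules, tried in order against each sentence.
RULES = [
    ("punt", re.compile(
        r"(?P<player>.+?) punts for (?P<yards>\d+) yards? to (?:the )?(?P<spot>\w+)")),
    ("no_return", re.compile(r"(?P<player>.+?) - no return")),
    ("fumble", re.compile(
        r"FUMBLE, recovered by (?P<team>\w+)(?: \((?P<player>.+?)\))?")),
    ("tackle", re.compile(r"Tackled by (?P<player>.+)")),
]


def parse_events(description: str) -> list:
    # Split on a period followed by whitespace or end of string.
    sentences = [s.strip() for s in re.split(r"\.(?:\s+|$)", description) if s.strip()]
    events = []
    for sentence in sentences:
        for kind, pattern in RULES:
            match = pattern.fullmatch(sentence)
            if match:
                events.append((kind, match.groupdict()))
                break
        else:
            # Keep unrecognised sentences so nothing is silently dropped.
            events.append(("unknown", {"text": sentence}))
    return events


for event in parse_events(
    "Mat McBriar punts for 32 yards to NYJ14. Jeremy Kerley - no return. "
    "FUMBLE, recovered by NYJ."
):
    print(event)
```

An ordered list of events like this keeps multiple fumbles and recoveries apart, and the "unknown" entries show which phrasings still need a rule or a smarter technique.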
