我有一些莫尔斯代码丢失了字母之间的空格,我的挑战是找出消息所说的内容.到目前为止,由于可能存在大量的组合,我有点失落.
以下是我所拥有的消息的所有信息.
输出将是英语
总会有一个有意义的翻译
这是和示例消息 -..-...-...-...-..-.-.-.-.-..-.-.-.-.-.-.-.-.-.-..-...-.
消息不应超过70个字符
莫尔斯电码是从更长的流中获取的,因此第一组或最后一组可能会被切断,因此没有有效的翻译
有没有人有一个聪明的解决方案?
这不是一个容易的问题,因为正如鲁赫所说,给定的信息有许多可行的句子.例如,"JACK AND JILL WENT UP THE HILL"具有与"JACK AND JILL WALK CHISELED"相同的编码.由于这些都是语法句子,并且每个中的单词都很常见,所以如何选择一个或另一个(或任何其他40141055989476564163599不同的英语单词序列与此消息具有相同的编码)并不明显,如果不钻研自然语言处理.
无论如何,这里是一个动态编程解决方案,解决了找到最短句子的问题(如果有一个平局则用最少的字符).它还可以计算与给定消息具有相同编码的句子总数.它需要一个文件中的英文单词字典.
下一个增强应该是更好地衡量一个句子的可能性:可能是单词频率,莫尔斯的假阳性率(例如,"I"是一个常用词,但它通常作为莫尔斯码序列的其他序列的一部分出现) .棘手的部分将是制定一个好的分数函数,可以用可以使用动态编程计算的方式表示.
MORSE = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [ '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '..', '.---', '-.-', '.-..', '--', '-.', '---', '.--.', '--.-', '.-.', '...', '-', '..-', '...-', '.--', '-..-', '-.--', '--..' ])) # Read a file containing A-Z only English words, one per line. WORDS = set(word.strip().upper() for word in open('dict.en').readlines()) # A set of all possible prefixes of English words. PREFIXES = set(word[:j+1] for word in WORDS for j in xrange(len(word))) def translate(msg, c_sep=' ', w_sep=' / '): """Turn a message (all-caps space-separated words) into morse code.""" return w_sep.join(c_sep.join(MORSE[c] for c in word) for word in msg.split(' ')) def encode(msg): """Turn a message into timing-less morse code.""" return translate(msg, '', '') def c_trans(morse): """Construct a map of char transitions. The return value is a dict, mapping indexes into the morse code stream to a dict of possible characters at that location to where they would go in the stream. Transitions that lead to dead-ends are omitted. """ result = [{} for i in xrange(len(morse))] for i_ in xrange(len(morse)): i = len(morse) - i_ - 1 for c, m in MORSE.iteritems(): if i + len(m) < len(morse) and not result[i + len(m)]: continue if morse[i:i+len(m)] != m: continue result[i][c] = i + len(m) return result def find_words(ctr, i, prefix=''): """Find all legal words starting from position i. We generate all possible words starting from position i in the morse code stream, assuming we already have the given prefix. ctr is a char transition dict, as produced by c_trans. """ if prefix in WORDS: yield prefix, i if i == len(ctr): return for c, j in ctr[i].iteritems(): if prefix + c in PREFIXES: for w, j2 in find_words(ctr, j, prefix + c): yield w, j2 def w_trans(ctr): """Like c_trans, but produce a word transition map.""" result = [{} for i in xrange(len(ctr))] for i_ in xrange(len(ctr)): i = len(ctr) - i_ - 1 for w, j in find_words(ctr, i): if j < len(result) and not result[j]: continue result[i][w] = j return result def shortest_sentence(wt): """Given a word transition map, find the shortest possible sentence. We find the sentence that uses the entire morse code stream, and has the fewest number of words. If there are multiple sentences that satisfy this, we return the one that uses the smallest number of characters. """ result = [-1 for _ in xrange(len(wt))] + [0] words = [None] * len(wt) for i_ in xrange(len(wt)): i = len(wt) - i_ - 1 for w, j in wt[i].iteritems(): if result[j] == -1: continue if result[i] == -1 or result[j] + 1 + len(w) / 30.0 < result[i]: result[i] = result[j] + 1 + len(w) / 30.0 words[i] = w i = 0 result = [] while i < len(wt): result.append(words[i]) i = wt[i][words[i]] return result def sentence_count(wt): result = [0] * len(wt) + [1] for i_ in xrange(len(wt)): i = len(wt) - i_ - 1 for j in wt[i].itervalues(): result[i] += result[j] return result[0] msg = 'JACK AND JILL WENT UP THE HILL' print sentence_count(w_trans(c_trans(encode(msg)))) print shortest_sentence(w_trans(c_trans(encode(msg))))