Counting occurrences of unique word pairs in a double list in Python 3

Author: 凹凸曼00威威_694 | 2023-09-06 15:14

Suppose I have a double list [[],[]] in Python:

doublelist = [["all", "the", "big", "dogs", "eat", "chicken", "all", "the", "small", "kids", "eat", "paste"], 
              ["the", "big", "dogs", "eat", "chicken", "all", "the", "small", "kids", "eat", "paste", "lumps"]]

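I want to count how many times the pair doublelist[0][0] & doublelist[1][0] = all, the occurs in the double list. The second [] is the index.

For example, you can see one occurrence at doublelist[0][0] doublelist[1][0] and another at doublelist[0][6] doublelist[1][6].

What code would I use in Python 3 to iterate over doublelist[i][i], grabbing each value pair, e.g. [["all"],["the"]], along with an integer giving the number of times that pair occurs in the list?

Ideally, I'd like to output this to a triple list triplelist[[i],[i],[i]] that holds the [i][i] values in the first two lists and the integer count in the third [i].

Example code: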

for i in range(len(triplelist[0])):
    print(triplelist[0][i])
    print(triplelist[1][i])
    print(triplelist[2][i])

Output:

>"all"
>"the"
>2
>"the"
>"big"
>1
>"big"
>"dogs"
>1

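And so on...

Also, it should preferably skip duplicates, so the list won't contain two entries [i][i][i] = [[all],[the],[2]] just because the original list has two instances ([0][0] [1][0] and [0][6] [1][6]). I only want all the unique word pairs and the number of times each appears in the original text.

The purpose of the code is to see how often one word follows another in a given text. It is for building a smart Markov chain generator that can weight word values. I already have code that splits the text into a double list, with the words in the first list and the words that follow them in the second.

Here is my current code for reference (the problem is that after I initialize wordlisttriple, I don't know how to make it do what I described above):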

#import
import re #for regex expression below

#main
with open("text.txt") as rawdata:    #open text file and create a datastream
    rawtext = rawdata.read()    #read through the stream and create a string containing the text
rawdata.close()    #close the datastream
rawtext = rawtext.replace('\n', ' ')    #remove newline characters from text
rawtext = rawtext.replace('\r', ' ')    #remove newline characters from text
rawtext = rawtext.replace('--', ' -- ')    #break up blah--blah words so it can read 2 separate words blah -- blah
pat = re.compile(r'([A-Z][^\.!?]*[\.!?])', re.M)    #regex pattern for grabbing everything before a sentence ending punctuation
sentencelist = []    #initialize list for sentences in text
sentencelist = pat.findall(rawtext)    #apply regex pattern to string to create a list of all the sentences in the text
firstwordlist = []    #initialize the list for the first word in each sentence
for index, firstword in enumerate(sentencelist):    #enumerate through the sentence list
    sentenceindex = int(index)    #get the index for below operation
    firstword = sentencelist[sentenceindex].split(' ')[0]    #use split to only grab the first word in each sentence
    firstwordlist.append(firstword)    #append each sentence starting word to first word list
rawtext = rawtext.replace(', ', ' , ')    #break up punctuation so they are not considered part of words
rawtext = rawtext.replace('. ', ' . ')    #break up punctuation so they are not considered part of words
rawtext = rawtext.replace('"', ' " ')    #break up punctuation so they are not considered part of words
sentencelistforwords = []    #initialize sentence list for parsing words
sentencelistforwords = pat.findall(rawtext)    #run the regex pattern again this time with the punctuation broken up by spaces
wordsinsentencelist = []    #initialize list for all of the words that appear in each sentence
for index, words in enumerate(sentencelistforwords):    #enumerate through the sentence list with spaced-out punctuation
    sentenceindex = int(index)    #grab the index for below operation
    words = sentencelistforwords[sentenceindex].split(' ')    #split up the words in each sentence so we have a nested lists that contain each word in each sentence
    wordsinsentencelist.append(words)    #append above described to the list
wordlist = []    #initialize list of all words
wordlist = rawtext.split(' ')    #create list of all words by splitting the entire text by spaces
wordlist = list(filter(None, wordlist))    #use filter to get rid of empty strings in the list
wordlistdouble = [[], []]    #initialize the word list double to contain words and the words that follow them in sentences
for index, word in enumerate(wordlist):    #enumerate through word list
    if(int(index) < int(len(wordlist))-1):    #only go to 1 before the end of list so we don't get an index out of bounds error
        wordlistindex1 = int(index)    #grab index for first word
        wordlistindex2 = int(index)+1    #grab index for following word
        wordlistdouble[0].append(wordlist[wordlistindex1])    #append first word to first list of word list double
        wordlistdouble[1].append(wordlist[wordlistindex2])    #append following word to second list of word list double
wordlisttriple = [[], [], []]    #initialize word list triple
for index, unit in enumerate(wordlistdouble[0]):    #enumerate through word list double
    word1 = wordlistdouble[0][index]    #grab word at first list of word list double at the current index
    word2 = wordlistdouble[1][index]    #grab word at second list of word list double at the current index
    count = 0    #initialize word double data set counter
    wordlisttriple[0].append(word1)    #these need to be encapsulated in some kind of loop/if/for idk
    wordlisttriple[1].append(word2)    #these need to be encapsulated in some kind of loop/if/for idk
    wordlisttriple[2].append(count)    #these need to be encapsulated in some kind of loop/if/for idk
    #for index, unit1 in enumerate(wordlistdouble[0]):
        #if(wordlistdouble[0][int(index)] == word1 && wordlistdouble[1][int(index)+1] == word2):
            #count++

#sentencelist = list of all sentences
#firstwordlist = list of words that start sentencelist
#sentencelistforwords = list of all sentences mutated for ease of extracting words
#wordsinsentencelist = list of lists containing all of the words in each sentence
#wordlist = list of all words
#wordlistdouble = dual list of all words plus the words that follow them

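Any advice would be greatly appreciated. If I'm going about this the wrong way and there is a simpler way to accomplish the same thing, that would be amazing too. Thanks!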

1> niemmi..：

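Assuming you have already parsed the text into a list of words, you can create an iterator that starts from the second word, zip it with the words, and run the result through Counter: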

from collections import Counter

words = ["all", "the", "big", "dogs", "eat", "chicken", "all", "the", "small", "kids", "eat", "paste", "lumps"]
nxt = iter(words)
next(nxt, None)

print(*Counter(zip(words, nxt)).items(), sep='\n')

Output:

(('big', 'dogs'), 1)
(('kids', 'eat'), 1)
(('small', 'kids'), 1)
(('the', 'big'), 1)
(('dogs', 'eat'), 1)
(('eat', 'paste'), 1)
(('all', 'the'), 2)
(('chicken', 'all'), 1)
(('paste', 'lumps'), 1)
(('eat', 'chicken'), 1)
(('the', 'small'), 1)
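
The question asks for three parallel lists rather than a Counter. A minimal sketch of that conversion follows, reusing the name triplelist from the question; pair_counts is a name introduced here for illustration. Because Counter keys are unique, each pair lands in the lists exactly once, and on Python 3.7+ the pairs should come out in order of first appearance.

```python
from collections import Counter

words = ["all", "the", "big", "dogs", "eat", "chicken", "all",
         "the", "small", "kids", "eat", "paste", "lumps"]

# Pair each word with its successor, using the same iterator trick.
nxt = iter(words)
next(nxt, None)
pair_counts = Counter(zip(words, nxt))  # pair_counts: illustrative name

# Three parallel lists: first words, following words, counts.
triplelist = [[], [], []]
for (first, second), count in pair_counts.items():
    triplelist[0].append(first)
    triplelist[1].append(second)
    triplelist[2].append(count)

# Walk the three lists by index, as in the question's example loop.
for i in range(len(triplelist[0])):
    print(triplelist[0][i], triplelist[1][i], triplelist[2][i])
```

Keeping the Counter itself is usually more convenient for lookups; the parallel lists are only worth building if later code really expects that shape.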

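In the code above, nxt is an iterator that walks over the word list. Since we want it to start from the second word, we pull one word out with next before using it: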

>>> nxt = iter(words)
>>> next(nxt)
'all'
>>> list(nxt)
['the', 'big', 'dogs', 'eat', 'chicken', 'all', 'the', 'small', 'kids', 'eat', 'paste', 'lumps']

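Consuming the iterator exhausts it, so each snippet below assumes nxt has been freshly created and advanced by one word. We then pass the original list and the iterator to zip, which returns an iterable of tuples with two items each: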

>>> list(zip(words, nxt))
[('all', 'the'), ('the', 'big'), ('big', 'dogs'), ('dogs', 'eat'), ('eat', 'chicken'), ('chicken', 'all'), ('all', 'the'), ('the', 'small'), ('small', 'kids'), ('kids', 'eat'), ('eat', 'paste'), ('paste', 'lumps')]

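Finally, the output of zip is passed to Counter, which counts each pair and returns a dict-like object whose keys are the pairs and whose values are the counts: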

>>> Counter(zip(words, nxt))
Counter({('all', 'the'): 2, ('eat', 'chicken'): 1, ('big', 'dogs'): 1, ('small', 'kids'): 1, ('kids', 'eat'): 1, ('paste', 'lumps'): 1, ('chicken', 'all'): 1, ('dogs', 'eat'): 1, ('the', 'big'): 1, ('the', 'small'): 1, ('eat', 'paste'): 1})
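
Since the stated goal is a weighted Markov chain generator, here is a rough sketch of how the pair counts might drive the choice of the next word. The names followers and next_word are hypothetical, and the pairs are built with a slice (words[1:]) instead of the iterator, which yields the same pairs. random.choices is part of the standard library from Python 3.6 on.

```python
import random
from collections import Counter, defaultdict

words = ["all", "the", "big", "dogs", "eat", "chicken", "all",
         "the", "small", "kids", "eat", "paste", "lumps"]

# Same pairs as zip(words, nxt), built with a slice instead of an iterator.
pair_counts = Counter(zip(words, words[1:]))

# Group the counts by first word, e.g. followers["the"] == {"big": 1, "small": 1}.
followers = defaultdict(dict)  # hypothetical helper structure
for (first, second), count in pair_counts.items():
    followers[first][second] = count

def next_word(word):
    """Pick a follower of `word`, weighted by how often it followed it.

    Returns None when `word` was never followed by anything.
    """
    options = followers.get(word)
    if not options:
        return None
    candidates = list(options)
    weights = list(options.values())
    return random.choices(candidates, weights=weights, k=1)[0]

print(next_word("all"))    # "the" is the only recorded follower of "all"
print(next_word("eat"))    # either "chicken" or "paste"
print(next_word("lumps"))  # None: "lumps" is the last word
```

With a real text the weights would differ much more than in this tiny sample, so frequent followers would be picked proportionally more often.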
