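I'm trying to extract email addresses from plain-text transcripts of emails. I've cobbled together some code that finds the addresses themselves, but I don't know how to discriminate between them; right now it just spits out every email address in the file. I'd like it to output only the addresses that are preceded by "From:" and a few wildcard characters and that end with ">" (because the emails are laid out as From [name] <[email]>).
Here is the current code: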
    import re  # allows the program to use regular expressions

    foundemail = []  # this is an empty list

    mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
    # do not currently know the exact meaning of this expression, but assuming
    # it means something like "[stuff]@[stuff][stuff1-4 letters]"

    # "line" is a variable set to a single line read from the file
    # ("text.txt"):
    with open("text.txt") as f:
        for line in f:
            # this extends the previously named list via the "mailsrch"
            # variable, which was named before
            foundemail.extend(mailsrch.findall(line))

    print(foundemail)
Van Gale.. 36
Try this:

    >>> from email.utils import parseaddr
    >>> parseaddr('From: vg@m.com')
    ('', 'vg@m.com')
    >>> parseaddr('From: Van Gale <vg@m.com>')
    ('Van Gale', 'vg@m.com')
    >>> parseaddr(' From: Van Gale <vg@m.com> ')
    ('Van Gale', 'vg@m.com')
    >>> parseaddr('blah abdf From: Van Gale <vg@m.com> and this')
    ('Van Gale', 'vg@m.com')

Unfortunately it only finds the first email address on each line, because it expects a header line, but maybe that's OK?
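If a regular expression is still preferred, here is a minimal sketch of the filter the question describes: keep only addresses that sit inside <...> on a line starting with "From:". The pattern `FROM_RE` and the helper `from_addresses` are hypothetical names made up for illustration, and the pattern is deliberately loose rather than a full address validator.

```python
import re

# Hypothetical pattern: "From:", an optional display name, then <address>.
# Only the part between the angle brackets is captured.
FROM_RE = re.compile(r'^From:\s*[^<]*<([^<>\s]+@[^<>\s]+)>')


def from_addresses(lines):
    """Return the addresses found on 'From: name <address>' lines."""
    found = []
    for line in lines:
        match = FROM_RE.match(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    "From: Van Gale <vg@m.com>",
    "To: someone <other@example.com>",
    "Body text mentioning third@example.com",
]
print(from_addresses(sample))  # ['vg@m.com']
```

A bare "From: vg@m.com" line with no angle brackets is skipped by this sketch, which matches the layout stated in the question but may be too strict for other mail.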
    import email

    # raw_message is assumed to hold the full text of one email
    msg = email.message_from_string(raw_message)

    # or
    # f = open(file)
    # msg = email.message_from_file(f)

    msg['from']

    # and optionally
    from email.utils import parseaddr
    addr = parseaddr(msg['from'])
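For completeness, here is a self-contained sketch of the same idea that can be run as is. The sample message text and the helper name `sender_of` are assumptions made up for illustration; a real script would read the message from a file instead.

```python
import email
from email.utils import parseaddr

# Hypothetical sample message; headers come first, then a blank line, then the body.
raw_message = (
    "From: Van Gale <vg@m.com>\n"
    "To: someone <other@example.com>\n"
    "Subject: test\n"
    "\n"
    "Body text mentioning third@example.com\n"
)


def sender_of(raw_text):
    """Parse the message and return (display name, address) from its From header."""
    msg = email.message_from_string(raw_text)
    # msg['From'] is None when the header is missing, so fall back to ''.
    return parseaddr(msg["From"] or "")


name, address = sender_of(raw_message)
print(name, address)  # Van Gale vg@m.com
```

Because only the From header is read, addresses in the To header and in the body are ignored, which is the filtering the question asks for.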