我正在构建一个CGI脚本,用于轮询SQLite数据库并构建统计表.下面描述源数据库表,以及相关代码的块.一切正常(功能上),但CGI本身非常慢,因为我有多个嵌套SELECT COUNT(id)
调用.我认为我最好的优化是向SO社区提问,因为我与谷歌的时间相对无果而终.
桌子:
CREATE TABLE messages ( id TEXT PRIMARY KEY ON CONFLICT REPLACE, date TEXT, hour INTEGER, sender TEXT, size INTEGER, origin TEXT, destination TEXT, relay TEXT, day TEXT);
(是的,我知道这个表没有标准化,但它填充了邮件日志中的摘录...我很高兴得到提取和填充工作,更不用说规范化了.我不认为表结构有一个在这一点上与我的问题很相关,但我可能是错的.)
样本行:
476793200A7|Jan 29 06:04:47|6|admin@mydomain.com|4656|web02.mydomain.pvt|user@example.com|mail01.mydomain.pvt|Jan 29
而且,构建我的表的Python代码:
#!/usr/bin/python print 'Content-type: text/html\n\n' from datetime import date import re p = re.compile('(\w+) (\d+)') d_month = {'Jan':1,'Feb':2,'Mar':3,'Apr':4,'May':5,'Jun':6,'Jul':7,'Aug':8,'Sep':9,'Oct':10,'Nov':11,'Dec':12} l_wkday = ['Mo','Tu','We','Th','Fr','Sa','Su'] days = [] curs.execute('SELECT DISTINCT(day) FROM messages ORDER BY day') for day in curs.fetchall(): m = p.match(day[0]).group(1) m = d_month[m] d = p.match(day[0]).group(2) days.append([day[0],"%s (%s)" % (day[0],l_wkday[date.weekday(date(2010,int(m),int(d)))])]) curs.execute('SELECT DISTINCT(sender) FROM messages') senders = curs.fetchall() for sender in senders: curs.execute('SELECT COUNT(id) FROM messages WHERE sender=%s',(sender[0])) print '' print '' print 'Stats for Sender: '+sender[0]+'
' print '
Total messages in database: %d ' % curs.fetchone()[0] print '' print ' Hour of Day ' % ' Day %s '.join(map(str,range(24))) for day in days: print ' ' print ' %s ' % day[1] for hour in range(24): sql = 'SELECT COUNT(id) FROM messages WHERE sender="%s" AND day="%s" AND hour="%s"' % (sender[0],day[0],str(hour)) curs.execute(sql) d = curs.fetchone()[0] print '%s ' % (d>0 and str(d) or '') print '
\n