I tried following this post, but it doesn't seem to work for me.
I tried this code:
    for bresult in response.css(LIST_SELECTOR):
        NAME_SELECTOR = 'h2 a ::attr(href)'
        yield {
            'name': bresult.css(NAME_SELECTOR).extract_first(),
        }
        b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())

    # set b_result_list to SET to remove dups, then change back to LIST
    set(b_result_list)
    list(set(b_result_list))
    for brl in b_result_list:
        print("brl: {}".format(brl))
This prints:
    brl: https://facebook.site.com/users/login
    brl: https://facebook.site.com/users
    brl: https://facebook.site.com/users/login
when what I need is:
    brl: https://facebook.site.com/users/login
    brl: https://facebook.site.com/users
What am I doing wrong here?
Thanks!
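You are discarding the result when you need to save it. Calling set(b_result_list) and list(set(b_result_list)) each builds a new object and leaves b_result_list itself unchanged, so you are just iterating over the original list. Instead, save the result of the set operation: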
    b_result_list = list(set(b_result_list))
(Note that sets do not preserve order.)
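If the order of the links matters, one possible alternative is to deduplicate with dict.fromkeys instead of set, since dict keys are unique and keep insertion order in Python 3.7+. The following is only a sketch: the sample list is hypothetical data standing in for the scraped hrefs, not output from the actual spider.

```python
# Hypothetical sample data standing in for the scraped hrefs.
b_result_list = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so this drops duplicates while keeping the first-seen order.
# As with set(), the result must be assigned back to the name.
b_result_list = list(dict.fromkeys(b_result_list))

for brl in b_result_list:
    print("brl: {}".format(brl))
# prints:
# brl: https://facebook.site.com/users/login
# brl: https://facebook.site.com/users
```

This assumes the items are hashable, which holds for the URL strings here.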