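2. Use Python to count how often each word occurs in an English article, and return the 10 most frequent words together with their counts. Also answer the following questions. (Punctuation may be ignored.)
(1) After creating a file object f, explain the difference between f's readlines and xreadlines methods.
(2) Follow-up requirement: text inside quotation marks must count as a single word. How would you implement this?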
cat /root/text.txt
hello world 2018 xiaowei,good luck
hello kitty 2017 wangleai,ha he
hello kitty ,hasd he
hello kitty ,hasaad hedsfds
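My script:
#!/usr/bin/python
import re

# Read the whole file once; the path matches the sample file above.
with open('/root/text.txt') as f:
    openfile = f.read()

# Split the text into words -> ['a', 'b', 'c'];
# digits and non-word characters act as separators.
def get_list_dict():
    word_list = re.split(r'[0-9\W]+', openfile)
    list_no_repeat = set(word_list)
    dict_word = {}
    for each_word in list_no_repeat:
        dict_word[each_word] = word_list.count(each_word)
    # Splitting can leave an empty string at either end; drop it if present.
    dict_word.pop('', None)
    return dict_word

# {'a': 2, 'c': 5, 'b': 1} => [('c', 5), ('a', 2), ('b', 1)]
def sort_dict_get_ten(dict_word, top_n=3):
    # The task asks for the top 10; the sample run below prints only the top 3.
    list_after_sorted = sorted(dict_word.items(), key=lambda x: x[1], reverse=True)
    print(list_after_sorted)
    for word, count in list_after_sorted[:top_n]:
        print('%s %s' % (word, count))

def main():
    dict_word = get_list_dict()
    sort_dict_get_ten(dict_word)

if __name__ == '__main__':
    main()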
[('hello', 4), ('kitty', 3), ('he', 2), ('good', 1), ('hasd', 1), ('wangleai', 1), ('hasaad', 1), ('xiaowei', 1), ('hedsfds', 1), ('luck', 1), ('world', 1), ('ha', 1)]
hello 4
kitty 3
he 2
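The same counting can probably be written more compactly with the standard library's collections.Counter, whose most_common(n) method returns the n most frequent items. The sketch below is only one possible approach: the function name top_words and the inline sample string are assumptions made for illustration, and the string simply mirrors the contents of /root/text.txt shown earlier.

```python
import re
from collections import Counter

def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Same tokenization as the script: digits and non-word characters separate words.
    words = [w for w in re.split(r'[0-9\W]+', text) if w]
    return Counter(words).most_common(n)

# Inline copy of the sample file, so the sketch runs without /root/text.txt.
sample = (
    "hello world 2018 xiaowei,good luck\n"
    "hello kitty 2017 wangleai,ha he\n"
    "hello kitty ,hasd he\n"
    "hello kitty ,hasaad hedsfds\n"
)

for word, count in top_words(sample):
    print('%s %s' % (word, count))
```

The first three lines printed are hello 4, kitty 3 and he 2, as in the run above. The remaining words all occur once, so their relative order depends on how ties are broken.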
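For question (1): in Python 2, f.readlines() reads the entire file and returns all lines as a list, while f.xreadlines() returns an iterator that yields one line at a time, so it uses far less memory on large files. xreadlines was deprecated in favour of iterating over the file object directly (for line in f) and no longer exists in Python 3. The sketch below, written for Python 3, contrasts the two styles; the temporary file and the variable names are assumptions made for illustration.

```python
import os
import tempfile

# Write a small temporary file so the sketch does not depend on /root/text.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hello world\nhello kitty\n")
    path = tmp.name

# readlines(): the whole file is read at once and returned as a list.
with open(path) as f:
    all_lines = f.readlines()

# Iterating over f: lines are produced one at a time,
# which is the role xreadlines() played in Python 2.
lazy_lines = []
with open(path) as f:
    for line in f:
        lazy_lines.append(line)

os.remove(path)
print('%s %d' % (type(all_lines).__name__, len(all_lines)))
print(all_lines == lazy_lines)
```

Both approaches see the same lines; the difference is only in when they are loaded into memory.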
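For question (2), one possible approach is to match tokens instead of splitting on separators: a regular expression that tries a double-quoted phrase first and falls back to a run of letters keeps each quoted phrase together. This is a sketch under stated assumptions, not the original author's answer: it handles double quotes only (so apostrophes in words like "don't" are not treated as quotes), and the names TOKEN, tokenize and the sample string are hypothetical, since the original sample file contains no quotation marks.

```python
import re
from collections import Counter

# A double-quoted phrase (group 1) or a run of letters (group 2).
TOKEN = re.compile(r'"([^"]+)"|([A-Za-z]+)')

def tokenize(text):
    """Split text into words, keeping each double-quoted phrase as one token."""
    return [quoted or word for quoted, word in TOKEN.findall(text)]

# Hypothetical input: the original sample file contains no quotation marks.
sample = 'hello "good luck" 2018 hello kitty, "good luck" he'

print(tokenize(sample))
print(Counter(tokenize(sample)).most_common(10))
```

With this input, tokenize returns ['hello', 'good luck', 'hello', 'kitty', 'good luck', 'he'], so "good luck" is counted twice as a single word while digits and punctuation are still ignored.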