1. How to remove stopwords using NLTK or Python
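1. Use a list comprehension (this assumes from nltk.corpus import stopwords has been run). Building the stopword set once avoids calling stopwords.words() again for every word:
stop_set = set(stopwords.words('english'))  # build the set once; membership tests are fast
filtered_words = [w for w in word_list if w not in stop_set]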
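NLTK's English stopword list is all lowercase, so a capitalized word such as "The" slips through a plain membership test. A minimal sketch that lowercases each word before the lookup; the function name remove_stopwords and the small stopword set are hypothetical stand-ins (in practice you would pass set(stopwords.words('english'))):

```python
def remove_stopwords(word_list, stop_set):
    """Return the words of word_list that are not stopwords, keeping order.

    Comparison is case-insensitive: each word is lowercased before the
    lookup, because stop_set is assumed to contain lowercase entries.
    """
    return [w for w in word_list if w.lower() not in stop_set]


# Hypothetical stand-in for set(stopwords.words('english'))
stop_set = {"the", "is", "a", "on"}
words = ["The", "cat", "is", "on", "a", "mat"]
print(remove_stopwords(words, stop_set))  # ['cat', 'mat']
```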
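2. I assume you have a list of words (word_list) from which you want to remove stopwords. You can do it like this:
from nltk.corpus import stopwords
stop_set = set(stopwords.words('english'))
filtered_word_list = word_list[:]  # make a copy of word_list
for word in word_list:  # iterate over the original word_list
    if word in stop_set:
        filtered_word_list.remove(word)  # remove the stopword from the copy, not from the list being iterated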
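The copy matters: removing items from the same list you are looping over makes the loop skip the element that slides into the freed slot. A small demonstration with a hypothetical stopword set:

```python
stop_set = {"a", "an"}  # hypothetical stopword set

# Wrong: remove from the list being iterated. After "a" is removed,
# "an" shifts into index 0 and the loop moves on to index 1, skipping it.
words = ["a", "an", "cat"]
for word in words:
    if word in stop_set:
        words.remove(word)
print(words)  # ['an', 'cat']

# Right: iterate over the original, remove from a copy.
word_list = ["a", "an", "cat"]
filtered_word_list = word_list[:]
for word in word_list:
    if word in stop_set:
        filtered_word_list.remove(word)
print(filtered_word_list)  # ['cat']
```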
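3. You can also take a set difference, for example:
list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english')))
Here pattern is a regular expression matching the separators between tokens (with gaps=True the pattern describes the gaps, not the tokens). Note that a set difference discards both word order and repeated words.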
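To see what the set difference gives up, the sketch below compares it with an order-preserving filter. It swaps nltk.regexp_tokenize(..., gaps=True) for re.split from the standard library (splitting on whitespace) so it runs without NLTK; the sentence and stopword set are made up for illustration:

```python
import re

stop_set = {"the", "on"}  # hypothetical stopword set
sentence = "the cat sat on the mat the cat"

# Split on runs of whitespace, similar to a gaps=True tokenizer with pattern r"\s+"
tokens = re.split(r"\s+", sentence.strip())

# Set difference: duplicates collapse and the order is arbitrary
by_set = list(set(tokens) - stop_set)

# List comprehension: keeps the original order and repeated words
by_list = [t for t in tokens if t not in stop_set]

print(sorted(by_set))  # ['cat', 'mat', 'sat']
print(by_list)         # ['cat', 'sat', 'mat', 'cat']
```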
2. How to remove stopwords after segmenting text with jieba in Python
import jieba

# Build the stopword set from a file with one stopword per line
def stopwordslist(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        return set(line.strip() for line in f)

# Load the stopwords once, instead of re-reading the file for every sentence
stopwords = stopwordslist('./test/stopwords.txt')  # path to the stopword file

# Segment a sentence and drop stopwords
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr

# Write the output as UTF-8 too, so Chinese text is not garbled on systems whose default encoding differs
with open('./test/input.txt', 'r', encoding='utf-8') as inputs, \
        open('./test/output.txt', 'w', encoding='utf-8') as outputs:
    for line in inputs:
        line_seg = seg_sentence(line)  # the return value is a space-separated string
        outputs.write(line_seg + '\n')
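The filtering step does not depend on jieba itself. The sketch below takes already-segmented tokens (a hand-written list standing in for jieba.cut output), loads stopwords from any file-like object while skipping blank lines, and builds the result with " ".join instead of repeated string concatenation. load_stopwords and filter_tokens are hypothetical helper names; unlike the code above, this version drops every whitespace-only token (not just '\t') and leaves no trailing space:

```python
import io


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (e.g. an open file),
    stripping whitespace and ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_tokens(tokens, stopwords):
    """Drop stopwords and whitespace-only tokens, then join with spaces."""
    return " ".join(t for t in tokens if t.strip() and t not in stopwords)


# io.StringIO stands in for open('./test/stopwords.txt', encoding='utf-8')
stopwords = load_stopwords(io.StringIO("的\n了\n\n是\n"))
# Hand-written tokens standing in for list(jieba.cut(...))
tokens = ["今天", "的", "天气", "\t", "是", "晴天", "了"]
print(filter_tokens(tokens, stopwords))  # 今天 天气 晴天
```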
3. In Python 2, how can the Enter key (a newline) be received as a character?
A newline is already a character.
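Whether you actually see that newline depends on how the line is read: raw_input() (input() in Python 3) strips the trailing newline, while a file object's readline() keeps it. A sketch using io.StringIO as a stand-in for sys.stdin, with made-up text:

```python
import io

# io.StringIO stands in for sys.stdin
stream = io.StringIO("hello\nworld\n")
first = stream.readline()      # readline() keeps the trailing newline
stripped = first.rstrip("\n")  # what raw_input()/input() would return for this line

print(repr(first))     # 'hello\n'
print(repr(stripped))  # 'hello'
print(len("\n"))       # 1: the newline is a single character
```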
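By default, Python ends the input as soon as Enter is pressed. So we need to change the terminator so that input ends only when an empty line is entered: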
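stopword = ''  # input terminator
text = ''  # avoid naming this str, which would shadow the built-in type
for line in iter(raw_input, stopword):  # an empty line ends the input
    text += line + '\n'  # raw_input() strips the newline, so add it back
# print(text)  # for testing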
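In Python 3, raw_input() was renamed input(), and the same iter(callable, sentinel) idiom applies. To make it testable without a keyboard, this sketch takes the line-reading function as a parameter (read_until_blank is a hypothetical name); call read_until_blank() with the default input to read from the terminal:

```python
def read_until_blank(read_line=input):
    """Collect lines from read_line() until it returns an empty string.

    iter(read_line, "") calls read_line() repeatedly and stops as soon as
    the result equals the sentinel "" (an empty line).
    """
    text = ""
    for line in iter(read_line, ""):
        text += line + "\n"  # put back the newline that input() strips
    return text


# Simulated keyboard input: two lines, then an empty line
fake_lines = iter(["first line", "second line", "", "never read"])
result = read_until_blank(lambda: next(fake_lines))
print(repr(result))  # 'first line\nsecond line\n'
```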
Source: 51数据库 » pythonstopword (please credit the source when reposting)
夕阳下奔跑的姨妈