Python stop words

1. How to remove stop words using NLTK or Python

1. Use a list comprehension (stopwords here comes from "from nltk.corpus import stopwords"): filtered_words = [w for w in word_list if w not in stopwords.words('english')]

2. I assume you have a list of words (word_list) from which you want to remove stop words. You could do something like this:

    filtered_word_list = word_list[:]  # make a copy of the word_list
    for word in word_list:  # iterate over word_list
        if word in stopwords.words('english'):
            filtered_word_list.remove(word)  # remove word from filtered_word_list if it is a stopword

3. You can also take a set difference, for example: list(set(nltk.regexp_tokenize(sentence, pattern, gaps=True)) - set(nltk.corpus.stopwords.words('english'))). Note that this discards word order and duplicate words.
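The approaches above differ in what they keep. As a minimal sketch, assuming a small hand-written stop word set (STOP_WORDS is a hypothetical stand-in for stopwords.words('english'), used so the example runs without downloading the NLTK corpus), the following compares an order-preserving filter backed by a set with the set difference:

```python
# Hypothetical stand-in for set(stopwords.words('english')).
STOP_WORDS = {"the", "is", "a", "on", "and"}

def remove_stopwords(word_list, stop_words=STOP_WORDS):
    """Return the words not in stop_words, keeping order and duplicates."""
    return [w for w in word_list if w not in stop_words]

def remove_stopwords_set(word_list, stop_words=STOP_WORDS):
    """Set difference: drops stop words, but also duplicates and order."""
    return set(word_list) - set(stop_words)

words = ["the", "cat", "sat", "on", "the", "mat", "and", "the", "cat", "slept"]
print(remove_stopwords(words))      # ['cat', 'sat', 'mat', 'cat', 'slept']
print(remove_stopwords_set(words))  # a set; element order is not guaranteed
```

Building the set once also avoids calling stopwords.words('english') for every word, which is likely to matter on longer texts.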

import jieba

# Build the stop word list
def stopwordslist(filepath):
    # Use a with block so the file is closed after reading
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords

# Load the stop words once, as a set for fast membership tests
stopwords = set(stopwordslist('./test/stopwords.txt'))  # path to the stop word file

# Segment a sentence and drop the stop words
def seg_sentence(sentence):
    sentence_seged = jieba.cut(sentence.strip())
    outstr = ''
    for word in sentence_seged:
        if word not in stopwords and word != '\t':
            outstr += word
            outstr += " "
    return outstr

# Write the output as UTF-8 too, so Chinese text is saved correctly on any platform
with open('./test/input.txt', 'r', encoding='utf-8') as inputs, \
        open('./test/output.txt', 'w', encoding='utf-8') as outputs:
    for line in inputs:
        line_seg = seg_sentence(line)  # the return value is a string
        outputs.write(line_seg + '\n')
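The filtering step can also be tried without jieba or any files on disk. This is only a sketch under stated assumptions: load_stopwords and the cut parameter are illustrative names, the stop word contents are made up, and str.split stands in for jieba.cut so the example is self-contained.

```python
import io

def load_stopwords(lines):
    """Build a stop word set from an iterable of lines, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}

def seg_sentence(sentence, stopwords, cut=str.split):
    """Segment a sentence with `cut` and join the non-stop-word tokens with spaces."""
    tokens = cut(sentence.strip())
    return " ".join(w for w in tokens if w not in stopwords and w.strip())

# In-memory stand-in for ./test/stopwords.txt (contents are made up).
stopword_file = io.StringIO("的\n了\n\n是\n")
stopwords = load_stopwords(stopword_file)

# Pre-spaced text so that str.split can play the role of jieba.cut.
result = seg_sentence("今天 的 天气 是 晴天 了\n", stopwords)
print(result)  # 今天 天气 晴天
```

With jieba installed, passing cut=jieba.cut should segment unspaced Chinese text instead. Unlike the loop that appends a space after each word, " ".join leaves no trailing space.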

The return value is already a string.

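By default, Python's input ends as soon as Enter is pressed. To read multiple lines, we need to change this terminator so that input ends only when an empty line is entered.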

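    stopword = ''  # input terminator: an empty line ends the input
    text = ''  # avoid naming this "str", which would shadow the built-in
    for line in iter(input, stopword):  # input() in Python 3 (raw_input() in Python 2)
        text += line + '\n'
    # print(text)  # for testing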
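The two-argument form iter(callable, sentinel) is what makes this loop stop on an empty line. As a minimal sketch, the function below wraps the loop and takes the line reader as a parameter; read_until_blank and fake_input are hypothetical names introduced here so the behavior can be checked without typing at a prompt.

```python
def read_until_blank(read_line=input, stopword=""):
    """Collect lines from read_line() until it returns the stopword."""
    lines = []
    # iter(callable, sentinel) calls read_line() repeatedly and stops
    # as soon as the returned value equals the sentinel.
    for line in iter(read_line, stopword):
        lines.append(line)
    return "".join(line + "\n" for line in lines)

# Simulated keyboard input: the empty string marks the end of input.
fake_input = iter(["first line", "second line", "", "never read"])
text = read_until_blank(lambda: next(fake_input))
print(repr(text))  # 'first line\nsecond line\n'
```

Calling read_until_blank() with no arguments reads from the keyboard through input().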