Python - PoS Tagging and Lemmatization using spaCy
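spaCy is one of the best text-analysis libraries. It excels at large-scale information extraction tasks and is among the fastest NLP libraries available. It is also one of the best ways to prepare text for deep learning. For PoS tagging, spaCy is faster and more accurate than NLTKTagger and TextBlob.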
How to install?
pip install spacy
python -m spacy download en_core_web_sm
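Before running the examples, it can help to confirm that both the library and the English model are importable. The following is a minimal sketch that uses only the standard library. The helper name `is_installed` is hypothetical and not part of spaCy; the sketch assumes the model was installed with the download command above, which places `en_core_web_sm` on the import path as a regular Python package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the library and of the English model package.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports "missing", rerun the corresponding install command in the same Python environment.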
Example
# Import the library
import spacy

# The model must be downloaded once beforehand:
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# PoS tagging
# Process the whole document
text = "My name is Vishesh. I love to work on data science problems. Please check out my github profile!"
doc = nlp(text)

# Print each token with its coarse-grained PoS tag
for token in doc:
    print(token, token.pos_)

# Collect only the verb tokens
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])
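The verb filter above generalizes to any tag. As a rough sketch, the two helpers below count how often each coarse-grained tag occurs and pull out the tokens for a chosen tag. The names `pos_counts`, `tokens_with_pos`, and `demo` are hypothetical, introduced here for illustration; they rely only on the `text` and `pos_` attributes already used in the example, so they accept a spaCy `Doc` or any iterable of objects with those attributes.

```python
from collections import Counter


def pos_counts(tokens):
    """Count coarse-grained PoS tags for any iterable of objects with a .pos_ attribute."""
    return Counter(token.pos_ for token in tokens)


def tokens_with_pos(tokens, pos):
    """Return the text of every token whose coarse-grained tag equals pos."""
    return [token.text for token in tokens if token.pos_ == pos]


def demo():
    """Run both helpers on the tutorial text (requires spaCy and en_core_web_sm)."""
    import spacy  # imported lazily so the helpers work without spaCy installed

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("My name is Vishesh. I love to work on data science problems.")
    print(pos_counts(doc).most_common())
    print("Nouns:", tokens_with_pos(doc, "NOUN"))
```

Calling `demo()` prints the tag frequencies followed by the noun tokens. The exact tags can vary between model versions, so no expected output is shown here.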
# Lemmatization: the process of grouping together the inflected
# forms of a word so they can be analyzed as a single item,
# identified by the word's lemma, or dictionary form.
import spacy

# Load the English pipeline: tokenizer, tagger,
# parser, lemmatizer and NER
nlp = spacy.load("en_core_web_sm")

# Process the whole document
text = "My name is Vishesh. I love to work on data science problems. Please check out my github profile!"
doc = nlp(text)

# Print each token with its lemma (dictionary form)
for token in doc:
    print(token, token.lemma_)
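Since lemmatization is about treating inflected forms as one item, a natural next step is to group the surface forms that share a lemma. The sketch below is one way to do that, not the only one. The names `group_by_lemma` and `demo` and the sample sentence are hypothetical and chosen for illustration; the helper uses only the `text` and `lemma_` attributes shown above.

```python
from collections import defaultdict


def group_by_lemma(tokens):
    """Map each lemma to the sorted, distinct lowercase surface forms that share it."""
    groups = defaultdict(set)
    for token in tokens:
        groups[token.lemma_].add(token.text.lower())
    return {lemma: sorted(forms) for lemma, forms in groups.items()}


def demo():
    """Show lemmas with more than one surface form (requires spaCy and en_core_web_sm)."""
    import spacy  # imported lazily so the helper works without spaCy installed

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("She works while he worked; both are working on problems.")
    for lemma, forms in group_by_lemma(doc).items():
        if len(forms) > 1:
            print(lemma, "<-", forms)
```

Lowercasing the surface forms keeps sentence-initial capitals from creating duplicate entries. Drop the `.lower()` call if case matters for the analysis.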