Posted on 2024-11-07 20:00
I essentially want to use the NLTK StanfordNERTagger to clean up a list of names (e.g. there are organizations in there I want to remove), and I stumbled on a weird issue. It seems the tags returned for one sentence depend on which other sentences are passed in with it, which isn't very intuitive.
Here is how to reproduce:
from nltk.tag import StanfordNERTagger

tagger = StanfordNERTagger('/path/to/english.all.3class.distsim.crf.ser.gz',
                           '/path/to/stanford-ner-2017-06-09/stanford-ner.jar',
                           encoding='utf-8')

things_to_tag = ["Star Trek".split(),
                 "Star Jones".split(),
                 "Star Wars".split()]

# tagging using tag_sents (all sentences in one call)
print(tagger.tag_sents(things_to_tag))

# tagging using tag (one call per sentence)
for t in things_to_tag:
    print(tagger.tag(t))
Output:
[[(u'Star', u'ORGANIZATION'), (u'Trek', u'ORGANIZATION')],
[(u'Star', u'ORGANIZATION'), (u'Jones', u'ORGANIZATION')],
[(u'Star', u'ORGANIZATION'), (u'Wars', u'ORGANIZATION')]]
[(u'Star', u'O'), (u'Trek', u'O')]
[(u'Star', u'PERSON'), (u'Jones', u'PERSON')]
[(u'Star', u'O'), (u'Wars', u'O')]
I also tried removing Star Wars from the list, and again the results change ('Trek' becomes PERSON, and 'Star' becomes O).
I looked into nltk/tag/stanford.py and it's not really clear why this would happen. I was hoping someone could lend a hand in identifying what the issue might be, or at least confirm I'm not the only one seeing this.
NLTK version 3.2.5, Python version 2.7.13.
OK, so it has to do with the tokenizeNLs tokenizer option that gets passed to the Stanford tagger. If you leave it as false, newlines are not treated as boundaries and the whole input is handled as one giant string, which means the predicted tags for each sentence now depend on everything else in that string. In my view, this is wrong. Changing it to 'true' and removing the quotes gives me the desired output.
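To illustrate the point, here is a simplified model of the difference. It only mimics whitespace splitting in plain Python and does not call the Stanford tokenizer; it also assumes the sentences reach the tagger as a single text with one sentence per line.

```python
# Simplified model only: plain whitespace splitting, not the Stanford tokenizer.
sentences = [["Star", "Trek"], ["Star", "Jones"], ["Star", "Wars"]]

# Assumption: the sentences are handed over as one text, one sentence per line.
text = "\n".join(" ".join(tokens) for tokens in sentences)

# Newlines treated as ordinary whitespace: everything becomes one sequence,
# so a sequence tagger sees "Star Jones" surrounded by the other tokens.
merged = text.split()

# Newlines treated as boundaries: each line is tagged as its own sequence.
separate = [line.split() for line in text.split("\n")]

print(merged)
print(separate)
```

In the first case the tagger has six tokens of shared context to work with, which would explain why removing one sentence changes the tags of the others.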
To be extra clear, in nltk/tag/stanford.py change '\"tokenizeNLs=false\"' to 'tokenizeNLs=true'.
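The same substitution can also be written as a small helper that works on a list of command-line arguments. This is only a sketch: enable_newline_tokenization and example_cmd are hypothetical names, and example_cmd is a shortened stand-in for the argument list the tagger builds, not a copy of it.

```python
def enable_newline_tokenization(cmd):
    """Return a copy of cmd with the tokenizeNLs option switched to true.

    Any argument containing 'tokenizeNLs=false' (with or without the
    embedded double quotes) is replaced by the unquoted 'tokenizeNLs=true'.
    """
    patched = []
    for arg in cmd:
        if "tokenizeNLs=false" in arg:
            patched.append("tokenizeNLs=true")
        else:
            patched.append(arg)
    return patched


# Hypothetical, shortened argument list used purely for illustration.
example_cmd = [
    "edu.stanford.nlp.ie.crf.CRFClassifier",
    "-tokenizerFactory", "edu.stanford.nlp.process.WhitespaceTokenizer",
    "-tokenizerOptions", '"tokenizeNLs=false"',
]

patched_cmd = enable_newline_tokenization(example_cmd)
print(patched_cmd[-1])
```

The helper leaves the original list untouched and returns a new one. How the patched list is fed back into the tagger depends on the NLTK version, so that part is left out here.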
Author:qs
link:http://www.pythonblackhole.com/blog/article/246851/fd47096ee66269c30e83/
source:python black hole net