Tutorial - Local Entity Linking
In the previous step, you ran the spacy_ann create_index CLI command. The output of this command is a loadable spaCy model with an ann_linker component capable of Entity Linking against your KnowledgeBase data. You can load the saved model from the output_dir you used in the previous step just like you would any normal spaCy model.
Load ann_linker model
First, load the model created by spacy_ann create_index:
```python
import spacy

# Load the spaCy model from the output_dir you used
# from the create_index command
model_dir = "examples/tutorial/models/ann_linker"
nlp = spacy.load(model_dir)
```
Load Extraction Model
This is a bit of a misnomer for the provided example code. You likely want a trained NER model, but for the purposes of this example we'll just arbitrarily extract entities using the spaCy EntityRuler component, adding a few terms to it that are close to those in our KnowledgeBase.
```python
# The NER component of the en_core_web_md model doesn't actually
# recognize the aliases as entities so we'll add a
# spaCy EntityRuler component for now to extract them.
ruler = nlp.create_pipe('entity_ruler')
patterns = [
    {"label": "SKILL", "pattern": alias}
    for alias in nlp.get_pipe('ann_linker').kb.get_alias_strings() + ['machine learn']
]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler, before="ann_linker")
```
Test the trained ann_linker component
Run the pipeline on some sample text and ensure that e.kb_id_ is set properly for each entity. You should get id a3 for "NLP" and id a1 for "machine learn". The full script:
```python
import spacy

if __name__ == "__main__":
    # Load the spaCy model from the output_dir you used
    # from the create_index command
    model_dir = "examples/tutorial/models/ann_linker"
    nlp = spacy.load(model_dir)

    # The NER component of the en_core_web_md model doesn't actually
    # recognize the aliases as entities so we'll add a
    # spaCy EntityRuler component for now to extract them.
    ruler = nlp.create_pipe('entity_ruler')
    patterns = [
        {"label": "SKILL", "pattern": alias}
        for alias in nlp.get_pipe('ann_linker').kb.get_alias_strings() + ['machine learn']
    ]
    ruler.add_patterns(patterns)
    nlp.add_pipe(ruler, before="ann_linker")

    doc = nlp("NLP is a subset of machine learn.")

    print([(e.text, e.label_, e.kb_id_) for e in doc.ents])
    # Outputs:
    # [('NLP', 'SKILL', 'a3'), ('machine learn', 'SKILL', 'a1')]
    #
    # In our entities.jsonl file
    # a3 => Natural Language Processing
    # a1 => Machine learning
```
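Once each entity carries a kb_id_, you'll usually want to resolve those ids back to canonical names. A minimal sketch, assuming the id-to-name pairs from the tutorial's entities.jsonl shown above (the `entity_names` dict and `linked` list here are illustrative stand-ins for your own KnowledgeBase data):

```python
# Map linked kb_id_ values back to canonical entity names.
# These ids/names mirror the tutorial's entities.jsonl; swap in your own.
entity_names = {
    "a1": "Machine learning",
    "a3": "Natural Language Processing",
}

# Shape matches the (text, label, kb_id) tuples printed above.
linked = [("NLP", "SKILL", "a3"), ("machine learn", "SKILL", "a1")]

resolved = [(text, entity_names[kb_id]) for text, label, kb_id in linked]
print(resolved)
# [('NLP', 'Natural Language Processing'), ('machine learn', 'Machine learning')]
```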
Next Steps
This works great when you can afford to fit your KnowledgeBase in memory and have full access to it. In the next step of this tutorial, we'll cover hosting the KnowledgeBase and ANN Index remotely and making batch calls to the endpoint, so you can keep the KnowledgeBase and model code separate.
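To preview the idea, batching means grouping several documents into one request body so the remote KnowledgeBase service is called once per batch rather than once per document. A minimal sketch of building such a payload; the JSON shape here is an assumption for illustration only, and the actual spacy_ann remote API is covered in the next step:

```python
import json

def build_batch_payload(texts):
    """Group multiple documents into a single request body.

    The {"documents": [{"text": ...}]} shape is a hypothetical
    example, not the real spacy_ann remote API.
    """
    return json.dumps({"documents": [{"text": t} for t in texts]})

payload = build_batch_payload([
    "NLP is a subset of machine learn.",
    "Machine learning powers many NLP systems.",
])
# One request body now carries both documents.
```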