Knowledge Graph

knowledge_extraction_prompt = """
You are a networked intelligence helping a human track knowledge triples about all relevant people, things, concepts, etc. and integrating them with your knowledge stored within your weights as well as that stored in a knowledge graph.
Extract all of the knowledge triples from the last line of conversation. A knowledge triple is a clause that contains a subject, a predicate, and an object.
The subject is the entity being described, the predicate is the property of the subject that is being described, and the object is the value of the property.

EXAMPLE
Conversation history:
Person #1: Did you hear aliens landed in Area 51?
AI: No, I didn't hear that. What do you know about Area 51?
Person #1: It's a secret military base in Nevada.
AI: What do you know about Nevada?
Last line of conversation:
Person #1: It's a state in the US. It's also the number 1 producer of gold in the US.
Output: (Nevada, is a, state)<|>(Nevada, is in, US)<|>(Nevada, is the number 1 producer of, gold)
END OF EXAMPLE

EXAMPLE
Conversation history:
Person #1: Hello.
AI: Hi! How are you?
Person #1: I'm good. How are you?
AI: I'm good too.
Last line of conversation:
Person #1: I'm going to the store.

Output: NONE
END OF EXAMPLE

EXAMPLE
Conversation history:
Person #1: What do you know about Descartes?
AI: Descartes was a French philosopher, mathematician, and scientist who lived in the 17th century.
Person #1: The Descartes I'm referring to is a standup comedian and interior designer from Montreal.
AI: Oh yes, He is a comedian and an interior designer. He has been in the industry for 30 years. His favorite food is baked bean pie.
Last line of conversation:
Person #1: Oh huh. I know Descartes likes to drive antique scooters and play the mandolin.

Output: (Descartes, likes to drive, antique scooters)<|>(Descartes, plays, mandolin)
END OF EXAMPLE

Conversation history (for reference only):

{history}
Last line of conversation (for extraction):
Human: {input}

Output:
"""
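
The prompt above asks the model to answer either with NONE or with triples written as (subject, predicate, object) and joined by <|>. As a minimal sketch of how such a reply could be turned into Python tuples, the helper below splits on that delimiter. The names parse_triples and DELIMITER are made up for this illustration and do not come from any library, and the parser assumes that the subject and the predicate contain no commas.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
DELIMITER = "<|>"  # delimiter seen in the prompt's example outputs


def parse_triples(output: str) -> List[Triple]:
    """Parse text such as '(a, b, c)<|>(d, e, f)' into a list of triples."""
    text = output.strip()
    if not text or text == "NONE":
        return []
    triples: List[Triple] = []
    for chunk in text.split(DELIMITER):
        chunk = chunk.strip()
        # Drop the surrounding parentheses if both are present.
        if chunk.startswith("(") and chunk.endswith(")"):
            chunk = chunk[1:-1]
        # Split on the first two commas only, so the object may contain commas.
        parts = [p.strip() for p in chunk.split(",", 2)]
        # Silently skip anything that is not a complete triple.
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


reply = "(Nevada, is a, state)<|>(Nevada, is in, US)"
print(parse_triples(reply))
```

Malformed chunks are skipped rather than raising an error, which seems like a reasonable default when the text comes from a model, but a stricter pipeline might prefer to log or reject them.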

How to build a knowledge graph

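  1. At the data input stage, the input text goes through coreference resolution. Coreference resolution means finding every expression in the text that refers to the same thing and linking them together. For example, it pins down what expressions like "he" or "that shop" actually refer to.
  2. Next comes named entity recognition. Here we find all the proper nouns in the text, such as the names of people, organizations, and places. In the figure's example, three entities appear: Tomaz, Blog, and Diagram.
  3. Entity disambiguation comes after that. Entities that share a name or are mentioned in similar ways get properly told apart. If you skip this, the same entity can end up as separate nodes under different names, so this step is really important.
  4. The last step is relation extraction. Here we work out how the entities we found relate to each other. For example, in the figure there may be a LIKES relationship between Tomaz and Blog. By the way, disambiguation should be applied at this step too.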
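
To make the disambiguation point concrete, here is a small sketch that maps entity names and relation names to a canonical form before adding them to a graph, so that different spellings of the same thing do not become separate nodes or edges. The alias tables, the canonical and build_graph helpers, and the sample triples are all hypothetical and hand-written for this illustration. A real pipeline would more likely use an entity linking model or a similarity search than a fixed lookup table, and normalizing relation names is just one way to read "disambiguation at the relation step".

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical alias tables: lowercased surface form -> canonical name.
ENTITY_ALIASES: Dict[str, str] = {
    "tomaz": "Tomaz",
    "blog": "Blog",
    "blogs": "Blog",
}
RELATION_ALIASES: Dict[str, str] = {
    "likes": "LIKES",
    "is fond of": "LIKES",
}


def canonical(name: str, aliases: Dict[str, str]) -> str:
    """Return the canonical form of a name, or the trimmed name if unknown."""
    key = " ".join(name.lower().split())
    return aliases.get(key, name.strip())


def build_graph(triples: Iterable[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Build an adjacency map: subject -> set of (relation, object) pairs."""
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for subj, pred, obj in triples:
        s = canonical(subj, ENTITY_ALIASES)
        p = canonical(pred, RELATION_ALIASES)
        o = canonical(obj, ENTITY_ALIASES)
        graph[s].add((p, o))  # the set removes duplicate edges
    return dict(graph)


raw_triples = [
    ("Tomaz", "likes", "blogs"),
    ("tomaz", "is fond of", "Blog"),
]
graph = build_graph(raw_triples)
print(graph)
```

Without the two alias tables, the two sample triples would produce two subject nodes ("Tomaz" and "tomaz") and two differently named edges. With them, both collapse into a single LIKES edge from Tomaz to Blog.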

References