Knowledge Engineering Ⅵ | KG Construction from Unstructured data
This post does not go into Entity Linking, Relation Extraction or Event Extraction in depth, since they have already been covered in NLP. Instead, it focuses on two tasks: General is-a Relation Extraction and Terminology/Term Extraction.
General is-a Relation Extraction
An is-a relation is the semantic relationship between a more specific term (the hyponym) and a more general term (the hypernym). For example, in "dog is-a animal", "dog" is the hyponym and "animal" is the hypernym.
Property: transitivity. If A is-a B and B is-a C, then A is-a C.
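Because is-a is transitive, a set of extracted pairs implies further pairs. A minimal sketch, assuming pairs are stored as (hyponym, hypernym) tuples in a Python set (the function name `transitive_closure` is illustrative), derives all implied pairs by repeatedly composing pairs until nothing new appears:

```python
def transitive_closure(pairs):
    """Return all is-a pairs implied by transitivity.

    pairs: set of (hyponym, hypernym) tuples.
    """
    closure = set(pairs)
    while True:
        # (a is-a b) and (b is-a d) imply (a is-a d)
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:  # fixed point reached: nothing new was derived
            return closure
        closure |= new


pairs = {("poodle", "dog"), ("dog", "mammal"), ("mammal", "animal")}
print(sorted(transitive_closure(pairs)))
# → [('dog', 'animal'), ('dog', 'mammal'), ('mammal', 'animal'),
#    ('poodle', 'animal'), ('poodle', 'dog'), ('poodle', 'mammal')]
```

This naive fixed-point loop is only meant for small taxonomies. Note that transitivity also propagates extraction errors: one wrong pair can produce many wrong inferred pairs.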
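Pattern-based methods (Hearst patterns). In the patterns, NP denotes a noun phrase, {…} an optional element, {…}* zero or more repetitions, and (a | b) a choice between alternatives: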
ID | Pattern
---|---
1 | NP such as {NP,}* {(or \| and)} NP
2 | NP {,} (including \| especially) {NP,}* {(or \| and)} NP
3 | NP {, NP}* {,} (and \| or) other NP
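The three patterns can be tried with regular expressions. This is a rough sketch under a strong simplifying assumption: every NP is a single word, whereas a real system would find noun phrases with a POS tagger or chunker. The names `NP`, `NP_LIST`, `PATTERNS`, `split_nps` and `extract_isa` are hypothetical.

```python
import re

# Simplifying assumption: an NP is one word that is not "and", "or" or "other".
NP = r"(?!(?:and|or|other)\b)\w+"
# A list of NPs: "a", "a, b", "a, b and c", "a, b, or c"
NP_LIST = rf"{NP}(?:, {NP})*(?:,? (?:and|or) {NP})?"

# Each entry: (compiled regex, True if group 1 is the hypernym)
PATTERNS = [
    # 1: NP such as {NP,}* {(or | and)} NP
    (re.compile(rf"({NP}) such as ({NP_LIST})"), True),
    # 2: NP {,} (including | especially) {NP,}* {(or | and)} NP
    (re.compile(rf"({NP}),? (?:including|especially) ({NP_LIST})"), True),
    # 3: NP {, NP}* {,} (and | or) other NP
    (re.compile(rf"({NP}(?:, {NP})*),? (?:and|or) other ({NP})"), False),
]


def split_nps(text):
    """Split an NP list such as 'a, b and c' into ['a', 'b', 'c']."""
    return [t for t in re.split(r",? (?:and|or) |, ", text) if t]


def extract_isa(sentence):
    """Return the (hyponym, hypernym) pairs matched by the patterns."""
    pairs = []
    for regex, hypernym_first in PATTERNS:
        for m in regex.finditer(sentence):
            if hypernym_first:
                hypernym, hyponyms = m.group(1), m.group(2)
            else:
                hyponyms, hypernym = m.group(1), m.group(2)
            pairs.extend((h, hypernym) for h in split_nps(hyponyms))
    return pairs


print(extract_isa("fruits such as apples, bananas and cherries"))
# → [('apples', 'fruits'), ('bananas', 'fruits'), ('cherries', 'fruits')]
```

Patterns like these are typically precise but have low recall, because many is-a facts are never stated with these exact phrasings.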
Terminology/Term Extraction
Statistical approaches
Termhood: measures how strongly a candidate term is related to the target domain.
TF-IDF (measures how important a word is to a document within a corpus)
TF (term frequency) = (number of times the word occurs in the document) / (total number of words in the document)
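IDF (inverse document frequency) = log(total number of documents in the corpus / (number of documents containing the word + 1)). It reflects how distinctive a word is across the corpus and reduces the weight of ubiquitous words such as "a", "an" and "the".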
TF-IDF=TF*IDF
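A minimal sketch of the TF, IDF and TF-IDF formulas, assuming each document is already tokenized into a list of lowercase words and that log is the natural logarithm (the base only rescales the scores). The function names are illustrative.

```python
import math
from collections import Counter


def tf(term, doc):
    """Occurrences of term in doc divided by the number of words in doc."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """log(N / (df + 1)), where df counts the documents containing term."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (df + 1))


def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["a", "bird", "flew"],
]
print(tf_idf("the", corpus[0], corpus))  # 0.0: "the" is in 3 of 4 documents
print(tf_idf("dog", corpus[1], corpus))  # (1/3) * log(4/2)
```

With the +1 in the denominator, a word that occurs in every document gets a slightly negative IDF, since log(N / (N + 1)) < 0. Libraries use various smoothing variants, so absolute scores differ between implementations.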
Unithood: measures how strongly the parts of a multi-word candidate (for example, two words x and y) are associated, i.e., whether they form a stable unit.
MI (Mutual information)
It measures how much the uncertainty about X is reduced once Y is known: I(X;Y) = H(X) - H(X|Y).
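For discrete variables, MI can be written as I(X;Y) = Σ p(x, y) log[p(x, y) / (p(x)p(y))]. A small sketch, assuming the joint distribution is given as a dict mapping (x, y) outcomes to probabilities, using log base 2 so the result is in bits; the name `mutual_information` is illustrative.

```python
import math


def mutual_information(joint):
    """I(X;Y) in bits; joint maps (x, y) -> p(x, y) and must sum to 1."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p  # marginal p(x)
        p_y[y] = p_y.get(y, 0.0) + p  # marginal p(y)
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items()
        if p > 0  # 0 * log 0 is taken as 0
    )


independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))  # 0.0: Y tells nothing about X
print(mutual_information(identical))    # 1.0: Y removes 1 bit of uncertainty
```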
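PMI (pointwise mutual information): PMI(x, y) = log[p(x, y) / (p(x)p(y))], defined for one specific pair of outcomes; MI is the expectation of PMI over the joint distribution. A high PMI between two words suggests they form a unit.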
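For unithood, PMI is commonly estimated from corpus counts of adjacent word pairs. A rough sketch, assuming a single tokenized text, maximum-likelihood estimates p(x) = count(x)/N over N tokens and p(x, y) = count(x y)/(N - 1) over adjacent bigrams, and log base 2; the function name `pmi` is illustrative.

```python
import math
from collections import Counter


def pmi(x, y, tokens):
    """PMI of the adjacent bigram (x, y), estimated from a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_xy = bigrams[(x, y)] / (len(tokens) - 1)
    if p_xy == 0:
        return float("-inf")  # x is never directly followed by y
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


tokens = "new york is big and new york is old".split()
print(pmi("new", "york", tokens))  # log2((2/8) / ((2/9) * (2/9)))
```

PMI tends to overrate rare pairs (two words seen together only once can score very high), so it is often combined with a minimum frequency threshold.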
All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.