Knowledge Engineering Ⅶ | KG Alignment
The same entity may be represented differently in different databases.
ontology matching (schema matching)
element-level matching
String-based:
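- Prefix / suffix test: the two strings share the same leading or trailing part
- Levenshtein (edit) distance, normalized by the length of the longer string, e.g. for NKN and Nikon (ignoring case) two insertions are needed, giving a normalized distance of 2/5 = 0.4 (lower means more similar)
- N-gram: count the n-grams the two strings share and divide by the number of n-grams in the longer string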
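The two numeric measures in the list above can be sketched in a few lines of plain Python. This is only an illustration: the function names are made up for these notes, both measures lower-case their input (an assumption, needed to reproduce the NKN / Nikon value), and the n-gram variant counts distinct shared n-grams.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ngrams(s: str, n: int = 3) -> list:
    """All contiguous character n-grams of s, lower-cased."""
    s = s.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Distinct shared n-grams divided by the n-gram count of the longer string."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    longest = max(len(ga), len(gb))
    if longest == 0:
        return 0.0
    return len(set(ga) & set(gb)) / longest


print(normalized_levenshtein("NKN", "Nikon"))  # 0.4
print(ngram_similarity("nikon", "nikkon"))     # 0.5
```

The denominator in ngram_similarity follows the note above (the longer word's n-gram count); other normalizations exist and would give different values.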
Language-based:
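- Tokenization: split hyphenated words into their parts
- Lemmatization: reduce past-tense forms, plurals, etc. to their base form
- Elimination: drop articles, prepositions, and similar function words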
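A rough sketch of the three language-based steps above, using only the standard library. The stop-word list and the suffix rules are toy assumptions made for illustration; a real matcher would rely on a proper lemmatizer and a fuller stop-word list.

```python
import re

# Tiny illustrative stop-word list (articles, prepositions, conjunctions); an assumption.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "for", "to", "by", "with", "and", "or"}


def crude_lemma(token: str) -> str:
    """Very rough suffix stripping, standing in for real lemmatization."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"          # companies -> company
    if token.endswith("ed") and len(token) > 4:
        return token[:-2]                # published -> publish
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]                # papers -> paper
    return token


def normalize_label(label: str) -> list:
    """Split on hyphens and whitespace, drop stop words, then lemmatize."""
    tokens = re.split(r"[-\s]+", label.lower())
    return [crude_lemma(t) for t in tokens if t and t not in STOP_WORDS]


print(normalize_label("Hands-free kits of the cameras"))
# ['hand', 'free', 'kit', 'camera']
```

After this normalization, labels can be compared token by token instead of as raw strings.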
Resource-based:
- WordNet: A = B (A and B are synonyms), A ⊥ B (A and B are antonyms or are related through the hierarchy)
Constraint-based:
- datatype: integer < real
- multiplicity: [1, 1] < [0, 10]
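The two constraint comparisons above can be read as "is at most as general as" checks. A minimal sketch follows, assuming a hypothetical child-to-parent datatype table and ranges given as (low, high) pairs; neither structure comes from the source notes.

```python
# Hypothetical datatype hierarchy: child -> parent (integer is narrower than real).
DATATYPE_PARENT = {"integer": "real"}


def is_subtype(sub: str, sup: str) -> bool:
    """True if sub equals sup or sup is reachable by walking up the hierarchy."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        current = DATATYPE_PARENT.get(current)
    return False


def range_within(inner: tuple, outer: tuple) -> bool:
    """True if the interval inner = (low, high) lies inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print(is_subtype("integer", "real"))    # True
print(range_within((1, 1), (0, 10)))    # True
```

Two properties whose constraints are compatible in this sense are better match candidates than two whose constraints conflict.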
structure-level matching
Graph-based techniques
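If two nodes in the graphs are similar, their neighboring nodes are likely to be similar as well. On the semantic side, identical classes have identical interpretations.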
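The neighborhood intuition can be turned into a simple iterative update. The sketch below is a simplified propagation scheme written for these notes, not a specific published algorithm: each pair's score is a mix of its initial (e.g. string-based) similarity and the average best-match similarity of its neighbors. The names propagate, alpha, and rounds are assumptions.

```python
def propagate(sim0: dict, neighbors_a: dict, neighbors_b: dict,
              alpha: float = 0.5, rounds: int = 5) -> dict:
    """sim0 maps (node_a, node_b) -> initial similarity in [0, 1].

    neighbors_a / neighbors_b map each node to a list of adjacent nodes.
    """
    sim = dict(sim0)
    for _ in range(rounds):
        updated = {}
        for (a, b), base in sim0.items():
            na, nb = neighbors_a.get(a, []), neighbors_b.get(b, [])
            if na and nb:
                # For each neighbor of a, take its best match among b's neighbors.
                support = sum(max(sim.get((x, y), 0.0) for y in nb) for x in na) / len(na)
            else:
                support = 0.0
            updated[(a, b)] = (1 - alpha) * base + alpha * support
        sim = updated
    return sim


sim0 = {("Camera", "Kamera"): 0.8, ("Lens", "Objektiv"): 0.2,
        ("Camera", "Objektiv"): 0.0, ("Lens", "Kamera"): 0.0}
graph_a = {"Camera": ["Lens"], "Lens": ["Camera"]}
graph_b = {"Kamera": ["Objektiv"], "Objektiv": ["Kamera"]}
result = propagate(sim0, graph_a, graph_b)
print(result[("Lens", "Objektiv")])  # higher than the initial 0.2
```

In this toy example the weak Lens / Objektiv pair is pulled up because its neighbors Camera / Kamera match well, which is exactly the effect the graph-based intuition describes.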
taxonomy-based techniques
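Compare the instance sets of the classes to decide whether the classes match.
Some similarity measures (matching classes across two ontologies requires computing the similarity between two classes):
*These measures are generally not symmetric.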
element-level:
String Similarity:
LCS: the longest common substring
l(•) returns the label of the class
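The notes name the ingredients (LCS and the label function l) but not the exact formula, so the sketch below assumes one plausible form: the length of the longest common substring divided by the length of the first class's label. That normalization is an assumption, chosen because it makes the measure asymmetric, in line with the remark above.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Longest contiguous substring shared by s and t (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1   # extend the match ending at (i-1, j-1)
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s[best_end - best_len:best_end]


def string_similarity(label_1: str, label_2: str) -> float:
    """|LCS(l(C1), l(C2))| / |l(C1)|, case-insensitive (assumed normalization)."""
    a, b = label_1.lower(), label_2.lower()
    if not a:
        return 0.0
    return len(longest_common_substring(a, b)) / len(a)


print(string_similarity("Camera", "DigitalCamera"))  # 1.0
print(string_similarity("DigitalCamera", "Camera"))  # 6/13, about 0.46
```

The two calls differ only in argument order, which shows the asymmetry directly.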
Textual Content Similarity:
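Search the web for the label of a class and take the snippets (the small text shown under each result title) as the class's textual content. Count how often each word occurs in that text, and remove unnecessary words and low-frequency words. Weight each remaining word with w = tf * idf. This gives a weight vector TC(C) = [w1(C), w2(C), …, wn(C)].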
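A small sketch of the weighting step, assuming the snippets have already been collected into a dictionary from class label to text. The smoothed idf form log(1 + N/df) and the cosine comparison of two TC vectors are assumptions; the notes only state w = tf * idf. The parameter names stop_words and min_count are made up here and correspond to "unnecessary" and "low-frequency" words.

```python
import math
import re
from collections import Counter


def tfidf_vectors(snippets: dict, stop_words=frozenset(), min_count: int = 1) -> dict:
    """Map each class label to {word: tf * idf} computed over all snippets."""
    counts = {}
    for label, text in snippets.items():
        words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in stop_words]
        freq = Counter(words)
        # Drop low-frequency words.
        counts[label] = {w: n for w, n in freq.items() if n >= min_count}
    n_docs = len(counts)
    doc_freq = Counter(w for kept in counts.values() for w in kept)
    vectors = {}
    for label, kept in counts.items():
        total = sum(kept.values())
        vectors[label] = {
            w: (n / total) * math.log(1 + n_docs / doc_freq[w])  # tf * smoothed idf
            for w, n in kept.items()
        }
    return vectors


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


snippets = {
    "Camera": "digital camera lens photo",
    "Kamera": "camera lens zoom photo",
    "Person": "human name age",
}
vecs = tfidf_vectors(snippets, stop_words={"the", "a", "of"})
print(cosine(vecs["Camera"], vecs["Kamera"]) > cosine(vecs["Camera"], vecs["Person"]))  # True
```

Classes whose snippets share many distinctive words end up with close TC vectors, even when their labels look nothing alike.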
structure-level
Neighbor Class Set Similarity:
NCS(•) returns the set of first-order neighbor classes (classes only; instances are ignored)
Instance Set Similarity
IS(•) returns the instance set of the class (the measure counts the instances the two classes share)
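Both structure-level measures reduce to comparing two sets. The sketch below assumes an overlap ratio normalized by the first set, and assumes that neighbor classes and instances from the two ontologies are already expressed with shared identifiers; the exact formula is not given in the notes, so treat overlap_ratio as one plausible choice.

```python
def overlap_ratio(set_1: set, set_2: set) -> float:
    """|set_1 ∩ set_2| / |set_1|; asymmetric because only set_1 normalizes."""
    return len(set_1 & set_2) / len(set_1) if set_1 else 0.0


# Hypothetical neighbor-class sets NCS(C1), NCS(C2).
ncs_c1 = {"Lens", "Photo"}
ncs_c2 = {"Lens", "Photo", "Tripod", "Flash"}

# Hypothetical instance sets IS(C1), IS(C2).
is_c1 = {"d850", "a7iii", "eos_r5"}
is_c2 = {"d850", "eos_r5"}

print(overlap_ratio(ncs_c1, ncs_c2))  # 1.0
print(overlap_ratio(ncs_c2, ncs_c1))  # 0.5
print(overlap_ratio(is_c1, is_c2))    # 2/3
```

Swapping the arguments changes the result whenever the two sets differ in size, which is why such measures are generally not symmetric.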
instance matching
finding different instances of the same real-world objects.
Knowledge Graph Embedding aims to map entities and relations to continuous vector representations.
TransE: monolingual scenario (a single language)
MTransE: multilingual scenario
Axis calibration + Linear Transformations + Translation vectors
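To make the embedding view concrete: TransE treats a relation as a translation, so a triple (h, r, t) is plausible when h + r lands close to t. MTransE adds a cross-lingual alignment term, and the three techniques listed above correspond, roughly, to three ways of measuring how far an entity vector in one language is from its counterpart in the other. The sketch below uses plain Python lists and hand-picked vectors; the function names are made up, and training (learning the vectors, v, and M) is omitted.

```python
import math


def l2(u: list) -> float:
    return math.sqrt(sum(x * x for x in u))


def add(u: list, v: list) -> list:
    return [a + b for a, b in zip(u, v)]


def sub(u: list, v: list) -> list:
    return [a - b for a, b in zip(u, v)]


def matvec(m: list, v: list) -> list:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transe_score(h: list, r: list, t: list) -> float:
    """TransE: ||h + r - t||; lower means the triple is more plausible."""
    return l2(sub(add(h, r), t))


def axis_calibration(e1: list, e2: list) -> float:
    """Both languages share one space: ||e1 - e2||."""
    return l2(sub(e1, e2))


def translation_alignment(e1: list, v: list, e2: list) -> float:
    """A cross-lingual translation vector v: ||e1 + v - e2||."""
    return l2(sub(add(e1, v), e2))


def linear_alignment(m: list, e1: list, e2: list) -> float:
    """A linear map M between the two spaces: ||M e1 - e2||."""
    return l2(sub(matvec(m, e1), e2))


print(transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))                  # 0.0
print(linear_alignment([[0.0, 1.0], [1.0, 0.0]], [1.0, 2.0], [2.0, 1.0]))  # 0.0
```

In each case a smaller distance means the two vectors are more likely to denote the same real-world object, which is how embeddings are used for instance matching.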