TY - JOUR
T1 - Semantic similarity between ontologies at different scales
AU - Zhang, Qingpeng
AU - Haglin, David
PY - 2016/4/10
Y1 - 2016/4/10
N2 - In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes analyzing them and the knowledge graphs they represent computationally expensive and time-consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scale and the preservation/precision of results when we analyze ontologies. This paper presents the first effort to examine the feasibility of this idea by studying the relationship between scaling biomedical ontologies to different levels and the resulting semantic similarity values. We evaluate the semantic similarity between three gene ontology slims (plant, yeast, and candida, of which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results demonstrate that, with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, Jiang-Conrath and Lin perform more reliably and stably than the other two measures in this experiment, as shown by 1) consistently indicating that yeast and candida are more similar to each other than to plant at different scales, and 2) small deviations in the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
AB - In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. Ontologies are often very large in terms of the number of concepts and relationships, which makes analyzing them and the knowledge graphs they represent computationally expensive and time-consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scale and the preservation/precision of results when we analyze ontologies. This paper presents the first effort to examine the feasibility of this idea by studying the relationship between scaling biomedical ontologies to different levels and the resulting semantic similarity values. We evaluate the semantic similarity between three gene ontology slims (plant, yeast, and candida, of which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results demonstrate that, with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, Jiang-Conrath and Lin perform more reliably and stably than the other two measures in this experiment, as shown by 1) consistently indicating that yeast and candida are more similar to each other than to plant at different scales, and 2) small deviations in the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
KW - biomedical informatics
KW - computational biology
KW - knowledge representation
KW - semantic web
UR - http://www.scopus.com/inward/record.url?scp=84964635527&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-84964635527&origin=recordpage
U2 - 10.1109/JAS.2016.7451100
DO - 10.1109/JAS.2016.7451100
M3 - RGC 22 - Publication in policy or professional journal
SN - 2329-9266
VL - 3
SP - 132
EP - 140
JO - IEEE/CAA Journal of Automatica Sinica
JF - IEEE/CAA Journal of Automatica Sinica
IS - 2
M1 - 7451100
ER -