The RDF data model forms a cornerstone of the Semantic Web technology stack. Although there have been different proposals for RDF serialization syntaxes, the underlying simple data model enables great flexibility, which allows it to be successfully employed in many different scenarios and to form the basis on which other technologies are developed. In order to apply an RDF-based approach in practice, it is necessary to communicate the structure of the data that is being stored or represented. Data quality is of paramount importance for the acceptance of RDF as a data representation language, and it must be enabled by tools that can check whether some data conforms to a specific structure. There have been several recent proposals for RDF validation languages, such as ShEx and SHACL. In this chapter, we describe both proposals and enumerate some challenges and trends that we foresee with regard to RDF validation. We devote more space to what we consider one of the main challenges, which is to compare ShEx and SHACL and to understand their underlying foundations. To that end, we propose an intermediate language and show how ShEx and SHACL can be converted to it.
Jose Emilio Labra Gayo, Herminio García-González, Daniel Fernández-Álvarez, Eric Prud’hommeaux
In book: Current Trends in Semantic Web Technologies: Theory and Practice. doi: 10.1007/978-3-030-06149-4_6
Shape Expressions have recently been proposed as a high-level language to intuitively describe and validate the topology of RDF graphs. Current implementations of Shape Expressions focus on checking which nodes of a given graph fit which defined schemata, in order to obtain automatic typings or to improve RDF data quality in terms of completeness and consistency. We intend to reverse this process, i.e., we propose to study the neighborhood of graph nodes that have already been typed in order to induce templates that most of the individuals fit. This will allow us to discover latent schemata of existing graphs, which can be used as a guideline for introducing coherent information into existing structures or for quality verification purposes. We consider collaborative or general-purpose graphs to be especially interesting domains in which to apply this idea.
Daniel Fernández-Álvarez, Jose Emilio Labra Gayo
In order to perform any operation on an RDF graph, it is advisable to know the expected topology of the targeted information. Several technologies and syntaxes have been developed in recent years to describe the expected shapes in an RDF graph, such as ShEx and SHACL. In general, a domain expert can use these syntaxes to define shapes in a graph, with two main purposes: data validation and documentation. However, there are some scenarios in which the schema cannot be predicted a priori, but instead emerges as the graph is filled with new information. In those cases, the shapes are latent in the current content. We have developed a prototype that is able to infer shapes of classes in a knowledge graph, and we have used it with classes of the DBpedia ontology. We serialize our results using ShEx.
Daniel Fernández-Álvarez, Herminio García-González, Johannes Frey, Sebastian Hellmann, Jose Emilio Labra Gayo
International Semantic Web Conference 2018 (ISWC2018)
In this paper, the authors describe Musical Entities Reconciliation Architecture (MERA), an architecture designed to link music-related databases adapting the reconciliation techniques to each particular case. MERA includes mechanisms to manage third party sources to improve the results and it makes use of semantic technologies, storing and organizing the information in RDF graphs. They have implemented a prototype of their approach and have used it to link sources with different levels of data quality. The prototype has been effective in more than 94% of the cases under the conditions of their experiments. The authors have also compared their prototype with a well-known music-specialized search engine, outperforming the search results in the two experiments that they performed.
Daniel Fernández-Álvarez, Jose Emilio Labra Gayo, Daniel Gayo-Avello, Patricia Ordoñez de Pablos
International Journal on Semantic Web and Information Systems (IJSWIS), doi: 10.4018/IJSWIS.2017100103
Data interoperability is a problem that we are currently facing more intensely due to the emergence of fields like Big Data and IoT. Much data is persisted in information silos with neither interconnection nor format homogenisation. Our proposal to alleviate this problem is ShExML, a language based on ShEx that can map and merge heterogeneous data formats into a single RDF representation. We advocate the creation of this type of tool, which can facilitate the migration of non-semantic data to the Semantic Web.
Herminio García-González, Daniel Fernández-Álvarez, Jose Emilio Labra Gayo
CEUR Workshop Proceedings