I can't comment specifically on HBase, but I have implemented RDF storage for Cassandra, which has a very similar BigTable-inspired data model.

You basically have two options for storing RDF data in wide-column databases like HBase and Cassandra: the resource-centric approach and the statement-centric approach.

In the statement-oriented approach, each RDF statement corresponds to a row key (for instance, a UUID) and contains `subject`, `predicate` and `object` columns. In Cassandra, each of these would be supercolumns that would then contain subcolumns such as `type` and `value`, to differentiate between RDF literals, blank nodes and URIs. If you needed to support named graphs, each row could also have a `context` column that would contain a list of the named graphs that the statement was part of.

The above is a relatively simple mapping to implement, but it suffers from some problems. Notably, preventing the creation of duplicate statements (an important RDF semantic) means having to do a read before doing a write, which at least in Cassandra quickly becomes a performance bottleneck, as writes are much faster than reads. There are ways to work around this problem, in particular by using content-addressable statement identifiers (e.g. the SHA-1 of the canonicalized N-Triples representation of each statement) as the row keys. This in turn introduces other trade-offs, such as no longer being able to version statement data: every time statement data changes, the old row needs to be deleted and a new one inserted with the new row key.

In view of the previous considerations, the resource-oriented approach is generally a more natural fit for storing RDF data in wide-column databases. In this approach, each RDF subject/resource corresponds to a row key, and each RDF predicate/property corresponds to a column or supercolumn. Keyspaces can be used to represent RDF repositories, and column families can be used to represent named graphs.
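To make the two layouts concrete, here is a minimal Python sketch that models both with plain dictionaries instead of a real Cassandra or HBase client. The helper names (`uri`, `literal`, `ntriples`, `statement_key`, `insert_statement`, `insert_resource`) are hypothetical, and the N-Triples serialization is deliberately simplified (no escaping, datatypes, language tags or blank nodes), so treat it as an illustration of the idea rather than a proper canonicalization.

```python
import hashlib

# A term is a dict with "type" and "value", mirroring the type/value subcolumns.
def uri(value):
    return {"type": "uri", "value": value}

def literal(value):
    return {"type": "literal", "value": value}

def ntriples(s, p, o):
    """Simplified N-Triples line; real canonicalization needs escaping etc."""
    def term(t):
        if t["type"] == "uri":
            return "<" + t["value"] + ">"
        return '"' + t["value"] + '"'
    return term(s) + " " + term(p) + " " + term(o) + " ."

def statement_key(s, p, o):
    """Content-addressable row key: SHA-1 of the serialized statement."""
    return hashlib.sha1(ntriples(s, p, o).encode("utf-8")).hexdigest()

def insert_statement(rows, s, p, o, context=None):
    """Statement-centric layout: one row per statement.

    Because the key is derived from the content, writing the same statement
    twice lands on the same row, so no read-before-write is needed to avoid
    duplicates.
    """
    key = statement_key(s, p, o)
    row = rows.setdefault(
        key, {"subject": s, "predicate": p, "object": o, "context": []}
    )
    if context is not None and context not in row["context"]:
        row["context"].append(context)
    return key

def insert_resource(rows, s, p, o):
    """Resource-centric layout: one row per subject, one column per predicate."""
    columns = rows.setdefault(s["value"], {})
    values = columns.setdefault(p["value"], [])
    if o not in values:
        values.append(o)

# Example data (made-up URIs).
alice = uri("http://example.org/alice")
name = uri("http://xmlns.com/foaf/0.1/name")
knows = uri("http://xmlns.com/foaf/0.1/knows")
bob = uri("http://example.org/bob")

statement_rows = {}
insert_statement(statement_rows, alice, name, literal("Alice"))
insert_statement(statement_rows, alice, name, literal("Alice"))  # same row key
insert_statement(statement_rows, alice, knows, bob)

resource_rows = {}
insert_resource(resource_rows, alice, name, literal("Alice"))
insert_resource(resource_rows, alice, knows, bob)
```

In the statement-centric table the duplicate insert collapses onto one row, whereas in the resource-centric table both statements about `alice` share a single row with one column per predicate.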
The main trade-off with the resource-based approach is that some statement-oriented operations become more complex and/or slower, notably those counting the total number of statements or querying for predicate or object terms without specifying a subject. To support basic graph pattern matching, additional POS, OPS, etc. indices may need to be created and maintained.

See RDF::Cassandra, my Cassandra storage adapter for RDF.rb, for a more detailed example of a resource-centric mapping from the RDF data model to a wide-column data model.
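As a rough illustration of why the extra indices help, the following Python sketch keeps an in-memory subject-keyed table alongside predicate-first (POS) and object-first (OPS) indices. The `TripleIndex` class and its methods are hypothetical and are not taken from RDF::Cassandra; in a real wide-column store each index would be its own column family that has to be updated on every write.

```python
from collections import defaultdict

class TripleIndex:
    """Toy triple store with SPO, POS and OPS lookup tables."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self.ops = defaultdict(lambda: defaultdict(set))  # object -> predicate -> subjects

    def add(self, s, p, o):
        # Every write must maintain all three tables.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.ops[o][p].add(s)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching the pattern; None is a wildcard."""
        if s is not None:
            # Subject known: the resource-centric row answers this directly.
            for p2, objects in self.spo.get(s, {}).items():
                if p is None or p2 == p:
                    for o2 in objects:
                        if o is None or o2 == o:
                            yield (s, p2, o2)
        elif p is not None:
            # No subject, predicate known: use the POS index.
            for o2, subjects in self.pos.get(p, {}).items():
                if o is None or o2 == o:
                    for s2 in subjects:
                        yield (s2, p, o2)
        elif o is not None:
            # Only the object known: use the OPS index.
            for p2, subjects in self.ops.get(o, {}).items():
                for s2 in subjects:
                    yield (s2, p2, o)
        else:
            # Fully unbound pattern: full scan of the subject table.
            for s2, columns in self.spo.items():
                for p2, objects in columns.items():
                    for o2 in objects:
                        yield (s2, p2, o2)

index = TripleIndex()
index.add("alice", "knows", "bob")
index.add("carol", "knows", "bob")
index.add("alice", "name", "Alice")

who_knows_bob = sorted(t[0] for t in index.match(p="knows", o="bob"))
about_bob = sorted(index.match(o="bob"))
total = sum(1 for _ in index.match())
```

Without the POS and OPS tables, the last two kinds of query would require scanning every subject row, which is the slowdown described above.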
Answered Apr 25 at 19:13 by Arto Bendiken (edited Apr 25 at 19:21)