I can't comment specifically on HBase, but I have implemented RDF storage for Cassandra, which has a very similar BigTable-inspired data model.

You basically have two options for storing RDF data in wide-column databases like HBase and Cassandra: the resource-centric approach and the statement-centric approach.

In the statement-centric approach, each RDF statement corresponds to a row key (for instance, a UUID) and contains subject, predicate and object columns. In Cassandra, each of these would be a supercolumn that would in turn contain subcolumns such as type and value, to differentiate between RDF literals, blank nodes and URIs. If you needed to support named graphs, each row could also have a context column containing a list of the named graphs that the statement is part of.

The above is a relatively simple mapping to implement but suffers from some problems, notably the fact that preventing the creation of duplicate statements (an important RDF semantic) means having to do a read before doing a write, which at least in Cassandra quickly becomes a performance bottleneck, as writes are much faster than reads. There are ways to work around this problem, in particular by using content-addressable statement identifiers (e.g. the SHA-1 of the canonicalized N-Triples representation of each statement) as the row keys, but this in turn introduces other trade-offs, such as no longer being able to version statement data: every time statement data changes, the old row needs to be deleted and a new one inserted with the new row key.

In view of the previous considerations, the resource-centric approach is generally a more natural fit for storing RDF data in wide-column databases. In this approach, each RDF subject/resource corresponds to a row key, and each RDF predicate/property corresponds to a column or supercolumn. Keyspaces can be used to represent RDF repositories, and column families can be used to represent named graphs.
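As a rough illustration of the two layouts described above, here is a minimal Python sketch that uses plain in-memory dicts as stand-ins for column families. It is not Cassandra or HBase client code, and every name in it (`statement_key`, `insert_statement`, `insert_resource`, and so on) is hypothetical. For brevity it assumes all terms are URIs, so the "canonicalized N-Triples" form is just one simple line per statement.

```python
import hashlib

def ntriple(s, p, o):
    """Serialize a statement as one N-Triples line (assumes URI terms only)."""
    return f"<{s}> <{p}> <{o}> ."

def statement_key(s, p, o):
    """Content-addressable row key: SHA-1 hex digest of the N-Triples line."""
    return hashlib.sha1(ntriple(s, p, o).encode("utf-8")).hexdigest()

# Statement-centric layout (stand-in for a column family):
#   row key -> {supercolumn -> {subcolumn -> value}}
statements = {}

def insert_statement(s, p, o):
    # Blind write: the same statement always hashes to the same row key,
    # so a duplicate insert overwrites the row instead of needing a read first.
    statements[statement_key(s, p, o)] = {
        "subject":   {"type": "uri", "value": s},
        "predicate": {"type": "uri", "value": p},
        "object":    {"type": "uri", "value": o},
    }

# Resource-centric layout (stand-in for a column family):
#   subject row key -> {predicate column -> set of object values}
resources = {}

def insert_resource(s, p, o):
    # All statements about one subject land in the same row.
    resources.setdefault(s, {}).setdefault(p, set()).add(o)

s = "http://example.org/alice"
p = "http://xmlns.com/foaf/0.1/knows"
o = "http://example.org/bob"

insert_statement(s, p, o)
insert_statement(s, p, o)  # duplicate: same row key, still a single row
insert_resource(s, p, o)
insert_resource(s, p, o)   # duplicate: set semantics keep one object value
```

Note how changing any term of a statement in the first layout changes its row key, which is the versioning trade-off mentioned above.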
The main trade-off with the resource-centric approach is that some statement-oriented operations become more complex and/or slower, notably those counting the total number of statements or querying for predicate or object terms without specifying a subject. To support basic graph pattern matching, additional POS, OPS, etc. indices may need to be created and maintained.

See RDF::Cassandra, my Cassandra storage adapter for RDF.rb, for a more detailed example of a resource-centric mapping from the RDF data model to a wide-column data model.
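To make the indexing trade-off concrete, here is a small Python sketch of a resource-centric store that maintains one extra POS index alongside the primary subject-keyed rows. Nested dicts stand in for column families; this is not how RDF::Cassandra is implemented, and the class and method names are hypothetical.

```python
from collections import defaultdict

class ResourceStore:
    """In-memory stand-in for a resource-centric layout plus a POS index."""

    def __init__(self):
        # Primary rows: subject -> predicate -> set of objects
        self.spo = defaultdict(lambda: defaultdict(set))
        # Secondary index: predicate -> object -> set of subjects
        self.pos = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every write has to update both structures, which is the
        # maintenance cost of supporting subject-less queries.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)

    def subjects(self, p, o):
        """Match the pattern (?s, p, o) via the POS index, without a full scan."""
        return set(self.pos.get(p, {}).get(o, set()))

    def count(self):
        """Count statements; this has to walk every row and column."""
        return sum(len(objs) for row in self.spo.values() for objs in row.values())

store = ResourceStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:carol", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", "Alice")
store.add("ex:alice", "foaf:knows", "ex:bob")  # duplicate, ignored by set semantics

who_knows_bob = store.subjects("foaf:knows", "ex:bob")
total = store.count()
```

Without the `pos` index, answering `subjects("foaf:knows", "ex:bob")` would mean scanning every subject row, which is the slow path described above.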
Edited Apr 25 at 19:21
Answered Apr 25 at 19:13 by Arto Bendiken