The upcoming Stardog 2.1 improves query scalability by about three orders of magnitude and can handle 50 billion triples on a $10,000 server. We've never worried too much about Stardog scalability per se: we worked first on ease of use and then on speed, assuming we could make it extremely scalable when the time came. Stardog 2.1 is a huge leap forward in query, data loading, and scalability.
Stardog 2.1 running on a $10,000 server (32 cores, 256 GB of RAM) can handle 20 to 50 billion triples. It loads datasets of 100 million (or so) triples about twice as fast as Stardog 2.0.x, and datasets of around 1 billion triples about three times faster, while consuming less memory. Stardog 2.1 can load a 20-billion-triple dataset at roughly 300,000 triples per second. We've also greatly improved query evaluation performance, so Stardog 2.1 stays fast even on much larger databases. How did we do it?
Enhanced concurrency
The performance improvements are mostly due to careful attention to concurrency and to reducing thread contention, especially during bulk loading (when a large amount of data is added as a database is created). We avoid locks wherever possible, using more non-blocking algorithms and data structures: for example, moving from BitSet to ConcurrentLinkedQueue, even though the former is more space-efficient than the latter. Elsewhere we prefer ThreadLocals to reduce thread contention and avoid synchronization. As loading got faster, several LRU caches became bottlenecks because eviction was heavily contended; we now batch evictions on a single thread, although that adds some memory pressure and GC time.
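To make the ThreadLocal technique concrete, here is a minimal Java sketch, assuming a per-thread scratch buffer feeding a shared non-blocking queue. The class and method names are invented for this example; they are not Stardog internals.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class ConcurrentLoadSketch {
        // Per-thread scratch buffer: no synchronization needed on the hot path.
        private static final ThreadLocal<StringBuilder> SCRATCH =
                ThreadLocal.withInitial(() -> new StringBuilder(256));

        // Non-blocking queue of parsed items, shared by all loader threads.
        private final ConcurrentLinkedQueue<String> pending =
                new ConcurrentLinkedQueue<>();

        void add(String subject, String predicate, String object) {
            StringBuilder sb = SCRATCH.get();
            sb.setLength(0);                  // reuse the buffer, create no garbage
            sb.append(subject).append(' ').append(predicate).append(' ').append(object);
            pending.offer(sb.toString());     // lock-free enqueue
        }
    }

Each loader thread writes only to its own buffer, so the hot path needs no locks at all; the only shared structure is the lock-free queue.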
A worse hash algorithm is better
Stardog stores URIs, bnodes, and literal values in a dictionary mapping, for which we had been using a 64-bit MurmurHash: it's very fast and has a low collision rate, and it let us store each value as a long. But handling collisions and cache misses requires disk access, and at scale those random disk accesses are simply too expensive. Moving to SHA-1 may seem counterintuitive, since hash values grow from 64 bits to 160 bits. But because hash collisions are thereby avoided almost entirely, we get a significant speedup, and the dictionary mapping becomes significantly simpler too.
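As an illustration, here is a hedged sketch of a dictionary mapping keyed by SHA-1, using only the JDK's standard MessageDigest; the map, the ID scheme, and all names here are assumptions for the example, not Stardog's actual data structures.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    public class DictionarySketch {
        private final ConcurrentHashMap<String, Long> idsByHash = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong();

        // 160-bit SHA-1 makes collisions vanishingly unlikely, so lookups
        // need no collision-handling disk reads.
        static String sha1Hex(String rdfValue) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-1");
                byte[] digest = md.digest(rdfValue.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder(40);
                for (byte b : digest) hex.append(String.format("%02x", b & 0xff));
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError(e); // SHA-1 is guaranteed by the JDK
            }
        }

        long idFor(String rdfValue) {
            return idsByHash.computeIfAbsent(sha1Hex(rdfValue), k -> nextId.getAndIncrement());
        }
    }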
Off-heap memory management
Prior to 2.1, we made heavy use of memory-mapped files during loading, in order to take advantage of the operating system's virtual memory management. But memory-mapped files in the JVM are notoriously flaky (unmap, anyone?), and with lots of memory-mapped files around, the JVM can crash. No, really.
We also found that memory-mapped files can degrade performance when there is more than 64 GB of free RAM. And since flushing to disk is out of our control, it happens far too often. But keeping this data on the Java heap is clearly not feasible at scale, because of GC. So Stardog 2.1 moved to an off-heap memory allocation scheme, in which we control disk flushes very precisely and use free system memory more efficiently; it is nearly as fast as memory mapping.
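The general technique looks something like the following toy Java sketch, assuming a simple pool of direct ByteBuffers: direct buffers live outside the GC-managed heap, and freeing is explicit. This illustrates the approach, not Stardog's actual allocator.

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;

    public class OffHeapPool {
        private static final int BLOCK_SIZE = 1 << 20;   // 1 MB blocks
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

        // Direct buffers are allocated outside the Java heap, so the GC
        // never scans or moves them. (Not thread-safe; kept minimal.)
        ByteBuffer allocate() {
            ByteBuffer b = free.poll();
            return b != null ? b : ByteBuffer.allocateDirect(BLOCK_SIZE);
        }

        // "Manual" memory management: returning a block to the pool is our
        // responsibility, since no GC will promptly reclaim it for us.
        void release(ByteBuffer block) {
            block.clear();
            free.push(block);
        }
    }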
Reducing GC pauses
Finally, to improve load performance significantly, we had to tackle GC overhead, which matters during large data loads because objects are created and destroyed at a furious pace. Immutable objects are great for concurrency, but they come at the cost of generating garbage. Tuning the GC didn't get us very far, so we attacked the cost of GC directly, by simply not creating objects wherever we could avoid it. That kind of fine-grained software engineering is made tractable by Stardog's relatively small code base. Mechanical sympathy!
That fine-grained engineering included things like deep profiling and then constantly resetting a single StringBuilder in the RDF parser, rather than creating a new builder for every RDF value. We also reduced on-heap caching to ease memory pressure. The GC pauses we see now account for 1% or less of total bulk loading time.
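The StringBuilder reuse pattern looks roughly like this sketch; the class and method names are hypothetical, and the tokenizing is simplified to the bare minimum.

    import java.io.IOException;
    import java.io.Reader;

    public class ValueReaderSketch {
        private final StringBuilder buffer = new StringBuilder(128);

        // Reads one whitespace-delimited token, reusing the same builder on
        // every call instead of allocating a fresh one per RDF value.
        String readToken(Reader in) throws IOException {
            buffer.setLength(0);              // reset, don't reallocate
            int c;
            while ((c = in.read()) != -1 && !Character.isWhitespace(c)) {
                buffer.append((char) c);
            }
            return buffer.length() == 0 ? null : buffer.toString();
        }
    }

The only garbage per token is the final String itself; the builder's internal buffer is allocated once and amortized across the whole parse.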
Query evaluation
Improving bulk loading wouldn't accomplish much if we didn't also improve query evaluation performance. So we did that too. The key to the query evaluation improvements in 2.1 is memory usage: specifically, how intermediate results are handled. Consider this SPARQL query from SP2B:
SELECT DISTINCT ?name1 ?name2
WHERE {
  ?article1 rdf:type bench:Article .
  ?article2 rdf:type bench:Article .
  ?article1 dc:creator ?author1 .
  ?author1 foaf:name ?name1 .
  ?article2 dc:creator ?author2 .
  ?author2 foaf:name ?name2 .
  ?article1 swrc:journal ?journal .
  ?article2 swrc:journal ?journal
  FILTER (?name1 < ?name2)
}
Against a database of 5 million triples, this query produces 18,362,955 (!) results, all of which DISTINCT pulls into memory. At scale, that simply doesn't work. Stardog 2.1 solves this problem by reusing the new off-heap memory allocation scheme: a small heap can't hold the data, and with a large heap the GC will eat you alive, so we avoid the heap. Of course, that means we manage memory manually, but inside the JVM this really pays off. We use a custom memory manager to allocate (and deallocate) data during query evaluation, including holding intermediate query results. It manages memory outside the JVM heap and spills it to disk as needed. The new memory manager also maintains some database statistics that static analysis uses to guide query evaluation.
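To illustrate the bounded-memory idea (though not Stardog's actual memory manager, which works off-heap as described above), here is a deliberately simplified Java sketch of a DISTINCT operator that spills overflow rows to a temp file instead of growing the heap without bound; the threshold and the deferred merge pass are assumptions of this example.

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    public class SpillingDistinct {
        private static final int MEMORY_LIMIT = 1_000_000; // rows kept in memory
        private final Set<String> seen = new HashSet<>();
        private final BufferedWriter spill;

        public SpillingDistinct() throws IOException {
            File f = File.createTempFile("distinct", ".spill");
            f.deleteOnExit();
            spill = new BufferedWriter(new FileWriter(f));
        }

        // Returns true the first time a row is seen. Once the in-memory set
        // is full, unseen rows go to disk; duplicates among the spilled rows
        // are resolved later by a sort/merge pass over the spill file.
        public boolean offer(String row) throws IOException {
            if (seen.size() < MEMORY_LIMIT) {
                return seen.add(row);
            }
            if (seen.contains(row)) return false;
            spill.write(row);
            spill.newLine();
            return true;
        }
    }

The point is simply that intermediate results never accumulate without bound in GC-managed memory; the real scheme allocates them off-heap and spills under its own control.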
More details
None of the new scalability improvements affect any public APIs. To learn about the other improvements in Stardog 2.1, see these slides.