# SPARQL CONSTRUCT comparison

I had some days left on a physical machine we used for an EU FP7 research project, so I took the chance to compare three triplestores (update: added some more based on comments here) that my colleagues or I worked with in the past months. I do not want to imply anything with this test; it's just me playing around and having fun with RDF. If you have any comments, add them here.

## Hardware

The test platform comprises a dedicated server, not a virtual machine, with the following specification:

 - 2 x Intel Xeon E5 2620V2, 2 x (6 x 2.10 GHz) (appears as 24 cores in `htop` because of Hyper-Threading)
 - 128 GB buffered ECC RAM
 - 1000 GB SSD (Samsung 840 EVO)
 - Ubuntu 14.04

## Dataset

The dataset contains 5 million triples (including some with invalid literals: `"NA"` is typed as `xsd:int`). It describes transports between entities, each with a date. To optimize query execution time for this particular use case, we want to infer/materialize (what's the right word here?) some triples so we don't have to go through all the data every time.

Source: http://ktk.netlabs.org/misc/bfs/blv.nt (622 MB)

```turtle
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pobo: <http://purl.obolibrary.org/obo/> .


<http://foodsafety.data.admin.ch/move/0> a schema:TransferAction ;
  schema:fromLocation <http://foodsafety.data.admin.ch/business/50454> ;
  schema:toLocation <http://foodsafety.data.admin.ch/business/50415> ;
  dc:date "2012-01-01"^^xsd:date ;
  pobo:UO_0000189 "1"^^xsd:int .
```
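The `"NA"` values typed as `xsd:int` mentioned above can be spotted with a quick scan of the N-Triples file. This is a minimal sketch, not something that was part of the test: the second sample line is made up for illustration, and the check only looks at the lexical form (optional sign plus digits), not at the 32-bit value range.

```python
import re

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
# Matches a literal typed as xsd:int and captures its lexical form.
INT_LITERAL = re.compile(r'"([^"]*)"\^\^<' + re.escape(XSD_INT) + r">")
# Lexical form of an integer: optional sign followed by digits.
VALID_INT = re.compile(r"[+-]?[0-9]+")


def ill_typed_int_lines(lines):
    """Return the lines whose xsd:int literal is not an integer."""
    bad = []
    for line in lines:
        match = INT_LITERAL.search(line)
        if match and not VALID_INT.fullmatch(match.group(1)):
            bad.append(line)
    return bad


# Sample lines: the first mirrors the example record, the second is hypothetical.
SAMPLE = [
    '<http://foodsafety.data.admin.ch/move/0> <http://purl.obolibrary.org/obo/UO_0000189> "1"^^<http://www.w3.org/2001/XMLSchema#int> .',
    '<http://foodsafety.data.admin.ch/move/1> <http://purl.obolibrary.org/obo/UO_0000189> "NA"^^<http://www.w3.org/2001/XMLSchema#int> .',
]

bad_lines = ill_typed_int_lines(SAMPLE)
print(len(bad_lines))
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, for example `ill_typed_int_lines(open("blv.nt", encoding="utf-8"))`.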

There are around 900,000 `TransferAction` instances in there. We torture the server with the following CONSTRUCT (well, INSERT) query:

```sparql
PREFIX blv: <http://blv.ch/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

INSERT {
    ?othermove blv:notBefore ?move .
}
WHERE {

    ?move a schema:TransferAction ;
        dc:date ?date ;
        schema:toLocation ?toFarm .

    ?othermove a schema:TransferAction ;
        dc:date ?otherdate ;
        schema:fromLocation ?toFarm .

    FILTER (?date <= ?otherdate)

} 
```
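To make it clearer what this query materializes, here is a minimal Python sketch of the same join on a tiny, made-up dataset. The move and business identifiers are hypothetical, and the triplestores are of course free to evaluate the join in a completely different way.

```python
from collections import defaultdict
from datetime import date

# Hypothetical miniature dataset: (move id, fromLocation, toLocation, date).
MOVES = [
    ("move/0", "business/A", "business/B", date(2012, 1, 1)),
    ("move/1", "business/B", "business/C", date(2012, 1, 5)),
    ("move/2", "business/B", "business/D", date(2011, 12, 30)),
    ("move/3", "business/C", "business/A", date(2012, 1, 5)),
]


def not_before_pairs(moves):
    """Return the (othermove, move) pairs the INSERT would materialize."""
    # Index moves by departure location so the join is not a full cross product.
    departures = defaultdict(list)
    for move_id, from_loc, _to_loc, when in moves:
        departures[from_loc].append((move_id, when))

    pairs = set()
    for move_id, _from_loc, to_loc, when in moves:
        # ?othermove leaves the location that ?move arrived at ...
        for other_id, other_when in departures[to_loc]:
            # ... and FILTER (?date <= ?otherdate) keeps same-day or later departures.
            if when <= other_when:
                pairs.add((other_id, move_id))
    return pairs


pairs = not_before_pairs(MOVES)
for other_id, move_id in sorted(pairs):
    print(f"{other_id} blv:notBefore {move_id}")
```

Every arrival is paired with every same-day or later departure from the same location, which is why roughly 900,000 moves can expand into tens of millions of `blv:notBefore` triples.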

After successful execution, I check how many triples were generated:

```sparql
SELECT  (COUNT(*) AS ?c) WHERE {?s <http://blv.ch/notBefore> ?o}
```

The result should be around 30 million triples.
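For anyone who wants to repeat the measurement, a minimal sketch of timing an update over the SPARQL 1.1 Protocol could look like this. The endpoint URL is a placeholder (each store exposes its own path and port), the function names are made up, and this is not necessarily how the numbers in this post were taken.

```python
import time
import urllib.parse
import urllib.request

# Placeholder endpoint: every store uses its own URL and port.
UPDATE_ENDPOINT = "http://localhost:8080/sparql/update"


def build_update_request(endpoint, update):
    """Build a SPARQL 1.1 Protocol request (update via URL-encoded POST)."""
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def timed_update(endpoint, update):
    """Send the update and return the wall-clock duration in seconds."""
    request = build_update_request(endpoint, update)
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait until the server has answered completely
    return time.perf_counter() - start
```

Calling `timed_update(UPDATE_ENDPOINT, query)` with the INSERT query as a string returns the elapsed seconds. Some stores may require authentication or a different request format, so treat this as a starting point only.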

## Results

Note that I did not do any optimization on the configurations. My idea was to take what vendors ship by default and see how long it takes. Because that's what users usually do ;)

### Virtuoso

* Homepage: http://virtuoso.openlinksw.com/
* Version: Virtuoso version 07.20.3215 on Linux (x86_64-unknown-linux-gnu), Single Server Edition 
* Host: docker, image `tenforce/virtuoso` 
* __Query execution time: 23 minutes__

#### Remarks

Loading RDF was fast; I did it with iSQL, following the documentation of the [Docker image](https://hub.docker.com/r/tenforce/virtuoso/). Virtuoso does not seem to use more than one core: during the whole execution I had 100% load on one of the 24 cores, while the rest did nothing.

### Stardog

* Homepage: http://stardog.com/
* Version: 4.0.5, Enterprise license (1 month trial key)
* Host: docker, image `java:latest` as there is no public docker image available.
* Run: Default configuration started with `stardog-admin server start`
* __Query execution time: 4.00 minutes__

#### Remarks

Loading was fast; I did it with `stardog data add` on the command line. I had the impression there is some query optimization going on. In the beginning there was not much activity on the different cores. After a while the box became busier and I saw quite some load on all cores. By far the fastest query execution time.

### Blazegraph

* Homepage: https://www.blazegraph.com/
* Version: 2.1.0
* Host: docker, image `java:latest` as there is no public docker image available.
* Run: `java -server -Xmx8g -jar blazegraph.jar`
* __Query execution time: 33 minutes__

#### Remarks

I first used a docker image but didn't notice that it was the old 1.x version. I ran into a bug while executing the query on the 24-core machine, and they asked me to retry with 2.x. Make sure you use 2.x as well, as all docker images seem to be 1.x. Loading was fast; I loaded the data from a URI in the SPARQL UPDATE web interface. Blazegraph was the most active on all cores, with quite some load on them basically the whole time. I also tried with 64 GB of memory allocated to the JVM, but memory was apparently not the bottleneck.

### Jena Fuseki

* Homepage: https://jena.apache.org/documentation/serving_data/
* Version: 2.0.1-SNAPSHOT
* Host: docker, image `stain/jena-fuseki`
* __Query execution time: TODO  minutes__

#### Remarks

I started the docker image and loaded the data with `tdbloader` into `/fuseki/databases/blv`. After that I created a new database in the web interface, which apparently didn't override the TDB store. Loading is fast. While executing the query there is high load on all cores.

UPDATE 27.4.2016: I increased `-Xmx` to 8 GB and after around 6 hours I ran out of heap space. Not sure if we get anywhere without optimizing it (and I don't really know how).

### Ontotext GraphDB

* Homepage: http://ontotext.com/products/graphdb/
* Version: GraphDB Free 7.0
* Host: docker, image `java:latest` as there is no public docker image available.
* Run: `~/graphdb-free-7.0.0/bin# ./graphdb`
* __Query execution time: 16 minutes__

#### Remarks

I created a new default store configuration and didn't change any of the default settings regarding cache size etc. I loaded the data via URL, and loading was fast. I see load on only one core.

### Ontos OntoQuad

* Homepage: http://www.ontos.com/products/ontoquad/
* Version: 0.6.0
* Host: docker, built from `Dockerfile` found in `ontoquad-docker.txz` 
* __Query execution time: 31 minutes (default config, polymorphic2)__
* __Query execution time: 14 minutes (polymorphic2, no transaction)__


#### Remarks

After consulting the documentation in [Confluence](https://ontos4dds.atlassian.net/wiki/display/QUAD/Database+Administration) I managed to upload the file as triples, which I had copied into the docker image. Loading is fast. The default query execution timeout was too low; I could change it in the web interface, but I think it never got stored for some reason, so I changed it in the config file itself before I built the docker image. I had the same problem with transactions and disabled them in the config for the second round.