16.17. RDF Performance Tuning
When tuning RDF query performance, the following questions are worth asking:
- Is the Virtuoso process properly configured to handle big data sets?
- Is the graph always specified?
- Are public web service endpoints protected against bad queries?
- Are there patterns where only a predicate is given?
- Is there a bad query plan because of cost model error?
16.17.1. General
When running with large data sets, configure the Virtuoso process to use between 3/5 and 2/3 of system RAM and stripe storage across all available disks. See the NumberOfBuffers, MaxDirtyBuffers, and Striping parameters in the INI file.
; default installation
NumberOfBuffers = 2000
MaxDirtyBuffers = 1200
Typical values of the NumberOfBuffers and MaxDirtyBuffers (roughly 3/4 of NumberOfBuffers) parameters in the Virtuoso configuration file (virtuoso.ini) for various memory sizes are as follows. Each buffer holds 8 KB:
Table 16.19. Recommended NumberOfBuffers and MaxDirtyBuffers
System RAM | NumberOfBuffers | MaxDirtyBuffers |
---|---|---|
2 GB | 170000 | 130000 |
4 GB | 340000 | 250000 |
8 GB | 680000 | 500000 |
16 GB | 1360000 | 1000000 |
32 GB | 2720000 | 2000000 |
48 GB | 4000000 | 3000000 |
64 GB | 5450000 | 4000000 |
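
For memory sizes not listed in the table, the values can be approximated from the rules stated above (about 2/3 of RAM, 8 KB per buffer, MaxDirtyBuffers at 3/4 of NumberOfBuffers). The following is a minimal sketch under those assumptions; the function name `suggest_buffers` is hypothetical and not part of Virtuoso, and its results come out a few percent above the rounded figures in the table.

```python
from fractions import Fraction

PAGE_SIZE = 8 * 1024  # bytes per buffer, as stated in the text


def suggest_buffers(ram_gb, ram_fraction=Fraction(2, 3)):
    """Estimate (NumberOfBuffers, MaxDirtyBuffers) for a given RAM size.

    Assumption: ram_fraction of system RAM is given to Virtuoso buffers
    and MaxDirtyBuffers is 3/4 of NumberOfBuffers.
    """
    ram_bytes = ram_gb * 1024 ** 3
    # Exact rational arithmetic avoids floating-point rounding surprises.
    number_of_buffers = int(ram_bytes * ram_fraction) // PAGE_SIZE
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    buffers, dirty = suggest_buffers(8)
    print(f"NumberOfBuffers = {buffers}")
    print(f"MaxDirtyBuffers = {dirty}")
```

Treat the output as a starting point and round it down toward the table values if other processes on the machine also need memory.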
Also, when running with a large database, setting MaxCheckpointRemap to one quarter of the database size is recommended. The value is expressed in pages of 8 KB each.
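
Because MaxCheckpointRemap is given in pages rather than bytes, the conversion is easy to get wrong. The following sketch shows the arithmetic, assuming the database size is known in bytes; the helper name `suggest_max_checkpoint_remap` is hypothetical and not a Virtuoso function.

```python
PAGE_SIZE = 8 * 1024  # bytes per database page, as stated in the text


def suggest_max_checkpoint_remap(db_size_bytes):
    """Return one quarter of the database size, expressed in 8 KB pages."""
    total_pages = db_size_bytes // PAGE_SIZE
    return total_pages // 4


if __name__ == "__main__":
    # Example input: a database file of 100 GiB (an illustrative size).
    db_size = 100 * 1024 ** 3
    print(f"MaxCheckpointRemap = {suggest_max_checkpoint_remap(db_size)}")
```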