21.5. Sample Configuration

This section describes how to set up disks and I/O for a sample run. To begin with, the scaling rule is 12.5 tpmC per warehouse. This means that in order to measure 1000 tpmC you must have 1000 / 12.5 = 80 warehouses. These take about 100 MB apiece.

The benchmark's working set consists of the STOCK and CUSTOMER tables of each warehouse and of the database's ITEM table. The other tables are accessed more or less sequentially, i.e. rows are inserted at the end or deleted from the start. Hits on the STOCK and CUSTOMER rows of each warehouse follow a particular distribution, giving each warehouse a specific working set within these tables.

The 160-day rule requires a disk configuration sufficient for accumulating 160 days' worth of transactions at the reported rate. For practical reasons we will ignore this rule here. Just running the benchmark for the required 20 minutes needs about twice the space of the initial data. Let's assume we have an initial database of 8 GB (80 warehouses at roughly 100 MB each) and another 16 GB of working space, a total of 24 GB. This is six 4 GB disks or twelve 2 GB ones.

Let's now look at the relationship between CPU and disk usage. The New Order transaction, which mostly defines the benchmark's working set, consists on average of 10 updates to the STOCK table, which mostly cause disk reads; 10 reads of the ITEM table, which is always in cache; 10 ascending ORDER_LINE inserts; 1 ascending ORDERS insert; and 1 CUSTOMER read, plus a DISTRICT update and a WAREHOUSE read.

If this happens entirely in RAM, it takes about 10 milliseconds on a 200 MHz Pentium Pro, 13 milliseconds on an Ultra SPARC, and so on. This is roughly as long as one random disk seek takes.

Because the scaling rule means that RAM covers only 10% to 20% of the working set, the STOCK updates will miss the cache most of the time. Thus, with a transaction mix of New Orders only, we would need about 8 disk seeks in progress concurrently to keep one CPU busy: a transaction takes about as much CPU time as a single seek but generates roughly eight cache-missing STOCK reads, so these reads must overlap if the CPU is not to sit idle. The other transactions are either more local or repeat the New Order access profile. We therefore end up needing about 5 concurrent 10 msec disks for one PowerPC 604 and almost double that for a 200 MHz Pentium Pro.

For our 24 GB configuration we may as well divide the database over as many stripes as we have disks. For six 4 GB disks, we would have:

[Striping]
Segment1 = 24G, /disk1/tpcc-1-1.db = q1, /disk2/tpcc-1-2.db = q2, /disk3/tpcc-1-3.db = q3, /disk4/tpcc-1-4.db = q4, /disk5/tpcc-1-5.db = q5, /disk6/tpcc-1-6.db = q6

This assumes we are using file systems. Note the I/O queue names q1...q6, which mean that each stripe is served by a separate I/O thread.
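If we chose the twelve 2 GB disks instead, the same 24 GB segment could be split into twelve stripes, one file and one I/O queue per disk. A minimal sketch, assuming hypothetical mount points /disk1 through /disk12:

[Striping]
Segment1 = 24G, /disk1/tpcc-1-1.db = q1, /disk2/tpcc-1-2.db = q2, /disk3/tpcc-1-3.db = q3, /disk4/tpcc-1-4.db = q4, /disk5/tpcc-1-5.db = q5, /disk6/tpcc-1-6.db = q6, /disk7/tpcc-1-7.db = q7, /disk8/tpcc-1-8.db = q8, /disk9/tpcc-1-9.db = q9, /disk10/tpcc-1-10.db = q10, /disk11/tpcc-1-11.db = q11, /disk12/tpcc-1-12.db = q12

Each file then holds 2 GB of the segment and has its own I/O queue.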

[Tip] See Disk Configuration for more on this.

Your Virtuoso may or may not support raw devices. If it does, they are specified here.

For the other configuration parameters, we will configure sufficient RAM for the DBMS, counting 8.5K of RAM for each buffer. For 512 MB of RAM, we would have about 25000 buffers, or roughly 200 MB. The maximum dirty buffers parameter is trickier: a low value causes unnecessary writing, while a high value leaves too few clean buffers at any one time, so the read working set is poorly covered. The read-only part of the working set is only the ITEM table, about 20 MB.
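As a sketch, the buffer count can be expressed in the virtuoso.ini [Parameters] section; check the parameter name and placement against your version's documentation:

[Parameters]
NumberOfBuffers = 25000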

[Note]

Note that about half of the available RAM can efficiently be allocated to the database; allocating all of it may result in swapping because the operating system also caches disk. In terms of kernel tuning, if one can decrease the OS file cache, one can increase the RAM usable by the DBMS without the OS's disk cache getting in the way.

The Stock Level transaction reads order lines written by recent New Order transactions, which are likely to be in RAM and still dirty. The reads and updates of CUSTOMER are random, and generally speaking what is read is also likely to be, or to have been, updated. We could start with a guess of 70% maximum dirty buffers, i.e. a value of 70000 for 100000 buffers.
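Applied to the 25000-buffer example above, the 70% guess would translate into something like the following, assuming the MaxDirtyBuffers parameter in the same section:

[Parameters]
MaxDirtyBuffers = 17500

This can then be adjusted up or down depending on how well the read working set stays cached during trial runs.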

The checkpoint remap should be as large as possible and the unremap quota should be small. The benchmark does practically no sequential reads and therefore does not care about disk locality. No limit on remapping means that a checkpoint can be made in the time it takes to flush the disk cache. This is done in the background, so that the atomic checkpoint time is limited to the time it takes to write out those buffers that became dirty while the first sweep was in progress.

Thus we could have:

MaxCheckpointRemap = 2000000
UnremapQuota = 3000