9.35.2. Parallel Insert With File Tables and Transactions

A file table is copied into a database-resident table with an INSERT ... SELECT statement. Such a statement executes in parallel if the session is in autocommit mode, i.e. log_enable (2) or log_enable (3) has previously been executed on the session.
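For example, with the part table and its part_f file table defined as in the part example above, the following pair of statements runs the copy as a parallel, non-logged insert:

log_enable (2);
INSERT INTO part SELECT * FROM part_f;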

A file can also be loaded inside a transaction if the connection is not in autocommit mode, i.e. neither log_enable (2) nor log_enable (3) is in effect. Such a load is multithreaded if enable_mt_txn is 1 in the [Flags] section of the ini file or __dbf_set ('enable_mt_txn', 1) has been executed previously. The setting is global. Defaults vary according to server version. Use sys_stat ('enable_mt_txn') to read the value of the setting.
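For example, the setting can be read and enabled at run time with the calls named above:

SELECT sys_stat ('enable_mt_txn');   -- read the current value
__dbf_set ('enable_mt_txn', 1);      -- enable multithreaded transactions globally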

For long files the transaction is liable to run out of rollback space. File table operations as such do not affect the transaction context. Explicit commits may be interspersed within a SELECT statement reading from a file table or other tables.

For example, the history-keeping dimension updates from TPC-DS can be implemented as follows. The item table is a history-keeping dimension that has an index on the i_item_id business key and has the item surrogate key as its primary key, with a new value for each version of the item record. The item record has a start and end date (i_rec_start_date, i_rec_end_date) to mark the period of validity of the information. A null value in i_rec_end_date marks the currently applicable record. When the item data is updated, a new item record is inserted and the previously current item record has its end date set to the current date. These two operations must occur atomically. Beyond that, the implementation may choose whether to update many item records in the same transaction.

In the listing below, most columns are left out for brevity:

CREATE PROCEDURE item_update (in i_id varchar, ...)
{
  vectored;

  -- expire the currently applicable record for this business key
  UPDATE item
     SET i_rec_end_date = curdate ()
   WHERE i_item_id = i_id
     AND i_rec_end_date IS NULL;

  -- insert the new current record with a fresh surrogate key
  INSERT INTO item (i_item_sk, i_item_id, i_rec_end_date, ...)
            VALUES (sequence_next ('item_sk_seq'), i_id, NULL, ...);

  -- executed once per batch of parameter values, not once per value
  not vectored { COMMIT WORK; };
}

SELECT COUNT (item_update (i_item_id, ....))
  FROM item_f ....;

The SELECT statement calls the item_update procedure on a vector of item ids and other columns. The procedure marks the expired record and inserts the new record, assigning new surrogate keys from a sequence. After each batch it performs one commit. The next batch of items is updated in a separate transaction.

This should be run on a single thread. In a multithreaded transaction the threads may not issue individual commits. The code could be multithreaded by leaving out the commit from the stored procedure. The commit would then have to come after the completion of the SELECT statement.
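A minimal sketch of this multithreaded variant, assuming enable_mt_txn is on as described above; item_update_mt is a hypothetical name for the commit-free version of the procedure:

CREATE PROCEDURE item_update_mt (in i_id varchar, ...)
{
  vectored;

  UPDATE item
     SET i_rec_end_date = curdate ()
   WHERE i_item_id = i_id
     AND i_rec_end_date IS NULL;

  INSERT INTO item (i_item_sk, i_item_id, i_rec_end_date, ...)
            VALUES (sequence_next ('item_sk_seq'), i_id, NULL, ...);
  -- no not vectored { COMMIT WORK; } block here
}

SELECT COUNT (item_update_mt (i_item_id, ....))
  FROM item_f ....;

-- single commit after the whole SELECT has completed
COMMIT WORK;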

The following isql script bulk loads a whole TPC-H database. We leave out the CREATE TABLE and ft_set_files statements for brevity; they are all as in the part example above.

log_enable (2); INSERT INTO lineitem SELECT * FROM lineitem_f &
log_enable (2); INSERT INTO orders   SELECT * FROM orders_f &
log_enable (2); INSERT INTO customer SELECT * FROM customer_f &
log_enable (2); INSERT INTO part     SELECT * FROM part_f &
log_enable (2); INSERT INTO partsupp SELECT * FROM partsupp_f &
log_enable (2); INSERT INTO supplier SELECT * FROM supplier_f &
log_enable (2); INSERT INTO nation   SELECT * FROM nation_f &
log_enable (2); INSERT INTO region   SELECT * FROM region_f &

wait_for_children;
checkpoint;

A multithreaded, non-logged, non-transactional insert is started for each table-file pair as a background task. The wait_for_children isql command waits for all the background tasks to complete. The checkpoint statement makes the state durable. Killing the server before the checkpoint would result in the server starting in a state with none of the effects of the bulk load present, since log_enable (2) turns off logging. The database is online during the bulk load and progress may be followed by periodically counting the tables, for example.
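For example, a separate isql session may poll a row count while the load runs; lineitem here is just one of the tables loaded above:

SELECT COUNT (*) FROM lineitem;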