11.12.Asynchronous Execution and Multithreading in Virtuoso/PL

Many application tasks benefit from parallel execution. This is specially true of I/O intensive workloads where each thread spends a large amount of time waiting for the network or disks. Typical tasks include crawling the web and importing large data sets. The whole process must not stop just because there is a file cache miss or because there is round trip latency or a name resolution delay on the net.

To this effect, Virtuoso/PL provides the async_queue object. A stored procedure may create an async_queue that will be served by a pool of worker threads. The size of this pool can be set when creating the queue.

The thread which made the queue can use the queue to pass procedure/parameter list pairs to the threads. If a thread is available, the thread will execute the request, if not, the next thread of the pool to become free will take the oldest queued item and execute it. Thus the queue is served in a FIFO fashion multiplexed over n threads.

The owner of the queue can check on the results of execution either collectively or individually. Each worker thread has its own transaction and worker threads may end up waiting for each other own database locks and can deadlock. The worker thread code is responsible for committing its own transaction and handling any deadlock retries or such.

When the thread owning the queue makes a request, a request number is returned. This number can be used to later request the return value and error status of the request. A queue cannot be persisted in a database table and cannot be passed between threads. A queue can be passed between procedures and kept in data structures such as arrays. Queue are internally reference counted and when the last reference drops the queue is freed. If a queue is freed while there is still activity on behalf of the queue, the activities that are ongoing are finished, all requests that are not started are discarded and all values and error states are discarded. The queue and associated resources are thereafter freed.

There is a pool of 20 threads that are shared among all async_queues on the system. Thus, the count of threads given for the queue is a maximum and does not guarantee that this quantity of threads be used in reality. If no thread is available in the pool, i.e. other async queues have exhausted the entire pool, the thread making the request ends up executing the item synchronously. One should take this possibility into account when deciding transaction boundaries, otherwise this is transparent and the calling thread still gets a request handle and can later check for its completion.

Queues take procedures and argument lists instead of texts of SQL statements in order to save the time of compiling the text. It is desirable for best performance to supply the name of the procedure in its full form, with full qualification and matching case.

Consider the following code samples:

create table aqi (n int);

-- The worker procedure. Insert one row and commit.

create procedure INS1 (in n int)
  --dbg_obj_print ('ins1 ', n);
  insert into AQI (N) values (n);
  commit work;
  return '22';

create procedure taq1 (in x int, in thrs int := 1)
  declare aq, res, err any;
  declare n int;
  aq := async_queue (thrs);
  for (n:= 0; n < x; n:=n+1)
      res := aq_request (aq, 'DB.DBA.INS1', vector (n));
  return (aq_wait (aq, res, 1, err));

-- This procedure makes a queue with a given number of worker threads, then makes a set of requests and waits for the result
of the last one. Note that this is not necessarily the last to complete if there are multiple threads serving the queue.