14.1.2. Virtual Directories

A Virtuoso virtual directory maps logical paths to physical resource locations, together with rules and parameters that govern how each mapping responds to user-agent (e.g. Web browser) requests. This mechanism allows physical locations to be hidden or simply reorganized. Some resources require authentication challenges (for example, the Visual Server Administration Interface), and others require special headers (for example, SOAP, which is another HTTP endpoint).

Virtual directories are useful when one server has to provide access to several Web sites. Redirects are not a universal solution to this; it is far better to define virtual directories that point to the other sites. Suppose that two companies, "a" and "b", are to share a Virtuoso server but want to be represented on the Web by www.a.com and www.b.com respectively. Their pages could be stored in directories "/a" and "/b" on the server, while virtual directories map requests appropriately:

  http://www.a.com/  -->  /a
  http://www.b.com/  -->  /b

Hence, user-agent requests for www.a.com receive pages from /a, and likewise for "b". Requests under these domains are mapped back to their physical location: for example, a request for the URI http://www.a.com/images/picture.jpg retrieves the file /a/images/picture.jpg.
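As a rough illustration of the host-based mapping just described, the following Python sketch (not Virtuoso code) looks up the request's host name in a hypothetical `SITES` table and appends the request path to that site's physical root. The table and the `physical_path` helper are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Hypothetical table: Host header value -> physical root directory
SITES = {
    "www.a.com": "/a",
    "www.b.com": "/b",
}

def physical_path(url: str) -> str:
    """Map a request URL to a physical path using its host name."""
    parts = urlsplit(url)
    root = SITES[parts.hostname]  # raises KeyError for an unknown host
    # Append the request path below the site's root directory
    return root + (parts.path or "/")

print(physical_path("http://www.a.com/images/picture.jpg"))
# → /a/images/picture.jpg
```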

Virtual directory definitions are held within the system table DB.DBA.HTTP_PATH. Virtual directories can be administered in three basic ways:

Using the Visual Administration Interface via a Web browser.
Using the functions vhost_define() and vhost_remove().
Updating the system table directly using SQL statements.

Virtuoso matches user-agent requests against a logical path using the longest entry that matches the path extracted from the URI. For example, given the two entries '/a/b' and '/a', a request for 'http://foo.bar/a/b/c.html' matches the entry for '/a/b'.

First, Virtuoso attempts to locate the physical path that has been mapped to the combination of virtual host, interface, and logical path. The virtual host corresponds to the value of the 'Host' header field in HTTP/1.1 requests. If this first step does not succeed, the server tries to resolve the interface and logical path alone. Failing that, the default step attempts to resolve the path directly to a physical location.
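The three-step lookup can be sketched in Python as follows. This is an illustrative model, not Virtuoso's implementation: the `BY_HOST` and `BY_IFACE` tables, the interface label `192.0.2.1:80`, and the rule that prefixes match on whole path segments are all assumptions of the sketch.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical mapping tables (illustrative values only).
# Step 1 key: (virtual host, interface, logical path) -> physical path
BY_HOST: Dict[Tuple[str, str, str], str] = {
    ("www.a.com", "192.0.2.1:80", "/"): "/a",
}
# Step 2 key: (interface, logical path) -> physical path
BY_IFACE: Dict[Tuple[str, str], str] = {
    ("192.0.2.1:80", "/"): "/DAV",
}

def longest_match(path: str, prefixes: Iterable[str]) -> Optional[str]:
    """Return the longest logical path that is a prefix of `path`."""
    best = None
    for p in prefixes:
        # Assumption: match whole segments, so '/a' does not match '/ab'
        if path == p or path.startswith(p.rstrip("/") + "/"):
            if best is None or len(p) > len(best):
                best = p
    return best

def resolve(host: Optional[str], iface: str, path: str) -> Tuple[str, str]:
    """Return (step that matched, physical location of the mapping)."""
    # Step 1: virtual host + interface + logical path
    lp = longest_match(path, [k[2] for k in BY_HOST if k[:2] == (host, iface)])
    if lp is not None:
        return "host", BY_HOST[(host, iface, lp)]
    # Step 2: interface + logical path only
    lp = longest_match(path, [k[1] for k in BY_IFACE if k[0] == iface])
    if lp is not None:
        return "interface", BY_IFACE[(iface, lp)]
    # Step 3: default, resolve the path directly
    return "default", path

print(longest_match("/a/b/c.html", ["/a", "/a/b"]))  # → /a/b
print(resolve("www.a.com", "192.0.2.1:80", "/x.html"))  # → ('host', '/a')
print(resolve(None, "192.0.2.1:80", "/x.html"))  # → ('interface', '/DAV')
```

Passing `None` as the host models a request that carries no Host header, which falls through to the interface-level mapping.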

Figure 14.2. HTTP Virtual Directory Matching

Note:

HTTP/1.0 does not define the Host header, so Virtuoso has little choice but to send HTTP/1.0 user-agents the contents of the default virtual host definition for the interface.

For example, if the following mappings are in effect:

/       ->  /DAV
/doc    ->  http://docs.biz.com:/
/admin  ->  /admin

The following translations would be made:

/doc/howto/intro.html      -> http://docs.biz.com:/howto/intro.html
/admin/help.vsp            -> /admin/help.vsp
/gizmo/doc.xml             -> /DAV/gizmo/doc.xml

In each case the longest match is selected and the matching prefix is replaced by the right-hand side of the mapping. This is also how automatic proxying takes place: a physical path beginning with http:// is passed on to a remote server.
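A minimal Python sketch of this longest-prefix substitution, using the three mappings above, is shown below. The segment-boundary matching and the `translate` helper are assumptions made for illustration; the sketch only reports whether a target would be proxied and does not forward any request.

```python
from typing import Dict, Tuple

# The three mappings from the example above
MAPPINGS: Dict[str, str] = {
    "/": "/DAV",
    "/doc": "http://docs.biz.com:/",
    "/admin": "/admin",
}

def translate(path: str) -> Tuple[str, bool]:
    """Return (translated location, whether it would be proxied)."""
    # Longest logical path that prefixes the request path ('/' always matches
    # an absolute path, so `matches` is never empty for such paths)
    matches = [lp for lp in MAPPINGS
               if path == lp or path.startswith(lp.rstrip("/") + "/")]
    lp = max(matches, key=len)
    rest = path[len(lp):]  # remainder of the path after the matched prefix
    target = MAPPINGS[lp]
    # Join the two parts without doubling or dropping the '/' separator
    if target.endswith("/") and rest.startswith("/"):
        rest = rest[1:]
    elif not target.endswith("/") and rest and not rest.startswith("/"):
        rest = "/" + rest
    result = target + rest
    # A physical path that is an http:// URL is handed to a remote server
    return result, result.startswith("http://")

for p in ("/doc/howto/intro.html", "/admin/help.vsp", "/gizmo/doc.xml"):
    print(p, "->", translate(p)[0])
```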

Default Pages And Directory Browsing

For each virtual host and logical path pair, a list of default pages can be defined. If the requested URL path is a directory, the server checks the default page definition for that virtual directory; if a default page exists, the path is internally expanded to include its name and the page's contents are returned.

Example 14.1. Default Page

If we have a mapping for the host www.a.com with the logical path '/' mapped to '/a' and the default page 'index.htm', then when the URL http://www.a.com/ is requested, the server will try to send the content of '/a/index.htm'.


The same mechanism is used to determine whether a directory listing is to be returned. If a mapping has 'Browseable' set to a number greater than zero and a default page is not defined or does not exist, the server returns a directory listing to the calling client.
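The decision described in the two paragraphs above can be sketched as a small Python function. For illustration only, it assumes that the default pages are tried in the order listed and that a file-existence check is enough to decide whether a page exists; `serve_directory` and the `"<listing>"` marker are hypothetical.

```python
import os.path
import posixpath
from typing import Callable, List, Optional

def serve_directory(phys_dir: str, default_pages: List[str], browseable: int,
                    exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Decide what to return for a request that maps to a directory.

    Returns the path of the default page to send, the marker "<listing>"
    when a directory listing should be returned, or None otherwise.
    """
    # Try each configured default page in turn
    for name in default_pages:
        candidate = posixpath.join(phys_dir, name)
        if exists(candidate):
            return candidate
    # No default page defined or found: list only if 'Browseable' > 0
    if browseable > 0:
        return "<listing>"
    return None

# Example 14.1: '/' -> '/a' with default page 'index.htm'
print(serve_directory("/a", ["index.htm"], 0,
                      exists=lambda p: p == "/a/index.htm"))  # → /a/index.htm
```

The `exists` parameter lets the check be swapped out, for example to test the logic without touching the file system.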

Virtual Hosting and Multi Hosting

The term Virtual Host refers to the practice of maintaining more than one server on one machine, differentiated by their apparent host name. It is often desirable for companies sharing a web server to have their own domains, with web servers accessible as www.company1.com and www.company2.com, without requiring the user to know any extra path information. Virtual hosts can be IP-based or non-IP-based. IP-based hosting (Multi Hosting) refers to the practice of having one machine listen for incoming requests on different network interfaces and respond with different pages. Non-IP-based hosting (Virtual Hosting) refers to the practice of one machine having many DNS aliases, where a request to a specific alias receives a different response depending on the content of the 'Host' HTTP header field. Virtuoso supports IP-based, virtual IP-based, and name-based virtual hosts.

For IP-based hosts, the host definitions determine on which interfaces Virtuoso listens for and accepts HTTP requests.

Managing Host Metadata

To add metadata to /.well-known/host-meta, execute:

WS.WS.host_meta_add ([app-name], [xrd-xml-fragment])

For example:

WS.WS.host_meta_add
  (
    'dbpedia.page-descriptor',
    '<Link rel="http://dbpedia.org/resource-descriptor" template="http://dbpedia.org/page/{uri}"/>'
  )
  ;
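To show how a client might consume the Link registered above, the following Python sketch expands its `template` attribute in the way RFC 6415 (host-meta) describes, substituting the percent-encoded resource URI for `{uri}`. It parses the fragment on its own and makes no assumption about how Virtuoso assembles the full host-meta document; `expand_template` is a hypothetical helper, and whether the resulting URL resolves depends on the target server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The XRD fragment registered with WS.WS.host_meta_add above
FRAGMENT = ('<Link rel="http://dbpedia.org/resource-descriptor" '
            'template="http://dbpedia.org/page/{uri}"/>')

def expand_template(link_xml: str, resource_uri: str) -> str:
    """Expand the {uri} variable of a host-meta template Link."""
    link = ET.fromstring(link_xml)
    template = link.get("template")
    if template is None:
        raise ValueError("Link has no template attribute")
    # Percent-encode the whole URI, including ':' and '/'
    return template.replace("{uri}", quote(resource_uri, safe=""))

print(expand_template(FRAGMENT, "http://dbpedia.org/resource/London"))
# → http://dbpedia.org/page/http%3A%2F%2Fdbpedia.org%2Fresource%2FLondon
```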

Virtuoso As A Proxy

The Virtuoso HTTP server can act as a proxy server on the same port as its HTTP port. You can enter the host and port that the Virtuoso HTTP server is listening on into your browser's proxy settings, and all requests will then be processed by it. This can also be used to retrieve a page from within VSP.

The physical path setting of a virtual directory definition can be a URL to another HTTP server, in which case Virtuoso acts as a proxy to that site when the corresponding logical path is requested.

The nature of Virtuoso's Web Proxying ability makes it easy and seamless to bind multiple websites under one roof. Existing sites do not have to move or change to be integrated under the Virtuoso Proxy. Simply map them under a logical path name. They can be mapped multiple times or from multiple ports.

If you already have pages written and working on other servers via ASP or PHP, you can run those servers concurrently with Virtuoso so that they can share form data and provide dynamic content from various sources, consistent with our value proposition of maximum incorporation of new technologies with minimum disruption to existing infrastructure. Whether these servers are hosted on several machines or on the same machine, there is no need to expose their ports and services. This makes the end-user experience cleaner and helps maintain some server security and/or anonymity.

Note:

Virtuoso provides runtime hosting capabilities and PHP support, so ASP.NET, PHP, and other applications can be hosted and run directly from the file system or WebDAV.

Suppose that you have two machines running existing web servers that serve various parts of your intranet. One web server may have been built for or by your sales department, while the other may have been built by the support department. These servers could be reached at http://sales.mycompany.com/ and http://support.mycompany.com/ respectively.

You can place Virtuoso on another server and start integrating your existing sites under this installation. You may use the Visual Server Administration Interface or choose to use the following commands via the isql interface:

DB.DBA.VHOST_DEFINE(lpath=>'/sales', ppath=>'http://sales.mycompany.com/');
DB.DBA.VHOST_DEFINE(lpath=>'/support', ppath=>'http://support.mycompany.com/');

This way your old servers will appear under /sales/ and /support/ on your new server. You can then start adding Virtuoso .vsp pages to your new server, and they will operate alongside your existing pages to add new functionality as required.

You may decide to install Virtuoso on a server where a web server already exists. If you plan to use Virtuoso as your default web server and as the proxy to your existing server, you need to make sure that the two servers run on different ports. The default HTTP port is 80: configure Virtuoso to use this port in the virtuoso.ini file (the ServerPort setting in the [HTTPServer] section), and then move your existing web server to another port number, such as 90 in the example below. Afterwards the procedure is similar:

DB.DBA.VHOST_DEFINE(lpath=>'/apache', ppath=>'http://example.com:90/');

Proxying Virtuoso via Apache

You may also achieve the same goal in reverse, using another web server as a proxy in front of Virtuoso. If you have an existing Apache server that you want to keep as your default web server, you can set up a proxy within Apache to Virtuoso.

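First, make sure that Apache can load the mod_proxy module (together with mod_proxy_http for HTTP back ends), which ships with most Apache distributions, and that it is referenced in your httpd.conf (or apache.conf) file. The page-rewriting directives used in step 3 below also require mod_proxy_html, and the INFLATE/DEFLATE filters require mod_deflate. You should have something like: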

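...
# Module file locations vary between distributions
LoadModule proxy_module       modules/mod_proxy.so
LoadModule proxy_http_module  modules/mod_proxy_http.so
# Apache 2.4: mod_proxy_html also needs mod_xml2enc
LoadModule xml2enc_module     modules/mod_xml2enc.so
LoadModule proxy_html_module  modules/mod_proxy_html.so
LoadModule deflate_module     modules/mod_deflate.so
...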
Configuration steps

Below we will use the <Location> directive to simplify the configuration:

<Location /virtuoso/>
   ProxyPass               http://example.com:8890/
   ProxyPassReverse        /
</Location>
  1. Set the ProxyPass directive:

    The ProxyPass directive makes Apache map all incoming URLs under /virtuoso/ to the internal HTTP endpoint.

    So when the browser makes a request for:

    http://example.com/virtuoso/conductor/login.vsp
    

    it is rewritten to use:

    http://example.com:8890/conductor/login.vsp
    

    before sending the request over to the Virtuoso server.

  2. Set the ProxyPassReverse directive:

    The ProxyPassReverse directive rewrites the HTTP headers that come back from Virtuoso so that they map back to the external URL. This is needed, for example, for 303 redirects, where Virtuoso sends a Location header such as:

    Location: http://example.com:8890/conductor/pageXXX.vsp
    

    which Apache needs to rewrite to:

    Location: http://example.com/virtuoso/conductor/pageXXX.vsp
    

    before sending the reply back to the browser.

  3. If the mapping is / ---> / rather than /virtuoso/ ---> /, these two settings are sufficient, since ProxyPass and ProxyPassReverse only deal with rewriting URLs and HTTP headers.

    When there is a path mapping, however, a third step is needed:

    Pages can contain clickable links like:

    <a href="/conductor/mypage.vsp">Click Here</a>
    

    If you click on this link in your browser, it would use:

    http://example.com/conductor/mypage.vsp
    

    which does not map back to your /virtuoso/ path in Apache.

    phpBB3, for example, was written from the outset to cater for this situation and recalculates fully qualified host/path names everywhere in its pages, but requiring every application to do this is not always practical.

    Apache therefore also needs to be configured to rewrite the page content, for example:

         ProxyHTMLEnable         On
         ProxyHTMLURLMap         / /virtuoso/
         ProxyHTMLURLMap         http://example.com:8890/ /virtuoso/
    

    This will rewrite the content of every page to make sure that links inside the page are rewritten to use the external mapping of this instance.

    If you have set EnabledGzipContent=1 in virtuoso.ini, you also need to tell Apache to decompress the content before doing this rewrite (and to compress it again afterwards) with the following line:

         SetOutputFilter         INFLATE;DEFLATE
    

    Although this costs some extra CPU time, it is practical enough that you can also set up a virtual path on your own system that points to an external system.

    For example, add the following to your httpd.conf to map /dbp/ to the dbpedia-live instance:

    <Location /dbp/>
         ProxyPass               http://dbpedia-live.openlinksw.com/
         ProxyPassReverse        /
         ProxyHTMLEnable         On
         ProxyHTMLURLMap         / /dbp/
         ProxyHTMLURLMap         http://dbpedia-live.openlinksw.com/ /dbp/
         SetOutputFilter         INFLATE;DEFLATE
    </Location>
    

    You should then be able to browse, for example:

    http://example.com/dbp/page/London
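The three rewrites performed for the /virtuoso/ mapping can be modelled with plain string operations. The Python sketch below only illustrates the effect of ProxyPass, ProxyPassReverse, and ProxyHTMLURLMap on the example URLs used in steps 1 to 3; it is not how Apache implements these directives, and the constant and function names are hypothetical.

```python
# Hypothetical constants matching the <Location /virtuoso/> example
EXTERNAL_PREFIX = "/virtuoso/"            # path seen by the browser
BACKEND = "http://example.com:8890/"      # ProxyPass target (Virtuoso)
PUBLIC = "http://example.com/virtuoso/"   # external URL of the mapping

def proxy_pass(request_path: str) -> str:
    """Step 1: map an incoming request path to the backend URL."""
    if not request_path.startswith(EXTERNAL_PREFIX):
        raise ValueError("path is outside the proxied location")
    return BACKEND + request_path[len(EXTERNAL_PREFIX):]

def proxy_pass_reverse(location: str) -> str:
    """Step 2: map a backend Location header back to the external URL."""
    if location.startswith(BACKEND):
        return PUBLIC + location[len(BACKEND):]
    return location

def html_url_map(href: str) -> str:
    """Step 3: rewrite a link found in the page content."""
    for src, dst in ((BACKEND, EXTERNAL_PREFIX), ("/", EXTERNAL_PREFIX)):
        if href.startswith(src):
            return dst + href[len(src):]
    return href

print(proxy_pass("/virtuoso/conductor/login.vsp"))
print(proxy_pass_reverse("http://example.com:8890/conductor/pageXXX.vsp"))
print(html_url_map("/conductor/mypage.vsp"))
```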
    
Usage Example
NameVirtualHost 82.191.21.32

<VirtualHost 82.191.21.32>
ServerName www.mysite.net
...

     #  Disable global proxy
     ProxyRequests       Off

     #  Pass original host to Virtuoso
     ProxyPreserveHost   On

     #  Timeout waiting for Virtuoso
     ProxyTimeout        300

     #  Set permission
     <Proxy *>
         Order deny,allow
         Allow from all
     </Proxy>

     #
     #  Map /virtuoso/ to a local Virtuoso instance.
     #
     #  Since ProxyPass and ProxyPassReverse only fix the Headers
     #  of the request, we need to use ProxyHTMLURLMap to rewrite
     #  content.
     #
     <Location /virtuoso/>
         ProxyPass               http://example.com:8890/
         ProxyPassReverse        /

         #  Enable rewrite rules
         ProxyHTMLEnable         On
         ProxyHTMLURLMap         / /virtuoso/
         ProxyHTMLURLMap         http://example.com:8890/ /virtuoso/

         # Uncomment this when EnabledGzipContent=1 in virtuoso.ini
         #SetOutputFilter         INFLATE;DEFLATE
     </Location>
</VirtualHost>

If the virtual host is mapped straight through to Virtuoso, only header rewriting is needed, which saves the time and CPU power otherwise spent rewriting the content:

     #
     #  Map / to a local Virtuoso instance
     #
     #  Since paths are mapped straight through, we do not have to
     #  rewrite the content.
     #
     <Location />
         ProxyPass               http://example.com:8890/
         ProxyPassReverse        /
     </Location>