Installing Jena TDB and Joseki on Ubuntu Server

Background


I had been trying to install Jena TDB and Joseki on Ubuntu Server for 3 days, and I finally succeeded. I am writing this article to share my experience with the world, and I hope it helps anyone doing something similar. In case you don't know what they are: Jena is a Java library for parsing and manipulating RDF data, Jena TDB is an RDF store that does not need a relational database, and Joseki is a SPARQL server that lets us run SPARQL queries over HTTP.

Thanks to Ric Roberts: his article on Jena and Joseki saved my life. The article below is an extension of his Jena article.

Note: I set up my Ubuntu Server on VirtualBox, but these instructions should work on a real server too.

Ingredients

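Here is the list of ingredients I used (prepare them before you get started):

- Ubuntu Server
- jdk-6u31-linux-x64.bin (Oracle Java 6 JDK; use jdk-6u31-linux-i586.bin on a 32-bit server)
- jena-tdb-0.9.0-incubating-distribution.tar.gz
- joseki-3.4.4.zip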

Step 1: Set up your server


I assume that you can set up Ubuntu Server by yourself. During setup, the installer asks whether to install extra components such as a LAMP server, Tomcat, OpenSSH server, etc. You can choose to install nothing. In particular, DO NOT install Tomcat; otherwise OpenJDK will be installed automatically along with it, and we don't need it.

From here on, I assume that my user account is sysadmin and my home folder is /home/sysadmin.

Now put all ingredients (jdk-6u31-linux-x64.bin, jena-tdb-0.9.0-incubating-distribution.tar.gz, joseki-3.4.4.zip) in your home folder.
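If you want to double-check that everything is in place before going on, here is a small sketch of a shell function that reports any missing file. The name check_ingredients is my own (it is not part of any of these packages), and it assumes the 64-bit JDK file name listed above; change it to jdk-6u31-linux-i586.bin if that is what you downloaded. Paste it into your shell, or save it to a file and source it.

```shell
# check_ingredients: hypothetical helper that reports any missing download.
# Usage: check_ingredients [directory]   (defaults to $HOME)
check_ingredients() {
  local dir="${1:-$HOME}"
  local missing=0
  local f
  for f in jdk-6u31-linux-x64.bin \
           jena-tdb-0.9.0-incubating-distribution.tar.gz \
           joseki-3.4.4.zip; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # 0 when all three files are present, 1 otherwise
  return "$missing"
}
```

For example:

$ check_ingredients && echo "all ingredients found"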

Step 2: Install Java 6


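(These commands use the 64-bit installer listed above; on a 32-bit server, use jdk-6u31-linux-i586.bin instead.)

$ chmod u+x jdk-6u31-linux-x64.bin
$ ./jdk-6u31-linux-x64.bin

[wait for the files to be extracted; you will get a folder named "jdk1.6.0_31"]

$ sudo mkdir -p /usr/lib/jvm
$ sudo mv jdk1.6.0_31 /usr/lib/jvm/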
$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.6.0_31/bin/java" 1

Then select the Java you just installed as the default:

$ sudo update-alternatives --config java

Now check that it is installed correctly:

$ java -version
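java -version tells you which Java runs, but not which file /usr/bin/java really points to. update-alternatives manages /usr/bin/java as a chain of symlinks (through /etc/alternatives/java), so a quick sketch like the one below can follow that chain and compare the result with the JDK you moved into /usr/lib/jvm. check_java_link is a hypothetical helper name, and it assumes GNU readlink (standard on Ubuntu).

```shell
# check_java_link: hypothetical helper that follows a symlink chain
# (e.g. /usr/bin/java -> /etc/alternatives/java -> real binary) and
# compares the final target with the JDK binary you expect.
check_java_link() {
  local link="${1:-/usr/bin/java}"
  local expected="${2:-/usr/lib/jvm/jdk1.6.0_31/bin/java}"
  local actual
  actual=$(readlink -f "$link") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $link -> $actual"
  else
    echo "WARNING: $link -> $actual (expected $expected)" >&2
    return 1
  fi
}
```

For example:

$ check_java_link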

Step 3: Set up your environment variables


Extract jena-tdb-0.9.0-incubating-distribution.tar.gz and joseki-3.4.4.zip into your home folder; you will get the "jena-tdb-0.9.0-incubating" and "Joseki-3.4.4" folders.

Modify your ~/.bashrc to include the following lines at the end of the file (this file is in your home folder):

export TDBROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export JOSEKIROOT="/home/sysadmin/Joseki-3.4.4"
export JENAROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export PATH="$TDBROOT/bin:$JOSEKIROOT/bin:$PATH"
export CLASSPATH=".:$JENAROOT/lib/*:$JOSEKIROOT/lib/*"

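Note: /home/sysadmin is my home folder; replace it with yours. The changes only apply to new shells, so log out and back in (or run "source ~/.bashrc") before continuing.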
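To confirm that the new variables are active in your current shell, you could use a check along these lines. It is only a sketch: check_jena_env is a name I made up, and it assumes bash (it uses ${!var} to read a variable by its name).

```shell
# check_jena_env: hypothetical helper that checks the variables set in
# ~/.bashrc point at real directories and that the bin folders are on PATH.
check_jena_env() {
  local ok=0 var dir
  for var in TDBROOT JOSEKIROOT JENAROOT; do
    dir="${!var}"   # indirect expansion: value of the variable named $var
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
      echo "$var is unset or not a directory: '$dir'" >&2
      ok=1
    fi
  done
  for dir in "$TDBROOT/bin" "$JOSEKIROOT/bin"; do
    case ":$PATH:" in
      *":$dir:"*) ;;
      *) echo "not on PATH: $dir" >&2; ok=1 ;;
    esac
  done
  return "$ok"
}
```

For example:

$ check_jena_env && echo "environment looks fine"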

Step 4: Make the scripts executable


$ cd /home/sysadmin/jena-tdb-0.9.0-incubating
$ chmod u+x bin/*

$ cd /home/sysadmin/Joseki-3.4.4
$ chmod u+x bin/*

At this point, you can test that Joseki works by running:

$ cd /home/sysadmin/Joseki-3.4.4
$ ./bin/rdfserver

… and browsing to http://127.0.0.1:2020. You can play with the built-in books dataset at http://127.0.0.1:2020/query.html.

(Kill Joseki with Ctrl-C in the terminal).
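If you are working over SSH without a browser on the server, you can do the same check from a second terminal while rdfserver is still running (that is, before pressing Ctrl-C). The sketch below assumes curl is installed (sudo apt-get install curl if it is not); joseki_up is a hypothetical helper that treats HTTP 200 as success.

```shell
# joseki_up: hypothetical helper; prints the HTTP status code of a URL and
# succeeds only on 200. Prints "000" if nothing is listening on that port.
joseki_up() {
  local url="${1:-http://127.0.0.1:2020/query.html}"
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  echo "$url -> HTTP ${code:-none}"
  [ "$code" = "200" ]
}
```

For example:

$ joseki_up http://127.0.0.1:2020/query.html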

Step 5: Configure Joseki for your TDB store


In your Joseki directory, edit webapps/joseki/WEB-INF/web.xml and add a new servlet mapping for your service. It routes the /myservice URL to the existing "SPARQL service processor" servlet:


<servlet-mapping>
   <servlet-name>SPARQL service processor</servlet-name>
   <url-pattern>/myservice</url-pattern>
</servlet-mapping>

Step 6: Make an HTML file for your service


Copy webapps/joseki/query.html to a new file in the same folder (e.g. myservice.html). Then edit its forms so that they submit to the URL pattern you set in web.xml in the previous step, i.e. change every form action to:

<form action="myservice">

Also change the default SPARQL queries in the form to something sensible for your data.
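If you prefer to do the copy and the form edits from the command line, here is a sketch. make_service_page is a hypothetical helper; it assumes the form tags in query.html write their action attribute in double quotes, so check the lines containing <form that it prints afterwards. It does not touch the default queries, which you still need to edit by hand.

```shell
# make_service_page: hypothetical helper that copies Joseki's query.html to
# NAME.html and points every <form ... action="..."> at NAME.
# Assumes action attributes are double-quoted; prints the resulting <form lines.
make_service_page() {
  local webapp="$1" name="$2"
  sed -E "s/(<form[^>]*action=\")[^\"]*\"/\1${name}\"/g" \
      "$webapp/query.html" > "$webapp/$name.html"
  grep -n '<form' "$webapp/$name.html"
}
```

For example:

$ make_service_page "$JOSEKIROOT/webapps/joseki" myservice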

Step 7: Edit the Joseki config file


Joseki’s configuration is a Turtle file, joseki-config.ttl, in the root of the Joseki directory. Edit that file and add the following:

Near the top, after all the prefixes:
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .

Add a new service under the others:

# Service 3 - SPARQL processor only handling a given dataset
<#service3>
     rdf:type            joseki:Service ;
     rdfs:label          "My New Service" ;
     joseki:serviceRef   "myservice" ;   # web.xml must route this name to Joseki
     
     # dataset part
     joseki:dataset      <#mydatasetname> ;
     
     # Service part.
     # This processor will not allow either the protocol
     # or the query to specify the dataset.
     joseki:processor    joseki:ProcessorSPARQL_FixedDS ;
     .

Under the datasets, add:

# Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
  
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset . 
tdb:GraphTDB    rdfs:subClassOf  ja:Model .  

<#mydatasetname>  rdf:type tdb:DatasetTDB ;
     rdfs:label "My Data Set" ;
     tdb:location "/home/sysadmin/mydatasetdata" ;  # or wherever you want the data to be stored
     .
  
<#graph> rdf:type tdb:GraphTDB ;
     tdb:location "/home/sysadmin/tdbgraphdata" ;
     .

Step 8: Start Joseki


$ cd /home/sysadmin/Joseki-3.4.4
$ ./bin/rdfserver

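Step 9: Load data into your TDB


If Joseki is running, stop it first (Ctrl-C) and start it again once the load has finished; a TDB database should not be opened by two Java processes at the same time.
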
$ cd /home/sysadmin/jena-tdb-0.9.0-incubating
$ bin/tdbloader --loc=/home/sysadmin/mydatasetdata -v /full/path/to/rdf/or/ttl/files/
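If your data is spread over several files in one folder, a sketch like the following hands each file to tdbloader explicitly. load_rdf_dir is my own name; it assumes your files use the .rdf, .ttl or .nt extensions (Jena guesses the syntax from the extension) and that $TDBROOT/bin is on your PATH as set up in Step 3.

```shell
# load_rdf_dir: hypothetical helper that passes every .rdf, .ttl and .nt
# file in a directory to tdbloader (expected on PATH via $TDBROOT/bin).
load_rdf_dir() {
  local loc="$1" dir="$2"
  local files=()
  local old_nullglob
  old_nullglob=$(shopt -p nullglob)   # remember the caller's setting
  shopt -s nullglob                   # unmatched globs expand to nothing
  files=("$dir"/*.rdf "$dir"/*.ttl "$dir"/*.nt)
  eval "$old_nullglob"                # restore the caller's setting
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no .rdf/.ttl/.nt files in $dir" >&2
    return 1
  fi
  tdbloader --loc="$loc" -v "${files[@]}"
}
```

For example:

$ load_rdf_dir /home/sysadmin/mydatasetdata /full/path/to/rdf/or/ttl/files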

Step 10: Run SPARQL queries


You can now run SPARQL queries against your new dataset through the form at http://127.0.0.1:2020/myservice.html

…or programmatically, by sending SPARQL protocol requests to http://localhost:2020/myservice (with the relevant HTTP headers).
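For the programmatic route, the SPARQL protocol lets you send a query as the URL-encoded query parameter of an HTTP GET request. Here is a sketch using curl; sparql_query is a hypothetical helper, and the Accept header asks for the standard SPARQL XML results format (I have assumed Joseki honors it, so check the response if you ask for something else).

```shell
# sparql_query: hypothetical helper that sends a SPARQL query to an endpoint
# using the protocol's HTTP GET form (?query=...); curl does the URL-encoding.
sparql_query() {
  local endpoint="${1:?endpoint URL required}"
  local query="${2:?SPARQL query required}"
  curl -s -G \
       -H 'Accept: application/sparql-results+xml' \
       --data-urlencode "query=$query" \
       "$endpoint"
}
```

For example:

$ sparql_query http://localhost:2020/myservice 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'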

Step 11 (optional): init.d


If you're doing this on a Linux server, chances are you'll want Joseki to start automatically. Here's a simple example init.d script to get you started:

Create an empty file in your home folder:

$ touch joseki

Then put the following content into the file:

#!/bin/bash -e

### BEGIN INIT INFO

# Provides:          joseki
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start daemon at boot time
# Description:       Enable service provided by daemon.

### END INIT INFO

. /lib/lsb/init-functions

# Adjust these paths to match the folders you set up in Step 3
export TDBROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export JOSEKIROOT="/home/sysadmin/Joseki-3.4.4"
export JENAROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export PATH="$TDBROOT/bin:$JOSEKIROOT/bin:$PATH"
export CLASSPATH=".:$JENAROOT/lib/*:$JOSEKIROOT/lib/*"

SELF=$(cd $(dirname $0); pwd -P)/$(basename $0)
DAEMON="bin/rdfserver"
PIDFILE=/var/run/joseki.pid
LOGFILE=/var/log/joseki.log

sanity_checks() {
     # check pid doesn't exist.
     if [ -e "$PIDFILE" ]; then
         /bin/echo "ERROR: PID file $PIDFILE already exists."
         exit 1
     fi
}

#
# main()
#

case "${1:-''}" in
    'start')
        sanity_checks
        log_begin_msg "Starting Joseki rdf server..."
        cd $JOSEKIROOT
        $DAEMON &> $LOGFILE & echo $! > $PIDFILE
        log_end_msg $?
        ;;

    'stop')
        log_begin_msg "Stopping Joseki rdf server..."
        start-stop-daemon --stop --pidfile $PIDFILE
        rm -f "$PIDFILE"
        log_end_msg $?
        ;;

    'restart')
        $SELF stop
        $SELF start
        ;;

    *)
        /bin/echo "Usage: $SELF start|stop|restart"
        exit 1
        ;;
esac

Move it to /etc/init.d

$ sudo mv joseki /etc/init.d/joseki

Then make it executable and register it to start at boot:

$ sudo chmod +x /etc/init.d/joseki
$ sudo update-rc.d joseki defaults

You can now start the Joseki server with

$ sudo /etc/init.d/joseki start

and stop the Joseki server with

$ sudo /etc/init.d/joseki stop

====
[Added 11 Sep 2013]

To load RDF data into TDB, go to your home folder. Stop Joseki and remove the old data folder before you run the following command:

$ sudo tdbloader2 --loc [target folder name]/ [import file name]
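If you reload data often, these steps (stop Joseki, remove the old folder, run tdbloader2, start Joseki again) can be wrapped in one function. This is a sketch, not a tested procedure: reload_tdb and the JOSEKI_INIT variable are my own names, it relies on the /etc/init.d/joseki script from Step 11, and it deletes the target folder, so double-check the path you pass. It needs root privileges for the init.d script, and tdbloader2 must be on the PATH of that shell.

```shell
# reload_tdb: hypothetical helper: stop Joseki, remove the old TDB folder,
# bulk-load the given files with tdbloader2, then start Joseki again.
# JOSEKI_INIT is an assumed variable so the init.d script path can be changed.
reload_tdb() {
  local loc="${1:?usage: reload_tdb TDB_FOLDER FILE...}"
  shift
  local init="${JOSEKI_INIT:-/etc/init.d/joseki}"
  if [ "$#" -eq 0 ] || [ "$loc" = "/" ]; then
    echo "usage: reload_tdb TDB_FOLDER FILE..." >&2
    return 1
  fi
  "$init" stop || true   # ignore the error if Joseki is already stopped
  rm -rf -- "$loc"       # remove the old data folder before loading
  tdbloader2 --loc "$loc" "$@" && "$init" start
}
```

For example, from a root shell:

reload_tdb /home/sysadmin/mydatasetdata /full/path/to/data.ttl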

