Background
It took me three days of trying to get Jena TDB and Joseki installed on Ubuntu Server, and it finally worked. I am writing this article to share my experience, in the hope that it helps anyone attempting something similar. In case you don't know what they are: Jena is a Java library for parsing and manipulating RDF data, Jena TDB is an RDF store that does not need a separate database server, and Joseki is a SPARQL server that lets us run SPARQL queries over HTTP.
Thanks to Ric Roberts. His article on Jena and Joseki saved my life. The article you are going to read below is an extension to his Jena article.
Note: I set up my Ubuntu Server on VirtualBox, but my instructions should work on a real server too.
Ingredients
Here is the list of ingredients I used (prepare them before you get started):
- Ubuntu Server 10.04 LTS 64bit (http://www.ubuntu.com/download/server/download)
- Java SE 6 Update 31 for Linux x64 (use the JDK, not the JRE; download jdk-6u31-linux-x64.bin from http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u31-download-1501634.html. Do not use the file ending in -rpm.bin; it is for Red Hat Linux)
- Jena TDB 0.9.0 Incubating (download jena-tdb-0.9.0-incubating-distribution.tar.gz from https://repository.apache.org/content/repositories/releases/org/apache/jena/jena-tdb/0.9.0-incubating/)
- Joseki 3.4.4 (download joseki-3.4.4.zip from http://sourceforge.net/projects/joseki/files/OldFiles/)
Step 1: Set up your server
I assume that you can set up Ubuntu Server by yourself. During setup, the installer asks whether to install extra components such as LAMP server, Tomcat, OpenSSH server, etc. You can choose to install nothing. In particular, DO NOT install Tomcat, because it pulls in OpenJDK automatically, and we don't need it.
At this step, I will assume that my user account is sysadmin with sysadmin folder as my home folder.
Now put all ingredients (jdk-6u31-linux-x64.bin, jena-tdb-0.9.0-incubating-distribution.tar.gz, joseki-3.4.4.zip) in your home folder.
Step 2: Install Java 6
$ chmod u+x jdk-6u31-linux-x64.bin
$ ./jdk-6u31-linux-x64.bin
[wait for files extracting, you will get the folder named "jdk1.6.0_31"]
$ sudo mkdir -p /usr/lib/jvm
$ sudo mv jdk1.6.0_31 /usr/lib/jvm/
$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.6.0_31/bin/java" 1
Then choose the java you installed as default
$ sudo update-alternatives --config java
Now test that you installed it correctly
$ java -version
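If you want to script this check instead of reading the output by eye, something like the following could work. It is only a sketch: the function name java_version_matches is my own invention, and it assumes that "java -version" prints the version in double quotes (for example: java version "1.6.0_31"), which is how the Sun/Oracle JDK 6 reports it.

```shell
# Hypothetical helper (not part of any JDK tooling): succeeds when the given
# `java -version` output contains the expected version in double quotes.
java_version_matches() {
    expected="$1"
    output="$2"
    case "$output" in
        *"\"$expected\""*) return 0 ;;
        *) return 1 ;;
    esac
}

# `java -version` writes to stderr, so redirect it before capturing.
if command -v java >/dev/null 2>&1; then
    if java_version_matches "1.6.0_31" "$(java -version 2>&1)"; then
        echo "java is JDK 1.6.0_31"
    else
        echo "java is on the PATH but is not 1.6.0_31"
    fi
else
    echo "java is not on the PATH"
fi
```

If the second message appears, run "sudo update-alternatives --config java" again and pick the JDK you just installed.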
Step 3: Set up your environment variables
Extract jena-tdb-0.9.0-incubating-distribution.tar.gz and joseki-3.4.4.zip to your home folder. You will get the "jena-tdb-0.9.0-incubating" and "Joseki-3.4.4" folders.
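For the extraction itself, a small helper along these lines should do. This is a sketch under assumptions: extract_archive is a name I made up, and the .zip branch assumes the unzip tool is installed (on a minimal server you may need "sudo apt-get install unzip" first).

```shell
# Hypothetical helper: extract a .tar.gz/.tgz or .zip archive into a
# destination directory (defaults to the current directory).
extract_archive() {
    archive="$1"
    dest="${2:-.}"
    case "$archive" in
        *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
        *.zip)          unzip -q "$archive" -d "$dest" ;;
        *)
            echo "unsupported archive type: $archive" >&2
            return 1
            ;;
    esac
}
```

With the function defined, you would run extract_archive ~/jena-tdb-0.9.0-incubating-distribution.tar.gz ~ and then extract_archive ~/joseki-3.4.4.zip ~ to unpack both into your home folder.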
Modify your ~/.bashrc to include the following lines at the end of the file (this file is in your home folder):
export TDBROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export JOSEKIROOT="/home/sysadmin/Joseki-3.4.4"
export JENAROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export PATH="$TDBROOT/bin:$JOSEKIROOT/bin:$PATH"
export CLASSPATH=".:$JENAROOT/lib/*:$JOSEKIROOT/lib/*"
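Note: /home/sysadmin is my home folder; replace it with your own. Run "source ~/.bashrc" (or log in again) so that the new variables take effect in your current shell.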
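A typo in one of these paths is easy to miss until something fails later, so it may be worth checking that each variable points at a real folder once the variables are loaded in your shell. A minimal sketch (check_dirs is a hypothetical name, not something shipped with Jena or Joseki):

```shell
# Hypothetical helper: report each argument that is not an existing
# directory and return non-zero if any of them is missing.
check_dirs() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return $status
}
```

Call it as check_dirs "$TDBROOT" "$JOSEKIROOT" "$JENAROOT". A line starting with "missing:" means the variable is unset or the path is wrong.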
Step 4: Make the scripts executable
$ cd /home/sysadmin/jena-tdb-0.9.0-incubating
$ chmod u+x bin/*
$ cd /home/sysadmin/Joseki-3.4.4
$ chmod u+x bin/*
At this point, you can test that Joseki works by running:
$ cd /home/sysadmin/Joseki-3.4.4
$ ./bin/rdfserver
… and browsing to http://127.0.0.1:2020. You can play with the built-in books dataset at http://127.0.0.1:2020/query.html.
(Kill Joseki with Ctrl-C in the terminal).
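Joseki takes a moment to start, so the browser may fail if you are too quick. From a second terminal you can poll the port until the server answers. The sketch below assumes curl is installed (it may need "sudo apt-get install curl"); wait_for_http and its arguments are my own naming.

```shell
# Hypothetical helper: poll a URL until the server accepts a connection.
# Arguments: URL, maximum attempts (default 10), seconds between attempts (default 1).
wait_for_http() {
    url="$1"
    attempts="${2:-10}"
    delay="${3:-1}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s silences progress output; the response body is discarded.
        if curl -s -o /dev/null "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no response from $url after $attempts attempt(s)" >&2
    return 1
}
```

For example, wait_for_http http://127.0.0.1:2020/ 30 waits up to about 30 seconds. Note that it only checks that something answers on the port, not that the response is a healthy Joseki page.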
Step 5 – Configuring Joseki for your TDB store
In your Joseki directory, edit webapps/joseki/WEB-INF/web.xml and add a new servlet mapping for your service:
<servlet-mapping>
<servlet-name>SPARQL service processor</servlet-name>
<url-pattern>/myservice</url-pattern>
</servlet-mapping>
Step 6 – Make an html file for your service
Copy the webapps/joseki/query.html file and rename the copy (e.g. myservice.html). Edit the forms so that they submit to the URL you set in web.xml in the previous step, i.e. change every form's action to
<form action="myservice">
And edit the default SPARQL queries to something sensible.
Step 7 – Edit the Joseki config turtle
Joseki’s config file is in the form of a turtle file, joseki-config.ttl in the root of the Joseki directory. Edit that file and add the following:
Near the top, after all the prefixes:
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
Add a new service under the others:
# Service 3 - SPARQL processor only handling a given dataset
<#service3>
rdf:type joseki:Service ;
rdfs:label "My New Service" ;
joseki:serviceRef "myservice" ; # web.xml must route this name to Joseki
# dataset part
joseki:dataset <#mydatasetname> ;
# Service part.
# This processor will not allow either the protocol,
# nor the query, to specify the dataset.
joseki:processor joseki:ProcessorSPARQL_FixedDS ;
.
Under datasets:
# init tdb
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
<#mydatasetname> rdf:type tdb:DatasetTDB ;
rdfs:label "My Data Set" ;
tdb:location "/home/sysadmin/mydatasetdata" ; # or wherever you want the data to be stored
.
<#graph> rdf:type tdb:GraphTDB ;
tdb:location "/home/sysadmin/tdbgraphdata" ;
.
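The service name has to match in two places: the url-pattern in web.xml (Step 5) and the joseki:serviceRef in joseki-config.ttl. A quick way to catch a mismatch is to grep both files. This is a rough sketch with a made-up function name; it assumes the url-pattern element sits on one line with no spaces inside the tags, as in my example above.

```shell
# Hypothetical helper: check that a service name appears both as a
# url-pattern in web.xml and as a joseki:serviceRef in the Turtle config.
check_service_wired() {
    name="$1"
    web_xml="$2"
    config_ttl="$3"
    status=0
    if ! grep -q "<url-pattern>/$name</url-pattern>" "$web_xml"; then
        echo "no url-pattern for /$name in $web_xml"
        status=1
    fi
    if ! grep -q "joseki:serviceRef[[:space:]]*\"$name\"" "$config_ttl"; then
        echo "no joseki:serviceRef \"$name\" in $config_ttl"
        status=1
    fi
    return $status
}
```

From the Joseki directory, run check_service_wired myservice webapps/joseki/WEB-INF/web.xml joseki-config.ttl. No output means both files mention the service.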
Step 8 – Start Joseki
$ cd /home/sysadmin/Joseki-3.4.4
$ ./bin/rdfserver
Step 9 – Load data into your TDB
$ cd /home/sysadmin/jena-tdb-0.9.0-incubating
$ bin/tdbloader --loc=/home/sysadmin/mydatasetdata -v /full/path/to/rdf/or/ttl/files/
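To confirm that the load actually put something in the store, you can count the triples with tdbquery, which lives in the same bin folder as tdbloader. Treat this as a sketch: count_triples is my own wrapper name, I am assuming that tdbquery accepts the query string as its last argument, and I would run it while Joseki is stopped so that two processes do not open the same TDB folder at once.

```shell
# Hypothetical helper: count the triples in the default graph of a TDB store.
# Assumes $TDBROOT/bin (which provides tdbquery) is on the PATH.
count_triples() {
    tdbquery --loc="$1" 'SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
}
```

For the store used in this article that would be count_triples /home/sysadmin/mydatasetdata. A count of zero suggests the loader was pointed at the wrong folder or files.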
Step 10 – Run SPARQL queries
You can now run SPARQL queries against your new dataset through the form on http://127.0.0.1:2020/myservice.html
…or programmatically by sending requests to http://localhost:2020/myservice (with the relevant http headers).
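As an illustration of the programmatic route, here is how I would send a query from the command line with curl. It is a sketch, not the only way: sparql_query is a hypothetical wrapper, curl must be installed, and I am assuming the server honours the Accept header for the SPARQL JSON results format.

```shell
# Hypothetical helper: send a SPARQL query to an endpoint with HTTP GET
# and ask for results in the SPARQL JSON results format.
sparql_query() {
    endpoint="$1"
    query="$2"
    # -G sends the --data-urlencode value as a URL query string
    # instead of a POST body.
    curl -s -G "$endpoint" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$query"
}
```

For example: sparql_query http://localhost:2020/myservice 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'. Letting curl do the URL encoding avoids having to escape the braces and question marks in the query by hand.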
Step 11 (optional) – init.d
If you’re doing this on a Linux server, chances are you’ll want Joseki to start automatically. Here’s a simple example init.d script to get you started:
Create an empty file in your home folder
$ touch joseki
Then put the following content into the file
#!/bin/bash -e
### BEGIN INIT INFO
# Provides: joseki
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start daemon at boot time
# Description: Enable service provided by daemon.
### END INIT INFO
. /lib/lsb/init-functions
# Same values as in ~/.bashrc (Step 3); adjust them to your install
export TDBROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export JOSEKIROOT="/home/sysadmin/Joseki-3.4.4"
export JENAROOT="/home/sysadmin/jena-tdb-0.9.0-incubating"
export PATH="$TDBROOT/bin:$JOSEKIROOT/bin:$PATH"
export CLASSPATH=".:$JENAROOT/lib/*:$TDBROOT/lib/*:$JOSEKIROOT/lib/*"
SELF=$(cd $(dirname $0); pwd -P)/$(basename $0)
DAEMON="bin/rdfserver"
PIDFILE=/var/run/joseki.pid
LOGFILE=/var/log/joseki.log
sanity_checks() {
    # Refuse to start if a PID file already exists.
    if [ -e "$PIDFILE" ]; then
        /bin/echo "ERROR: PID file $PIDFILE already exists."
        exit 1
    fi
}
#
# main()
#
case "${1:-''}" in
'start')
sanity_checks
log_begin_msg "Starting Joseki rdf server..."
cd $JOSEKIROOT
$DAEMON &> $LOGFILE & echo $! > $PIDFILE
log_end_msg $?
;;
'stop')
log_begin_msg "Stopping Joseki rdf server..."
start-stop-daemon --stop --pidfile $PIDFILE
rm $PIDFILE
log_end_msg $?
;;
'restart')
$SELF stop
$SELF start
;;
*)
/bin/echo "Usage: $SELF start|stop|restart"
exit 1
;;
esac
Move it to /etc/init.d
$ sudo mv joseki /etc/init.d/joseki
Then make it executable and register it to run at boot
$ sudo chmod +x /etc/init.d/joseki
$ sudo update-rc.d joseki defaults
You can now start the Joseki server with
$ sudo /etc/init.d/joseki start
and stop the Joseki server with
$ sudo /etc/init.d/joseki stop
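One thing to watch: the script refuses to start whenever /var/run/joseki.pid exists, even if Joseki has crashed and the file is stale. The following sketch tells the two cases apart. The name pid_file_alive is hypothetical, and because the init script starts Joseki as root, I am assuming you run the check from a root shell (kill -0 fails for processes owned by another user).

```shell
# Hypothetical helper: succeed when the PID recorded in a PID file
# belongs to a process that is still alive.
pid_file_alive() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

If pid_file_alive /var/run/joseki.pid fails but the file exists, the file is left over from a dead process and can be deleted before starting the service again.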
====
[Added 11 Sep 2013]
To reload RDF data into TDB, go to your home folder, stop Joseki, and remove the old data folder before you run the following command:
$ sudo tdbloader2 --loc [target folder name]/ [import file name]