Cool RDF Java API
The functionality that powers the subcommands of the cool command line tool is also available as a set
of Java libraries. They are either described in the sections below, or their documentation is not
yet available.
cool-rdf-formatter
cool-rdf-formatter is a Java library for pretty printing RDF/Turtle documents in a configurable and reproducible way.
It takes as input a formatting style and an Apache Jena Model and produces as output a pretty-printed RDF/Turtle document.
Why?
Reproducible Formatting
Every RDF library comes with its own serializers. For example, an Apache Jena Model can be written
in multiple ways, the easiest being to call the write method on the model itself:
model.write(System.out, "TURTLE"). However, due to the nature of RDF, the outgoing edges of a node
in the graph have no order, so there are multiple valid ways to serialize a model. For example, two
models can be identical on the RDF level even though the order of the properties differs in their
serializations.
Therefore, when a model is serialized, one of many different (valid) serializations could be the result. This is a problem when different versions of a model file are compared, for example when used as artifacts in a git repository. Additionally, serialized files are often formatted in one style hardcoded in the respective library. So while Apache Jena and for example libraptor2 both write valid RDF/Turtle, the files are formatted differently. You would not want the code of a project formatted differently in different files, would you?
cool-rdf-formatter addresses these problems by taking care of serialization order and providing a way to customize the formatting style.
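The ordering problem can be illustrated without any RDF library. The following is a minimal sketch in plain Java, not the formatter's actual implementation: statements are modeled as three-element string lists, and the class and method names (OrderingSketch, serializeAsStored, serializeSorted) are hypothetical. It shows that two statement lists describing the same graph produce different text when written in storage order, and identical text once a fixed sort order is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration; not part of Apache Jena or cool-rdf-formatter.
class OrderingSketch {
    // A statement is modeled as a 3-element list: subject, predicate, object.
    static List<String> triple(String subject, String predicate, String object) {
        return Arrays.asList(subject, predicate, object);
    }

    // Naive serialization: writes statements in whatever order they are stored.
    static String serializeAsStored(List<List<String>> triples) {
        StringBuilder out = new StringBuilder();
        for (List<String> t : triples) {
            out.append(String.join(" ", t)).append(" .\n");
        }
        return out.toString();
    }

    // Reproducible serialization: sorts by subject, predicate, object before writing.
    static String serializeSorted(List<List<String>> triples) {
        List<List<String>> sorted = new ArrayList<>(triples);
        sorted.sort(Comparator.comparing((List<String> t) -> t.get(0))
                .thenComparing(t -> t.get(1))
                .thenComparing(t -> t.get(2)));
        return serializeAsStored(sorted);
    }

    public static void main(String[] args) {
        List<List<String>> a = Arrays.asList(
                triple(":Male", "rdfs:subClassOf", ":Person"),
                triple(":Male", "owl:disjointWith", ":Female"));
        List<List<String>> b = Arrays.asList(a.get(1), a.get(0));

        // Same set of statements, so the same graph: prints true
        System.out.println(new HashSet<>(a).equals(new HashSet<>(b)));
        // Storage order leaks into the text: prints false
        System.out.println(serializeAsStored(a).equals(serializeAsStored(b)));
        // A fixed sort order makes the text reproducible: prints true
        System.out.println(serializeSorted(a).equals(serializeSorted(b)));
    }
}
```

With reproducible output, a diff between two versions of a model file only shows statements that actually changed.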
Nice and Configurable Formatting
Most serializers, while creating valid RDF/Turtle, create ugly formatting. Obviously, what is ugly and what isn’t is highly subjective, so this should be configurable. cool-rdf-formatter addresses this by making the formatting style configurable, e.g. how alignment should be done, where extra spaces should be inserted and even whether indentation uses tabs or spaces. A default style is provided that reflects sane settings (i.e., the author’s opinion). An RDF document formatted using the default style could look like this:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . (1)
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <http://example.com/relations#> .

:Male a owl:Class ; (2)
  owl:disjointWith :Female ; (3)
  owl:equivalentClass [ (4)
    a owl:Restriction ;
    owl:hasSelf true ; (5)
    owl:onProperty :isMale ;
  ] ;
  rdfs:subClassOf :Person .

:hasBrother a owl:ObjectProperty ;
  owl:propertyChainAxiom ( :hasSibling :isMale ) ; (6)
  rdfs:range :Male .

:hasUncle a owl:ObjectProperty, owl:IrreflexiveProperty ; (7)
  owl:propertyChainAxiom ( :hasParent :hasSibling :hasHusband ) ; (7)
  owl:propertyChainAxiom ( :hasParent :hasBrother ) ;
  rdfs:range :Male .
| 1 | Prefixes are sorted with common prefixes first, then custom ones. They are not aligned on the colon because that looks bad when one prefix string is much longer than the others. |
| 2 | rdf:type is always written as a. It is always the first predicate and is written on the same line as the subject. |
| 3 | Indentation is done using a fixed size, like in any other format or language. Predicates are not aligned to subjects with an arbitrary length. |
| 4 | Anonymous nodes are written using the [ ] notation whenever possible. |
| 5 | Literal shortcuts are used where possible (e.g. no "true"^^xsd:boolean). |
| 6 | RDF lists are always written using the ( ) notation; no blank node IDs or rdf:first/rdf:rest appear here. |
| 7 | The same predicates on the same subjects are repeated rather than using the , notation, because the , notation is difficult to read, especially when the objects are longer (nested anonymous nodes). The exception to this rule is for different rdf:types. |
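The rule from callout 7 can be sketched in plain Java. This is only an illustration under assumptions, not the formatter's code: the class CommaRuleSketch, its render method and the commaPredicates parameter are hypothetical names, and objects are passed in as already formatted strings. Predicates in commaPredicates get their objects joined with commas, while every other predicate is repeated once per object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of cool-rdf-formatter.
class CommaRuleSketch {
    // Renders one subject with its predicates and objects.
    // Predicates listed in commaPredicates use the "," notation; all others are repeated.
    static String render(String subject,
                         Map<String, List<String>> objectsByPredicate,
                         Set<String> commaPredicates) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : objectsByPredicate.entrySet()) {
            String predicate = entry.getKey();
            if (commaPredicates.contains(predicate)) {
                // One line for the predicate, objects separated by commas
                parts.add(predicate + " " + String.join(", ", entry.getValue()));
            } else {
                // One line per object, repeating the predicate
                for (String object : entry.getValue()) {
                    parts.add(predicate + " " + object);
                }
            }
        }
        // Fixed two-space indentation for every line after the first
        return subject + " " + String.join(" ;\n  ", parts) + " .";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the predicates
        Map<String, List<String>> statements = new LinkedHashMap<>();
        statements.put("a", Arrays.asList("owl:ObjectProperty", "owl:IrreflexiveProperty"));
        statements.put("owl:propertyChainAxiom", Arrays.asList(
                "( :hasParent :hasSibling :hasHusband )",
                "( :hasParent :hasBrother )"));
        statements.put("rdfs:range", Collections.singletonList(":Male"));

        System.out.println(render(":hasUncle", statements, Collections.singleton("a")));
    }
}
```

Run with the data above, the sketch reproduces the :hasUncle block of the example document: the two types share one line, and owl:propertyChainAxiom appears twice.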
Usage in Maven
Add the following dependency to your Maven pom.xml:
<dependency>
  <groupId>cool.rdf</groupId>
  <artifactId>cool-rdf-formatter</artifactId>
  <version>2.0.0</version>
</dependency>
Gradle/Groovy: implementation 'cool.rdf:cool-rdf-formatter:2.0.0'
Gradle/Kotlin: implementation("cool.rdf:cool-rdf-formatter:2.0.0")
Calling the formatter
import java.io.FileInputStream;

import cool.rdf.formatter.FormattingStyle;
import cool.rdf.formatter.TurtleFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// ...

// Determine the formatting style.
FormattingStyle style = FormattingStyle.DEFAULT;
TurtleFormatter formatter = new TurtleFormatter(style);

// Build or load a Jena Model.
// Use the style's base URI for loading the model.
Model model = ModelFactory.createDefaultModel();
model.read(new FileInputStream("data.ttl"), style.emptyRdfBase, "TURTLE");

// Either create a string...
String prettyPrintedModel = formatter.apply(model);

// ...or write directly to an OutputStream.
formatter.accept(model, System.out);
Customizing the style
Instead of passing FormattingStyle.DEFAULT, you can create a custom FormattingStyle object.
FormattingStyle style = FormattingStyle.builder(). ... .build();
The following options can be set on the FormattingStyle builder:
| Option | Description | Default |
|---|---|---|
| emptyRdfBase | Set the URI that should be left out in formatting. If you don’t care about this, don’t change it and use the FormattingStyle’s emptyRdfBase field as the base URI when loading/creating the model that will be formatted, see Calling the formatter | |
| | Boolean. Example: | |
| | Boolean. Example: | |
| | Boolean. Example: | |
| | One of | |
| | A NumberFormat that describes how | |
| | Enables formatting of | |
| | One of | |
| | Integer. When using | |
| | Boolean. Determines whether there is a line break after the last line | |
| | Boolean. Determines whether | |
| | Boolean. If | |
| | Boolean. Determines whether to use commas for identical predicates. Example: | |
| | A set of predicates that, when used multiple times, are separated by commas, even when | |
| | Analogous to | Empty |
| | A list of namespace prefixes that defines the order of | |
| | A list of resources that determines the order in which subjects appear. For a subject | |
| | A list of properties that determine the order in which predicates appear for a subject. First all properties that are in the list are shown in that order, then everything else lexicographically sorted. For example, when | |
| | A list of RDFNodes (i.e. resources or literals) that determine the order in which objects appear for a predicate, when there are multiple statements with the same subject and the same predicate. First all objects that are in the list are shown in that order, then everything else lexicographically sorted. For example, when | |
| | There is no way to serialize this model in RDF/Turtle while using the inline blank node syntax | |
| | | Varied |
* Adapted from EditorConfig
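The ordering rule described in the table for predicates and objects (listed entries first, in list order, then everything else sorted lexicographically) can be expressed as a comparator. The following is a minimal sketch in plain Java that works on strings instead of Jena resources. The class ListedFirstOrder and its methods are hypothetical names, and the formatter's real implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not part of cool-rdf-formatter.
class ListedFirstOrder {
    // Entries contained in 'preferred' come first, in the order of that list;
    // all remaining entries follow in lexicographic order.
    static Comparator<String> listedFirst(List<String> preferred) {
        return (a, b) -> {
            int indexA = preferred.indexOf(a);
            int indexB = preferred.indexOf(b);
            if (indexA >= 0 && indexB >= 0) {
                return Integer.compare(indexA, indexB); // both listed: keep list order
            }
            if (indexA >= 0) {
                return -1; // only a is listed: a comes first
            }
            if (indexB >= 0) {
                return 1; // only b is listed: b comes first
            }
            return a.compareTo(b); // neither is listed: lexicographic order
        };
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sort(List<String> items, List<String> preferred) {
        List<String> result = new ArrayList<>(items);
        result.sort(listedFirst(preferred));
        return result;
    }

    public static void main(String[] args) {
        List<String> order = Arrays.asList("rdf:type", "rdfs:label");
        List<String> predicates = Arrays.asList(
                "rdfs:subClassOf", "rdfs:label", "owl:disjointWith", "rdf:type");
        System.out.println(sort(predicates, order));
        // prints [rdf:type, rdfs:label, owl:disjointWith, rdfs:subClassOf]
    }
}
```

Because the comparator defines a total order over its inputs, the result does not depend on the order in which the statements were stored, which is what makes the output reproducible.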