RDF datatypes.
[org.clojars.quoll/rudolf "0.2.6"]org.clojars.quoll/rudolf {:mvn/version "0.2.6"}RDF is about data portability, meaning that lots of applications need to represent the same data objects,
whether they are parsing, writing, storing, or querying data. While there are some common ways of
representing this data, each different program often ends up using its own internal objects, records,
interfaces, protocols, modules, etc.
A common form used in the Java landscape is Jena's RDFDatatypes. While relatively general purpose, these are focused on Jena's requirements, which are not always applicable to other applications. They are also part of Java, making them less useful when writing code that is cross-compatible for ClojureScript applications.
This package is simply a standard set of records that can be used across Clojure and ClojureScript applications.
There is a single protocol: Serializable, with the function serialize. This function converts the object
into a string form that is suitable for Turtle,
NTriples or SPARQL.
RDF describes a graph that uses nodes of 3 types:
These types are all described below, along with the semantics of using them.
IRIs are used as labeled nodes in an RDF graph.
IRIs are a supertype of URI, which allows for internationalized characters in the domain. Most programming languages support a native URI type, with no constraints on the characters permitted, making them de facto IRI implementations. Many people (especially in the English-speaking world) will use the term URI rather than IRI.
While IRIs usually describe a location online, this is actually the subtype of identifier called the URL. The IRI, or URI, is a generalized identifier, and need not necessarily describe a location on the internet. This applies, even if it appears as a valid URL, though it is considered a Best Practice to have URIs that refer to locations where information about a resource can be retrieved.
Other valid URI forms also exist, including the URN,
which contains a numerical identifier, and the URI Reference, which may appear without a scheme or host.
In fact a URI Reference allows for almost any string to be a valid URI, provided it does not contain illegal
characters. This makes URIs more flexible than most applications expect, and makes validation in the general
case impossible. This means that the best representation of a URI is typically a standard string. URI
references are relative and not absolute, and only make sense when there is some context that can allow
them to be resolved to an absolute URI. This context will be provided by a base. For instance, /path
is a relative URI reference, and if it were in a context where the base is set to http://localhost, then
the relative URI can be resolved to the absolute URI of http://localhost/path.
IRIs can also be represented in many syntaxes as a Qualified Name (QName) or CURIE (Compact URI) (these are similar, though QNames are a subtype of CURIE, and are constrained to characters that can represent XML names). In this form, they appear as a prefix and local name. Similarly to relative URIs, this form is only valid with some context that allows the complete, absolute URI to be determined. In this case, the context must contain a mapping of a prefix to a namespace, which is an absolute URI. If the prefix on a CURIE is not in the current context, then the CURIE is not valid.
As an example, the CURIE rdf:type is only valid when the context maps the prefix rdf to a namespace.
This is almost always going to be the standard RDF namespace of: http://www.w3.org/1999/02/22-rdf-syntax-ns#.
The absolute URI is this namespace followed by the local name, which in this case is:
http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
Clojure's keywords also have a form of :namespace/local which is very similar to a CURIE representation of
and IRI. Like many Clojure implementations, Rudolf takes advantage of this, and will often represent a CURIE
using a keyword, such as :rdf/type. Note that keywords do not have any context, and they should only be
used when the context is clear.
Rudolf contains a set of common prefixes, as a map from the keyword of the common prefix to a string. This is
stored as quoll.rdf/common-prefixes and has the keys:
:rdf:rdfs:xsd:owl:skos:dcterms
Context maps will always have this form of keyword_→_iri-string. RDF documents may sometimes refer to a default namespace with an empty prefix. This can be provided via a key that is one of the following:
nil""(the empty string)(keyword "")(an empty keyword):_default
Choose whichever works best for your application.
IRIs must always contain their complete, absolute form as a string. They do not contain a reference to
any local context, but a context can be used when constructing one, and they may contain their prefix/local
abbreviation.
When an IRI can be represented as a CURIE (via prefix/local), then it will be serialized this way.
For instance: rdf:type
When a prefix/local is not available, then it will be serialized in the absolute form, which is enclosed
with "angle brackets. For instance: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
Internally, an IRI contains:
- The absolute form of the IRI as a string
- An optional prefix for CURIE form of the IRI, as a string.
- An optional local name for the CURIE for of the IRI, as a string. This does not require a prefix (in which case the CURIE uses the default prefix), but it must be present if the prefix is present.
Creating an IRI is done via the iri or curie functions. The absolute IRI must be provided in each case:
quoll.rdf/iri: There are 3 forms of this(iri string): This takes the string form of the absolute IRI and returns a simple object with no prefix or local parts.(iri string keyword): This takes the string form of the absolute IRI, and a keyword. The keyword must match the IRI, meaning that the local name must match the end of IRI string.(iri string string string): This takes the string form of the absolute IRI, the string form of the prefix, and the string form of the local name. The local name must match the end of the IRI string.
quoll.rdf/curie: This creates an IRI from a prefix/name in a context. The context is always expected to be a map of keywords to IRI strings.(curie keyword): This uses a keyword as an ersatz CURIE, using thecommon-prefixesto resolve the namespace.(curie context keyword): This uses a keyword as an ersatz CURIE, using the provided context to resolve the namespace.(curie context prefix local): This creates an IRI using a CURIE prefix and local name, using the provided context to resolve the namespace.
Using serialize to write an IRI in its string form will a CURIE or absolute representation, depending on its elements.
To get the raw IRI form, use the as-str function.
IRIs with the same absolute form will always compare as equal, even if the prefix/local values differ.
Literals are of 2 types: Language-tagged Literals, and Typed Literals. In both cases, they hold a pair of values.
A Language-tagged Literal is a string, with a language tag
(represented by a string). These have an implicit datatype of rdf:langString, and the language tag must
be present, and must conform to the specification in section 2.2.9 of RFC 5646.
A Typed Literal has a string representation of a value, along with the URI for the datatype of the value.
If no datatype is provided with Literal, then the assumption is that the literal a string, with the
datatype xsd:string. RDF systems are expected to support datatypes from XML Schema,
but other datatypes are allowed.
While some systems have a single type of Literal form (where the language tag is empty unless the datatype is rdf:langString)
Rudolf separates them. This is because they behave quite differently to each other, both in serialization and
in their requirements.
Language-tagged literals hold a string value, and a language-tag as a string. These are serialized
as the string between quotes, followed by an @ symbol, and the language tag. The language tag may
only contain letters, numbers, and dashes. For instance, here are 3 separate language-tagged literals:
"The color is red"@en
"The colour is red"@en-uk
"La couleur est rouge"@fr
Typed literals have a string representing the lexical form of the literal, and a URI for the datatype. Many datatypes have various lexical representations of the same value, and the lexical form need not be canonicalized.
Typed literals are serialized as the lexical form in quotes, followed by the characters ^^, and then
the IRI of the datatype. Note that the datatype IRI is serialized according to standard IRI serialization,
so it may be absolute or a CURIE. If the datatype is xsd:string then the type IRI may be omitted.
For instance, assuming the prefix xsd is defined appropriately in the context, the following
3 IRIs are equivalent:
"Hello world"
"Hello world"^^xsd:string
"Hello world"^^<http://www.w3.org/2001/XMLSchema#string>
Some datatypes allow various lexical forms for a single value. These should be considered identical
literals, even when the lexical form differs. This can be tedious to implement, even if staying
with the more common XSD types. To avoid this work, several of the known types are kept in their
canonical form (xsd:integer, xsd:float, xsd:dateTime, xsd:boolean).
The numerical types and boolean values have a special form of serialization, in that they are serialized in their standard lexical form without any syntactic markers. Consequently, The following pairs are both valid serializations of each other:
| Typed Literal | Standard |
|---|---|
| "5"^^xsd:integer | 5 |
| "5.1"^^xsd:float | 5.1 |
| "false"^^xsd:boolean | false |
Language-tagged Literals are constructed using quoll.rdf/lang-literal:
(lang-literal string string): This contains the lexical string of the literal, and the string with the language code.
Typed Literals are constructed using quoll.rdf/typed-literal:
(typed-literal value): This takes a Clojure value, and will convert it into a literal with the appropriate datatype. If the datatype is not recognized, then an exception will be thrown.(typed-literal value iri): This takes a value or its lexical form as a string, and an IRI representing the datatype. The IRI must be a string, a URI object, a Rudolf IRI, or a keyword. If a keyword is used, then it must be one of the known XSD datatypes.
The supported Clojure types and their known XSD datatypes are:
| Clojure Type | XSD Type | ClojureScript |
|---|---|---|
String |
xsd:string |
✅ |
long |
xsd:integer |
✅ |
double |
xsd:float |
✅ |
Date |
xsd:dateTime |
✅ |
Instant |
xsd:dateTime |
|
Keyword |
xsd:QName |
✅ |
Boolean |
xsd:boolean |
✅ |
URI |
xsd:anyURI |
✅ |
URL |
xsd:anyURI |
Note that ClojureScript does not have all of the types that Clojure supports, and all ClojureScript numbers are doubles.
However, any numerical value will be recognized as an integer if the function clojure.core/int? returns true.
Despite not being part of the general Clojure ecosystem, Rudolf also recognizes the Java types of
int, short, and byte values as xsd:integer, as well as their boxed equivalents. Similarly,
float is recognized as xsd:float.
For simple data types (strings and numbers), wrapping values in a call to typed-literal will be all that's needed.
If a different datatype is ever needed, pass it in explicitly.
Language-tagged literals will always be written in "lexical"@lang form.
Language-tagged literals are a record containing 2 fields:
:valuealways a string:langalways a string
Typed literals will always write themselves in the "lexical"^^type form, unless the type is xsd:string, in
which case the literal is written in "lexical" form. Systems that want to write these literals differently
should choose their own mechanism.
Typed literals are a record containing 2 fields:
:valuealways a string:datatypealways an IRI
Blank Nodes are graph elements with a label and have no inherent semantic meaning. They typically appear in a document using a label, but this label is only used to identify when the same node needs to be used again in the document. A blank node's label has no meaning outside of a document. This is important to consider, since many documents may use the same labels for their blank nodes, but if those documents are imported to the same place, the nodes will be distinct from each other.
Because of this, the user may want to manage the internal identifiers for blank nodes themselves.
Blank nodes often appear with a label that looks like a CURIE with an underscore as a namespace,
e.g. _:b0. In some contexts, they may appear as a structure without any such label, either
explicitly, as in the [] syntax of Turtle, or implicitly, as when they are created in an rdf:List.
When blank nodes are parsed, they may come with a label. The object created to represent that blank node should be recorded so that future reference to the same label in that context will refer to the same node. Note that parsing 2 different documents will mean 2 separate contexts.
Blank nodes can be constructed using quoll.rdf/blank-node.
(blank-node): Creates a new unique blank node.(blank-node string): Creates a blank node using the provided label. If the label has been seen before, then the original blank node that was created for that label is returned.
Many systems need to manage their own blank nodes, particularly if there are numerous contexts.
quoll.rdf/unsafe-blank-node was created for this purpose:
(unsafe-blank-node label): Creates a new blank node, storing the label internally.
Blank nodes use an internal counter to generate labels of the form: _:b0, _:b1, _:b2...
Provided labels are not used in the blank node record, but are instead used to lookup the cache for them.
Using unsafe-blank-node avoids the cache, and instead saves the label as the blank node's ID.
If an underscore/colon prefix is provided as the label, then this will be removed internally.
Blank nodes are a record with a single field:
:id
When serializing, blank nodes are always represented as _:id. This can be seen at a REPL:
=> [(blank-node "the") (blank-node "quick") (blank-node "brown") (blank-node "fox")]
[_:b0 _:b1 _:b2 _:b3]
=> [(blank-node "the") (blank-node "red") (blank-node "fox") (unsafe-blank-node "end")]
[_:b0 _:b4 _:b3 _:end](:require '[quoll.rdf as rdf :refer [iri curie lang-literal typed-literal]])
(iri "http://test.com/")
(iri "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" "rdf" "type")
(curie rdf/common-prefixes :rdf/type)
(lang-literal "test" "en")
(typed-literal "literal")
;; The following are equivalent
(typed-literal 5)
(typed-literal "5" (iri "http://www.w3.org/2001/XMLSchema#long" "xsd" "integer"))
(typed-literal "5" (curie rdf/common-prefixes :xsd/integer))All of the above types also implement the Node protocol. This is a simple protocol
with only a single function: get-type
This get-type function returns a keyword, which may be better for determining type
than calling instance? or reflection. All of the expressions below return true:
(:require '[quoll.rdf as rdf :refer [iri curie lang-literal typed-literal]])
(= (get-type (iri "http://test.com/")) :iri)
(= (get-type (lang-literal "test" "en")) :lang-literal)
(= (get-type (typed-literal "literal")) :typed-literal)
(= (get-type (blank-node)) :blank)- Extend to Jena's interfaces.
Copyright © 2023-2025 Paula Gearon
Distributed under the Eclipse Public License version 2.0.