Implementing a Tinkerpop3 OLTP Provider

Horacio Hoyos Rodriguez
Aug 28, 2018

I will not dwell on the reasons that compel you to start such an endeavour; I will simply provide some guidance that will hopefully help you come out of it alive. If it is merely curiosity that brought you here, then you might want to read about Tinkerpop providers first.

In this post I will only discuss my findings on the implementation of the structure API: Graph, Element, Vertex, Edge, Property and Transaction (if transactions are supported by the underlying persistence technology).

Implement the Structure API

Implementation of the Structure API is mostly straightforward. These are the major considerations:

  • Methods that return a collection of things, e.g. Graph.edges(Object... edgeIds), return an Iterator<?>. Depending on your underlying graph driver (i.e. the JDBC-like class that talks to the underlying graph implementation), you might need to provide a custom Iterator implementation that exposes the driver's results as an iterator.
  • VertexProperty is both a Property and an Element. The latter is because vertex properties can have properties of their own. Personally, I would have picked another interface to represent this (a VertexProperty cannot be contained directly in a graph; it is neither a vertex nor an edge). I found implementing both interfaces a bit challenging because it is easy to get confused between the VertexProperty's own key:value and the key:values of its nested properties.
  • The Graph implementation is expected to work as a factory by the TestSuite (more on that later). For this you must provide a static method with the following signature: GraphImpl.open(Configuration configuration), where configuration is an Apache Commons Configuration (version 1.0, mind you). Personally, I found that providing a constructor that accepts a configuration kills two birds with one stone.
  • Tinkerpop TestCases expect the Graph implementation to allow "schema-less" graphs, i.e. graphs in which vertices and edges don't have a label. Although skipping the tests might seem tempting if your underlying graph technology is schema-full, it's better to provide a default schema. Just make sure that you provide a separate table/document/store for vertices and edges per graph. In other words, make sure that any storage space is unique per graph (name).
  • If you are using maven and copied one of the existing pom files from a Tinkerpop Provider, either remove the gendoc plugin or make sure all your methods are completely documented. Otherwise your maven tasks will fail due to broken Javadocs.
  • The method public <V> Iterator<Property<V>> properties(String... propertyKeys) is actually a filter, so if there are no arguments it means return ALL the properties.
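Two of the points above can be sketched in plain Java: wrapping a driver cursor in an Iterator, and treating the varargs keys as a filter where "no keys" means "everything". This is only a sketch under assumptions: DriverCursor, CursorIterator and PropertyFilter are hypothetical names, not Tinkerpop classes, and your driver's cursor API will likely look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical JDBC-like driver cursor: advance first, then read. */
interface DriverCursor<T> {
    boolean advance();   // moves to the next row, false when exhausted
    T current();         // the row the cursor currently points at
}

/** Adapts a DriverCursor to java.util.Iterator using a one-step look-ahead. */
class CursorIterator<T> implements Iterator<T> {
    private final DriverCursor<T> cursor;
    private boolean fetched;   // true once the cursor was advanced for the next element
    private boolean hasRow;    // result of that advance

    CursorIterator(DriverCursor<T> cursor) {
        this.cursor = cursor;
    }

    @Override
    public boolean hasNext() {
        if (!fetched) {
            hasRow = cursor.advance();
            fetched = true;
        }
        return hasRow;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        fetched = false;   // the following hasNext() must advance again
        return cursor.current();
    }
}

class PropertyFilter {
    /** No keys means "all properties"; otherwise only the requested keys are returned. */
    static Iterator<Map.Entry<String, Object>> properties(Map<String, Object> stored, String... keys) {
        if (keys.length == 0) {
            return stored.entrySet().iterator();
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(keys));
        List<Map.Entry<String, Object>> hits = new ArrayList<>();
        for (Map.Entry<String, Object> entry : stored.entrySet()) {
            if (wanted.contains(entry.getKey())) {
                hits.add(entry);
            }
        }
        return hits.iterator();
    }
}
```

The look-ahead flag keeps hasNext() safe to call repeatedly, which calling code tends to do.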

Graph, Vertex, Edge, and VertexProperty features

Your implementation must inform Tinkerpop/Gremlin of which features are supported by each element implementation. The Tinkerpop API interfaces provide default implementations that return true for all the features. All methods follow the same naming convention, supportsXXXX, where XXXX is the feature name. The API is both well and poorly documented. It is well documented in the sense that each supports method correctly describes what the feature enables/disables. However, there is little linking to how these features affect or can be used in your implementation. Take for example:

/**
 * Determines if a {@link VertexProperty} allows an identifier to be
 * assigned to it.
 */
@FeatureDescriptor(name = FEATURE_USER_SUPPLIED_IDS)
public default boolean supportsUserSuppliedIds() {
    return true;
}

It is clear that this allows the Provider to control whether the user can assign (via Gremlin) any ID to an element (Graph, Vertex, Edge) of the graph. However, it does not say whether this method will be invoked automatically or whether it is the Provider implementer’s responsibility to check it in the methods that create new elements. Following the references to this method, we find a reference in public default boolean willAllowId(final Object id) from the VertexPropertyFeatures interface, which is in turn referenced from the static method createVertex(final Attachable<Vertex> attachableVertex, final Graph hostGraph), which has no documentation at all. This becomes really frustrating after some time…

After more reading I found that the class documentation says that this is only for elements that can be detached/re-attached to a graph. So, is supportsUserSuppliedIds really enforced, or is it left to the discretion of the implementer? Everything points to the latter (correct me if I am wrong). In summary, the Tinkerpop API is full of useful static classes and methods that can help you provide a robust implementation and ensure that “all the exceptions and their messages are consistent amongst all TinkerPop3 implementations”. Sadly, however, how to take advantage of all this framework code is not well documented. You will need to spend a lot of time navigating through methods and references, and reading the whole API implementation, to find out how and where they can be used.
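If enforcement is indeed left to the implementer, the check has to live in the code that creates elements. The following is a minimal, library-free sketch of that idea. IdFeatures, NoUserIds and ElementFactory are hypothetical stand-ins rather than Tinkerpop types, and the permissive defaults only mimic the "everything returns true" behaviour described above.

```java
/** Hypothetical stand-in for a features interface with permissive defaults. */
interface IdFeatures {
    default boolean supportsUserSuppliedIds() {
        return true;
    }

    /** Assumption: an id is allowed whenever user supplied ids are supported. */
    default boolean willAllowId(Object id) {
        return supportsUserSuppliedIds();
    }
}

/** A provider that does not accept ids coming from the user. */
class NoUserIds implements IdFeatures {
    @Override
    public boolean supportsUserSuppliedIds() {
        return false;
    }
}

/** The creation code is where the feature is actually enforced. */
class ElementFactory {
    private final IdFeatures features;
    private long nextId = 0;

    ElementFactory(IdFeatures features) {
        this.features = features;
    }

    /** Returns the id the new element would get, or throws if the user id is rejected. */
    String create(Object userId) {
        if (userId == null) {
            return "generated-" + (nextId++);
        }
        if (!features.willAllowId(userId)) {
            throw new UnsupportedOperationException("User supplied ids are not supported");
        }
        return userId.toString();
    }
}
```

Nothing calls willAllowId on your behalf in this sketch; drop the check in create and the feature flag silently stops having any effect.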

Next is a snippet of how these helper classes/methods can be used:

@Override
public Vertex addVertex(Object... keyValues) {
    // Validate the key/value array before touching it
    ElementHelper.legalPropertyKeyValueArray(keyValues);
    // This implementation requires a label
    if (!ElementHelper.getLabelValue(keyValues).isPresent()) {
        throw Graph.Exceptions.argumentCanNotBeNull(T.label.name());
    }
    String label = ElementHelper.getLabelValue(keyValues).get();
    VertexImpl vertex;
    if (ElementHelper.getIdValue(keyValues).isPresent()) {
        Object id = ElementHelper.getIdValue(keyValues).get();
        if (this.features().vertex().willAllowId(id)) {
            vertex = new VertexImpl(this, label, id.toString());
        }
        else {
            // The Exceptions helpers only build the exception; it must be thrown
            throw Vertex.Exceptions.userSuppliedIdsOfThisTypeNotSupported();
        }
    }
    else {
        vertex = new VertexImpl(this, label);
    }
    ElementHelper.attachProperties(vertex, keyValues);
    // Persist vertex in specific technology

    return vertex;
}

We first use the ElementHelper to validate the keyValues and to check that a label has been provided. Then we test if an id key is present so we can get the value and check if we will allow it. Notice that we are also using the provided static Exceptions to signal errors.

VertexProperties

Vertex properties are expected to work like Apache Commons Configuration with regard to the Cardinality of the property’s value. The following are some quirks (at least for me) of this expectation.

  • A property with list or set Cardinality should be presented as a single value if it contains only one value. That is, for a single value you return a single object; for many values you return the collection.
  • The Tinkerpop implementation provides some utility classes to allow all implementations to have the same “look and feel”. One such utility is the StringFactory class, which provides methods to generate a consistent toString() representation of elements. You are encouraged to use them:
public class MyVertexImpl implements Vertex {

    @Override
    public String toString() {
        return StringFactory.vertexString(this);
    }

    // ... remaining Vertex methods ...
}

Vertices are represented by their id, label and properties:

<id>[<label>](<property=value>, <property=value>, …)

Property values can either be single values or collections:

lastname = Solo; name = Han
lastname = Amidala, Skywalker; name = Leia
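The single/collection duality can be sketched without any Tinkerpop types. PropertyValues is a hypothetical helper, and the assumption here is that the values of one property key are kept internally as a list:

```java
import java.util.ArrayList;
import java.util.List;

class PropertyValues {
    /** Returns the bare value when exactly one is stored, otherwise a copy of all values. */
    static Object read(List<?> stored) {
        if (stored.isEmpty()) {
            throw new IllegalStateException("Property has no value");
        }
        if (stored.size() == 1) {
            return stored.get(0);
        }
        return new ArrayList<Object>(stored);
    }
}
```

With the values above, reading Han's lastname would give the String Solo, while reading Leia's would give a list holding Amidala and Skywalker.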

If the last paragraphs seem weird and unrelated, that is just to give you an idea of how I felt at the beginning trying to understand how to make things work.

Ultimately I found out that Tinkerpop falls into place nicely if properties behave like vertices. Hidden vertices, for that matter. Hence, a Vertex has edges to its properties. Embracing this concept simplified my implementation and allowed me to reuse a lot of code. To keep property-edges hidden you could use the utilities in the Hidden class (nested inside Graph). Also, the ElementHelper.stageVertexProperty method is very handy when dealing with the single/multiple duality (more so if you went for the node-relation approach).
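To make the "hidden property-edges" idea concrete, here is a small library-free sketch. HiddenKeys is a hypothetical helper that only imitates what Graph.Hidden offers, and the "~" prefix is an assumption about the convention that class uses. In a real provider you would call the Tinkerpop utility instead.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenKeys {
    // Assumption: hidden keys/labels are marked with this prefix
    private static final String PREFIX = "~";

    static boolean isHidden(String key) {
        return key.startsWith(PREFIX);
    }

    static String hide(String key) {
        return isHidden(key) ? key : PREFIX + key;
    }

    static String unHide(String key) {
        return isHidden(key) ? key.substring(PREFIX.length()) : key;
    }

    /** Drops the vertex-to-property edge labels so users only see "real" edges. */
    static List<String> visible(List<String> edgeLabels) {
        List<String> result = new ArrayList<>();
        for (String label : edgeLabels) {
            if (!isHidden(label)) {
                result.add(label);
            }
        }
        return result;
    }
}
```

A vertex-to-property edge would then be stored under a label such as hide("name"), and filtered out whenever edges are listed for the user.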

Testing using the Tinkerpop Test Suites

“The implementation of this core API and its validation via the gremlin-test suite is all that is required of a graph system provider wishing to provide a TinkerPop3-enabled graph engine.”

Thus, it is very important to run the gremlin-test suite. Again, I found the lack of documentation the biggest barrier to doing so. The Provider documentation states that the following three classes must be provided (where XXX is the name of the particular graph):

// Structure API tests
@RunWith(StructureStandardSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXStructureStandardTest {}

// Process API tests
@RunWith(ProcessComputerSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXProcessComputerTest {}

@RunWith(ProcessStandardSuite.class)
@GraphProviderClass(provider = XXXGraphProvider.class, graph = XXXGraph.class)
public class XXXProcessStandardTest {}

It then follows that we only need to implement `XXXGraphProvider.class`. This is partially true: implementing the GraphProvider will indeed make sure the gremlin-tests are executed against your graph (Provider) implementation. However, if you want to add your own set of tests into the mix, you will also need to provide your own XXXStructureStandardSuite.class. Sadly, StructureStandardSuite.class keeps the list of tests in a private field, so we cannot easily reuse it.

What is the equals contract?

Equality is a big part of testing, as usually some computed result is expected to be equal to some control/fixed value. In some tests (IoVertexTest, I am looking at you), two vertices from different graphs are expected to be equal, as per the TestHelper class:

public static void validateVertexEquality(final Vertex originalVertex, final Vertex otherVertex, boolean testEdges) {
    assertEquals(originalVertex, otherVertex);
    // ...
}

Nowhere in the documentation did I find information about the equals contract/expectations. In the mentioned test, one of the vertices is a StarVertex, which does not override the equals method. So, sorry Tinkerpop, not gonna happen, unless you let us know that equality is based on properties+neighbours or something.
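Since the contract is not spelled out, any implementation of equals is a guess. One possible reading, and it is purely an assumption here, is that two vertices are equal when their ids are equal, regardless of the graph they belong to. SketchVertex is a hypothetical class used only to show what that would look like:

```java
import java.util.Objects;

class SketchVertex {
    private final String graphName;   // deliberately ignored by equals/hashCode
    private final Object id;

    SketchVertex(String graphName, Object id) {
        this.graphName = graphName;
        this.id = id;
    }

    String graphName() {
        return graphName;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchVertex)) {
            return false;
        }
        // Assumption: identity is the id alone, not the properties or neighbours
        return Objects.equals(id, ((SketchVertex) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}
```

Note that equality is only symmetric if the other side plays along: if the other vertex class does not override equals, an assertEquals that puts that vertex first will still fail no matter what your class does.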

XXXGraphProvider

The most important method of the GraphProvider is public Map<String, Object> getBaseConfiguration(String graphName, Class<?> test, String testMethodName, GraphData loadGraphWith), which oddly enough returns a Map and not an Apache Commons Configuration. The documentation explains that graph uniqueness (by name) must be enforced and how this method can be used to provide fine-tuned configurations based on the test name. However, there is no mention of the minimum set of values that the TestSuite expects. As a bare minimum, you must provide the following key:value pairs:

  • Graph.GRAPH : <XXXGraph implementation class name>.

Basically, the test will use the class name to load your Graph class via reflection (this is where the static open method comes into play). The class name can be easily retrieved from XXXGraph.class.getName(). Apart from that, you must pass all the values your implementation requires. Yes, all the ones you use in your constructor method ;).
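The reflective loading can be pictured with a small stand-alone sketch. It is not the Tinkerpop factory code: SketchGraph and ReflectiveFactory are hypothetical, a plain Map stands in for the Apache Commons Configuration, and "sketch.graph.name" is a made-up key. The only real value is "gremlin.graph", which is what the Graph.GRAPH constant holds.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical graph class; a real one would implement Graph. */
class SketchGraph {
    final String name;

    private SketchGraph(String name) {
        this.name = name;
    }

    /** The static factory that is looked up by name. */
    public static SketchGraph open(Map<String, Object> configuration) {
        return new SketchGraph(String.valueOf(configuration.get("sketch.graph.name")));
    }
}

class ReflectiveFactory {
    static final String GRAPH_KEY = "gremlin.graph";   // value of Graph.GRAPH

    /** Loads the class named in the configuration and calls its static open method. */
    static Object open(Map<String, Object> configuration) {
        Object className = configuration.get(GRAPH_KEY);
        if (className == null) {
            throw new IllegalArgumentException("Configuration must contain " + GRAPH_KEY);
        }
        try {
            Class<?> graphClass = Class.forName(className.toString());
            Method open = graphClass.getMethod("open", Map.class);
            return open.invoke(null, configuration);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not open graph " + className, e);
        }
    }
}
```

This is why every value your graph needs has to travel inside the configuration: the factory method is the only entry point the loader knows about.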

Another important piece of missing information is how the tests differ and which ones are more likely to require distinct base configurations. Further, in my particular case, I need to know all the vertex and edge labels that will be used (i.e. the schema) prior to creating the graph, since they need to be provided in the configuration. You could get this information from the `LoadGraphWith.GraphData loadGraphWith` parameter. But since not all tests do a complete graph exercise, this parameter can be null in some cases. When it is null, the required labels depend on the test.
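A schema-aware getBaseConfiguration could be sketched roughly as follows. Everything named Sketch* is hypothetical, as are the "sketch.*" keys and the com.example class name. The enum is a cut-down stand-in for LoadGraphWith.GraphData, and the labels listed for MODERN are those of the well-known "modern" toy graph. The fallback to the default labels "vertex" and "edge" is an assumption; as noted above, many tests need their own labels.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Cut-down stand-in for LoadGraphWith.GraphData (the real enum has more values). */
enum SketchGraphData { MODERN }

class SketchGraphProvider {

    Map<String, Object> getBaseConfiguration(String graphName, Class<?> test,
                                             String testMethodName, SketchGraphData loadGraphWith) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("gremlin.graph", "com.example.SketchGraph");   // hypothetical class name
        conf.put("sketch.graph.name", graphName);               // storage must be unique per graph name

        List<String> vertexLabels;
        List<String> edgeLabels;
        if (loadGraphWith == SketchGraphData.MODERN) {
            vertexLabels = Arrays.asList("person", "software");
            edgeLabels = Arrays.asList("knows", "created");
        } else {
            // loadGraphWith is null: the schema depends on the test itself.
            // A real provider would branch on test/testMethodName here.
            vertexLabels = Arrays.asList("vertex");
            edgeLabels = Arrays.asList("edge");
        }
        conf.put("sketch.vertex.labels", vertexLabels);
        conf.put("sketch.edge.labels", edgeLabels);
        return conf;
    }
}
```

The null branch is where most of the work ends up, because that is where test-specific schemas have to be looked up by test class and method name.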

IMHO the biggest issue with the test suite is that it suffers from heterogeneity, making it difficult to provide a concise XXXGraphProvider. Just take a look at the schema requirements below. It would be nice if the Tinkerpop team made an effort to be more consistent in this regard, e.g. by annotating all tests that add vertices/edges with an annotation similar to `LoadGraphWith` that provides the schema when needed (btw, why do all the other tests use a different schema if they all use the ‘standard’ graph?).

I apologise, I was going to add the list of test-schema here but since the suite is so big this would be too long. You can take a look at my Provider implementation to know what I am talking about.

Running the TestSuite

It is BIG, you are warned. The Tinkerpop developers were kind enough to allow us to specify which tests we want to run via an environment variable:

Set the {@code GREMLIN_TESTS} environment variable to a comma separated list of test classes to execute.

This setting can be helpful to restrict execution of tests to specific ones being focused on during development. Once more, I will save you the pain: the above should read “… comma separated list of the *qualified name of the* test classes …”
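To see why simple class names do not work, here is a sketch of how such a value might be interpreted. TestSelection is hypothetical and is not the suite's own code. It assumes the value is split on commas and compared against Class.getName(), and that an unset or empty variable means "run everything".

```java
import java.util.LinkedHashSet;
import java.util.Set;

class TestSelection {
    /** Splits a GREMLIN_TESTS-style value into trimmed, non-empty class names. */
    static Set<String> parse(String raw) {
        Set<String> names = new LinkedHashSet<>();
        if (raw == null) {
            return names;
        }
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** An empty selection runs everything; otherwise the qualified name must match. */
    static boolean selects(Set<String> names, Class<?> testClass) {
        return names.isEmpty() || names.contains(testClass.getName());
    }
}
```

So a value like org.apache.tinkerpop.gremlin.structure.VertexTest,org.apache.tinkerpop.gremlin.structure.EdgeTest would select two classes, whereas VertexTest,EdgeTest would match nothing under these assumptions.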

OptIn and OptOut

Another level of control is by opting in or out of given tests. This is done via annotations in your graph class, e.g.:

@Graph.OptOut(
    test = "org.apache.tinkerpop.gremlin.structure.GraphTest",
    method = "shouldRemoveEdges",
    reason = "Test creates edges with random labels, which does not work with our schema-based approach.")

However, I find the level of control too coarse, particularly with parametrized tests: perhaps you would want to run the test for only a subset of the parameter values. Further, this annotation cannot be used for tests written in inner classes, so for some of the IO tests there is no way in or out.

Failed tests and Java Number types

The tests are very strict about the type of numeric properties. That is, if a test sets a Vertex property to 0.5f, it expects the value returned when reading the property to be a Float with value 0.5. So if your implementation/technology is a bit more relaxed and stores/retrieves this as a Double with value 0.5, the test will fail. Similarly, 1.0 != 1 and 1L != 1. I know, technically there are differences, but seriously, if you want to test Number types, then store values that exercise the precision/capacity of the type so the importance of the test is shown.
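The strictness comes from how boxed numbers compare in Java: Float.valueOf(0.5f).equals(Double.valueOf(0.5)) is false, and so is Long.valueOf(1L).equals(Integer.valueOf(1)). One way around a relaxed store, sketched here with a hypothetical TypedNumber class, is to persist the Java type next to the value and restore it on read. Storing the value as text is just an assumption to keep the sketch lossless.

```java
class TypedNumber {
    final String type;   // simple class name recorded at write time
    final String text;   // the value as text, as a relaxed store might keep it

    private TypedNumber(String type, String text) {
        this.type = type;
        this.text = text;
    }

    /** Records the value together with its concrete Number type. */
    static TypedNumber write(Number value) {
        return new TypedNumber(value.getClass().getSimpleName(), value.toString());
    }

    /** Rebuilds a Number of the same type that was written. */
    Number read() {
        switch (type) {
            case "Float":
                return Float.valueOf(text);
            case "Double":
                return Double.valueOf(text);
            case "Long":
                return Long.valueOf(text);
            case "Integer":
                return Integer.valueOf(text);
            default:
                throw new IllegalArgumentException("Unsupported number type: " + type);
        }
    }
}
```

With this, TypedNumber.write(0.5f).read() comes back as a Float again, which is what an assertEquals against 0.5f needs.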

In Action

These findings were the result of implementing a Tinkerpop provider for ArangoDB.

Conclusion

The Tinkerpop API is well designed and robust. The implementation provides a plethora of tools to help you provide an implementation that is consistent with the API. Sadly, the lack of documentation makes finding these tools and using them challenging and time-consuming. The lack of implementation documentation also makes it hard to understand the expectations that the API and the tests have of our code.
