How to integrate T-Cypher?
T-Cypher can be integrated to a commercial graph database that uses Cypher (Neo4j e.g.) by translating T-Cypher queries into Cypher queries. By setting the translation rules of the temporal query constructs, a T-Cypher query can be translated into a Cypher query, hence, executed with a commercial graph database. This method might not be the optimal solution to evaluate T-Cypher queries since available commercial graph databases do not natively support. Indeed, desgining a native temporal database needs more sophisticated design considerations such as special storage technique, indexes, query evaluation, planning, etc. However, many practitioners are already using an existing graph database and do not want to migrate their backend data engine. Hence, a translation layer that can be easily coupled to an existing graph engine is more applicable. Which motivated us to develop a translation layer on top of Neo4j that allows the evaluation of T-Cypher queries with Neo4j.
The Temporal Property Graph Model explained
Before describing the translation process, let’s give a definition of the temporal property graph model.
A temporal property graph model can be defined as a collection of nodes, relationships, property values having each a set of validity time intervals. Each time interval refers to the time when the graph entity was valid. In the figure below we provide a toy graph modelling the dynamic connections between the different components of a smart factory. Note that this temporal graph will be used thoughout this description.
This Figure illustrates a toy graph in which nodes n1, …, n12 model products, machines, sensors, self-driving vehicles (S.D. vehicles) and employees. Whereas, relationships r1, …, r14 represent the connections between nodes. All nodes, relationships and properties are assigned with a set of valid time intervals such as each validity interval defined by its starting and ending time instants represents the time during which a graph entity was valid. Note that, time intervals of nodes are omitted and considered to be equal to [t1, ∞).
For instance, a product passes through different manufacturing machines which is translated in the temporal graph by an isIn relationship. A machine is monitored by sensors which is translated by a connectsTo relationship. An employee maintains machines regularly. An S.D. vehicle transfers products between machines. A relationship isCloseTo is created between S.D. vehicles to indicate their proximity. Furthermore, the measurements collected by sensors are represented as pairs of a values and time validity intervals to indicate the time when the value was valid.
The figure below presents the pipeline of the translation process. This layer consists of translating the temporal property graph model to the non-temporal property graph model (offered by non-temporal graph engines) and a T-Cypher query into a Cypher query.
This temporal property graph model is the logical model that should be followed by practitioners when reasoning about temporal queries. This model is translated into the non-temporal graph model supported by non-temporal graph engines by representing time intervals using properties. For each occurrence of a node or an edge, a new node or edge instance is created with properties representing the starting and ending time instants of the occurrence tStart and tEnd.
For each temporal occurrence of a dynamic property of a node, a new property node is created that is connected to the actual node with a relationship typed asProperty. This node is attached with three properties: value, tStart and tEnd to set the value and validity interval of the corresponding dynamic property.
Let’s discuss the model translation provided in the above image. Node n2 was valid between time instants t1 and never deleted hence a single node instance is created in the translated model. Relationship r2 between the machine and the sensor was valid during two time intervals indicating that the same sensor was connected two times to the same machines. In the translated model, each occurrence is translated into a new relationship instance with properties tStart and tEnd to indicate the starting and edning time instants of each occurrence. Finally, the measurement recorded by the sensor can be seen as a dynamic property such that for each recorded value a time interval is assigned to indicate when this value was valid. In the translated model, two property nodes are created indicating each a value of the dynamic property measurement that was valid during a time interval. These property nodes are connected to the original node with relationships typed ‘hasProperty’. Note that, the property ID of machine indicating its unique identifier is not considered as dynamic since the value of this property is guarenteed to be static. This separation is needed for performance optimization. That is, when reading the node, all static properties are stored along and hence retrieved with the node instead of traversing more ‘hasProperty’ relationships to retrieve the full node (node with all the values of its properties). Now, we consider that the choice between a dynamic and static property should be known apriori and given by users when defining the schema of the underlying graphs.
Given a T-Cypher query string, the lexer tokenises the query and sends the tokens to the parser which generates an Abstract Syntax Tree (AST). The visitor traverses each node of the tree to generate a T-Cypher query object. Then, the query object is converted to a Cypher query using a Cypher query builder based on the query translation rules described bellow.
Time slice: A time slice in a T-Cypher query is translated into a conjunction of temporal predicates on the properties referring to the starting and ending time instants of the nodes, relationships and property nodes.
Temporal relationship patterns: Rigid temporal relationship patterns with a minimum depth (n) equals to the maximum depth (m), are translated into n intermediate relationship patterns of length 1 between the original source and target node patterns. Then, the temporal constraints applied to each type of temporal pattern are translated into filtering expressions that are added to the WHERE sub-clause. Whereas, queries with temporal relationship patterns of variable length between n and m are first converted to the union of sub-queries with a rigid pattern of length that falls between n and m. Each sub-query is, hence, converted based on the translation rules of a rigid pattern. The Figure bellow illustrates the T-Cypher query with a rigid temporal relationship pattern of variable length and its Cypher equivalent.
First of all, let’s understand the semantics of this query. Based on its syntax, this query returns all the sequential product transfers between the S.D. vehicles and machines. The ‘2..3’ indicates that two or three sequential transfers should be returned in a path. Finally, the query returns the sequential transfer path. To translate a sequential path, we must first include the temporal constraint implying that two consecutive relationships (incoming to and outgoing from the same node) should have occured sequentially. This, indeed, translates to the fact that an S.D. vehicle or machine cannot transfer a product before receiving it. Now, these temporal constraints are added in the WHERE sub-clause for each pair of consecutive relationships. For instance, ‘r1.tEnd < r2.tStart’ indicates that r1 ends before r2 starts. Since the orginial T-Cypher query includes a variable length relationship pattern, we generated two sub-queries representing each the translation of the original query with a relationship pattern of fixed length going from 2 to 3 such that the union of these sub-queries is finally returned. The RETURN clause returns the path as an array of relationships and nodes composing the path.
Temporal functions and operators: An expression with temporal operators or functions is translated into a Cypher-based expression that is semantically equivalent using the built-in comparison operators and functions. For example, expression a@T BEFORE b@T is translated to a valid Cypher expression a.tEnd < b.tStart. However, functions and operators that cannot be expressed with the syntax of Cypher are implemented as user-defined functions. Indeed, these functions can be added as plugins to the Neo4j server and used in Cypher queries as built-in functions. For example the temporal function intersection() that returns the intersection between a number of time intervals is translated into MyFunctions.intersection(). MyFunctions is the name of the plugin and intersection is the name of the function. Internally, this function returns the time interval with the maximum starting instant and the minimum ending time instant of input intervals.
More concrete examples of query translation applied on temporal graph datasets are provided in Examples.