Joins are a common and well-understood operation in batch data processing to connect the rows of two relations. However, the semantics of joins on dynamic tables are much less obvious or even confusing.
Because of that, there are a couple of ways to actually perform a join using either Table API or SQL.
Regular joins are the most generic type of join in which any new records or changes to either side of the join input are visible and are affecting the whole join result. For example, if there is a new record on the left side, it will be joined with all of the previous and future records on the right side.
These semantics allow for any kind of updating (insert, update, delete) input tables.
However, this operation has an important implication: it requires to keep both sides of the join input in Flink’s state forever. Thus, the resource usage will grow indefinitely as well, if one or both input tables are continuously growing.
A time-windowed join is defined by a join predicate, that checks if the time attributes of the input records are within certain time constraints, i.e., a time window.
Compared to a regular join operation, this kind of join only supports append-only tables with time attributes. Since time attributes are quasi-monontic increasing, Flink can remove old values from its state without affecting the correctness of the result.
A join with a temporal table joins an append-only table (left input/probe side) with a temporal table (right input/build side), i.e., a table that changes over time and tracks its changes. Please check the corresponding page for more information about temporal tables.
The following example shows an append-only table
Orders that should be joined with the continuously changing currency rates table
Orders is an append-only table that represents payments for the given
amount and the given
For example at
10:15 there was an order for an amount of
RatesHistory represents an ever changing append-only table of currency exchange rates with respect to
Yen (which has a rate of
For example, the exchange rate for the period from
11:15 it was
Given that we would like to calculate the amount of all
Orders converted to a common currency (
For example, we would like to convert the following order using the appropriate conversion rate for the given
Without using the concept of temporal tables, one would need to write a query like:
With the help of a temporal table function
RatesHistory, we can express such a query in SQL as:
Each record from the probe side will be joined with the version of the build side table at the time of the correlated time attribute of the probe side record. In order to support updates (overwrites) of previous values on the build side table, the table must define a primary key.
In our example, each record from
Orders will be joined with the version of
Rates at time
currency field has been defined as the primary key of
Rates before and is used to connect both tables in our example. If the query were using a processing-time notion, a newly appended order would always be joined with the most recent version of
Rates when executing the operation.
In contrast to regular joins, this means that if there is a new record on the build side, it will not affect the previous results of the join. This again allows Flink to limit the number of elements that must be kept in the state.
Compared to time-windowed joins, temporal table joins do not define a time window within which bounds the records will be joined. Records from the probe side are always joined with the build side’s version at the time specified by the time attribute. Thus, records on the build side might be arbitrarily old. As time passes, the previous and no longer needed versions of the record (for the given primary key) will be removed from the state.
Such behaviour makes a temporal table join a good candidate to express stream enrichment in relational terms.
After defining temporal table function, we can start using it. Temporal table functions can be used in the same way as normal table functions would be used.
The following code snippet solves our motivating problem of converting currencies from the
Note: State retention defined in a query configuration is not yet implemented for temporal joins. This means that the required state to compute the query result might grow infinitely depending on the number of distinct primary keys for the history table.
With a processing-time time attribute, it is impossible to pass past time attributes as an argument to the temporal table function. By definition, it is always the current timestamp. Thus, invocations of a processing-time temporal table function will always return the latest known versions of the underlying table and any updates in the underlying history table will also immediately overwrite the current values.
Only the latest versions (with respect to the defined primary key) of the build side records are kept in the state. New updates will have no effect on the previously results emitted/processed records from the probe side.
One can think about a processing-time temporal join as a simple
HashMap<K, V> that stores all of the records from the build side.
When a new record from the build side has the same key as some previous record, the old value is just simply overwritten.
Every record from the probe side is always evaluated against the most recent/current state of the
With an event-time time attribute (i.e., a rowtime attribute), it is possible to pass past time attributes to the temporal table function. This allows for joining the two tables at a common point in time.
Compared to processing-time temporal joins, the temporal table does not only keep the latest version (with respect to the defined primary key) of the build side records in the state but stores all versions (identified by time) since the last watermark.
For example, an incoming row with an event-time timestamp of
12:30:00 that is appended to the probe side table
is joined with the version of the build side table at time
12:30:00 according to the concept of temporal tables.
Thus, the incoming row is only joined with rows that have a timestamp lower or equal to
applied updates according to the primary key until this point in time.
By definition of event time, watermarks allow the join operation to move forward in time and discard versions of the build table that are no longer necessary because no incoming row with lower or equal timestamp is expected.