In the following we will explain how replikativ works by building a small CDVCS containing tagged bookmarks as an example.
Metadata (without id binding) looks like:
{:commit-graph
{#uuid "214bd0cd-c737-4c7e-a0f5-778aca769cb7" []},
:heads #{#uuid "214bd0cd-c737-4c7e-a0f5-778aca769cb7"}}
We need to rebind the id-generating function and the time function to fixed values here for testing. By default, ids are cryptographic UUIDs computed over their referenced values, so they cannot and may not conflict. UUID-5 is the only option if you want to conform to the standard, so don't rebind these functions outside of tests. The UUIDs represent unchangeable, global values. While time can be helpful to track in commits, it is not critical for synching metadata, so we zero it out for testing here. All function calls are in fact unit tests; you can run this documentation with midje-doc.
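(defn zero-date-fn [] (java.util.Date. 0))

(defn test-env [f]
  ;; Replace hashing with a counter and the clock with the epoch,
  ;; so ids and timestamps are predictable in the examples below.
  (binding [*id-fn* (let [counter (atom 0)]
                      (fn ([] (swap! counter inc))
                        ([val] (swap! counter inc))))
            *date-fn* zero-date-fn]
    (f)))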
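To illustrate why content-derived ids cannot conflict, here is a minimal sketch that derives an id from a value. It is not replikativ's hashing: `content-id` is a hypothetical helper, and it relies on the JDK's `UUID/nameUUIDFromBytes`, which produces a version-3 (MD5) UUID rather than the UUID-5 mentioned above. It hashes the printed form of the value, so it should only be trusted for values that print deterministically, such as vectors and strings.

```clojure
(import 'java.util.UUID)

(defn content-id
  "Hypothetical stand-in for a content-addressed id function.
  Hashes the printed form of `value` into a name-based (version 3) UUID."
  [value]
  (UUID/nameUUIDFromBytes (.getBytes (pr-str value) "UTF-8")))

;; Equal values always map to the same id ...
(= (content-id ["bookmark" "http://opensourceecology.org/"])
   (content-id ["bookmark" "http://opensourceecology.org/"]))
;; => true

;; ... and different values map to different ids.
(= (content-id ["bookmark" "http://opensourceecology.org/"])
   (content-id ["bookmark" "http://www.economist.com/"]))
;; => false
```

The counter used in test-env gives up this property on purpose: it returns small, readable ids, but they no longer depend on the value.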
First we need to create the CDVCS. The new-cdvcs function returns a map containing both the metadata and value of the new CDVCS.
(test-env
#(cdvcs/new-cdvcs "mail:author@host.org"))
=>
{:state
#replikativ.crdt.CDVCS{:commit-graph {1 []},
:heads #{1}
:version 1},
:prepared [],
:downstream
{:crdt :cdvcs
:op {:method :new-state
:commit-graph {1 []},
:heads #{1}
:version 1}},
:new-values
{1
{:transactions [],
:parents [],
:ts #inst "1970-01-01T00:00:00.000-00:00",
:author "mail:author@host.org"
:crdt :cdvcs
:version 1
:crdt-refs #{}}}}
First we have a look at the metadata structure:
{:commit-graph {1 []},
:heads #{1}}
:commit-graph contains the whole dependency graph for revisions and is the core data we use to resolve conflicts. It points in reverse from each head to the root commit of the CDVCS, which is the only commit with an empty parent vector.
:heads tracks all heads in the graph order.
It is noteworthy that the metadata is a CRDT. Since it needs to be synched globally (in a key-value store), it needs to converge to be eventually consistent. When it is, synching new versions of metadata from remote sources can happen gradually and consistently, converging to the global state and values of the CDVCS.
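Because the commit-graph is a plain map from commit id to parent vector, it can be walked with ordinary Clojure functions. The following sketch assumes nothing beyond that map shape; `history` and `roots` are hypothetical helper names and not part of replikativ.

```clojure
(defn history
  "Set of all commits reachable from `head` via parent links, including `head`."
  [commit-graph head]
  (loop [seen #{}
         frontier [head]]
    (if-let [commit (peek frontier)]
      (if (seen commit)
        (recur seen (pop frontier))
        (recur (conj seen commit)
               (into (pop frontier) (commit-graph commit))))
      seen)))

(defn roots
  "Commits with an empty parent vector."
  [commit-graph]
  (set (for [[commit parents] commit-graph
             :when (empty? parents)]
         commit)))

(def graph {1 [], 2 [1], 3 [2], 1000 [1]})

(history graph 3) ;; => #{1 2 3}
(roots graph)     ;; => #{1}
```

Commit 1000 is in the graph but not in the history of head 3, which is why the heads have to be tracked separately from the graph itself.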
For each key the CRDT update function for value and new-value is described:
(meta/downstream
 {;; only new keys (commits) can be added => (merge new-value value)
  :commit-graph {1 []    ;; keys: G-SET
                 2 [1]}, ;; parental values don't change
  ;; keys: G-SET
  ;; values: similar to OR-SET,
  ;; (heads) are merged with lca which is commutative and idempotent,
  ;; heads cannot become empty
  :heads #{2}}
 ;; new metadata information:
 {:commit-graph {1 []
                 2 [1]
                 3 [2]
                 1000 [1]},
  :heads #{3}})
=> {:commit-graph {1 []
                   2 [1]
                   3 [2]
                   1000 [1]},
    :heads #{3}}
The most sophisticated operation is merging heads through lca (lowest common ancestors), which is necessary to resolve stale heads. This operation currently has quadratic complexity in the number of heads.
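The effect of this update can be approximated with a much simpler model. The sketch below is an assumption-laden stand-in, not replikativ's lca-based implementation: it merges the two commit-graphs as maps and then drops every head that is an ancestor of another head. The names `ancestors-of`, `remove-stale-heads` and `merge-metadata` are hypothetical.

```clojure
(require '[clojure.set :as set])

(defn ancestors-of
  "All commits reachable from `commit` through parent links, excluding `commit`."
  [commit-graph commit]
  (loop [seen #{}
         frontier (vec (commit-graph commit))]
    (if-let [c (peek frontier)]
      (if (seen c)
        (recur seen (pop frontier))
        (recur (conj seen c) (into (pop frontier) (commit-graph c))))
      seen)))

(defn remove-stale-heads
  "Drops every head that is an ancestor of some other head."
  [commit-graph heads]
  (let [covered (apply set/union #{} (map #(ancestors-of commit-graph %) heads))]
    (set/difference heads covered)))

(defn merge-metadata
  "Naive model of the metadata update: grow-only graph, heads without stale entries."
  [a b]
  (let [graph (merge (:commit-graph a) (:commit-graph b))]
    {:commit-graph graph
     :heads (remove-stale-heads graph (set/union (:heads a) (:heads b)))}))

(merge-metadata {:commit-graph {1 [], 2 [1]}
                 :heads #{2}}
                {:commit-graph {1 [], 2 [1], 3 [2], 1000 [1]}
                 :heads #{3}})
;; => {:commit-graph {1 [], 2 [1], 3 [2], 1000 [1]}, :heads #{3}}
```

Head 2 is dropped because it is an ancestor of head 3. In this model the result does not depend on argument order, and merging a state with itself returns it unchanged, since both map merge and set union behave that way.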
The operation is commutative:
(meta/downstream
;; new metadata information:
{:commit-graph {1 []
2 [1]
3 [2]
1000 [1]},
:heads #{3}}
{:commit-graph {1 []
2 [1]},
:heads #{2}})
=> (meta/downstream
;; new metadata information:
{:commit-graph {1 []
2 [1]},
:heads #{2}}
{:commit-graph {1 []
2 [1]
3 [2]
1000 [1]},
:heads #{3}})
And idempotent:
(meta/downstream
{:commit-graph {1 []
2 []},
:heads #{2}}
{:commit-graph {1 []
2 []},
:heads #{2}})
=> {:commit-graph {1 []
2 []},
:heads #{2}}
We have shown both properties above for each field of the metadata map individually.
A commit value looks like:
{#uuid "04eb5b1b-4d10-5036-b235-fa173253089a"
{:transactions [['(fn add-links [old params] (merge-with set/union old params)) ;; actually uuids pointing to fn and params
{:economy #{"http://opensourceecology.org/"}}]],
:ts #inst "1970-01-01T00:00:00.000-00:00",
:author "mail:author@host.org",
:parents [2 3], ;; normally singular, with merge sequence of parent commits applied in ascending order.
:crdt-refs #{}
}}
The value consists of one or more transactions, each a pair of a parameter map (data) and a freely chosen piece of data (code) that describes the transaction. The code need not be evaluated freely; it can be mapped to a limited set of application-specific operations. That way it can be safely resolved via a hardcoded hash-map and will still be invariant to version changes in the code. In short: use a literal code description instead of symbols where possible, even if this induces a small overhead.
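Resolving the code through a hardcoded hash-map can be sketched as follows. This is an illustration under assumptions, not replikativ's own evaluation code: `operations` and `apply-transactions` are hypothetical names, and each transaction is written as a `[code params]` pair, following the commit value shown above.

```clojure
(require '[clojure.set :as set])

;; The quoted form is only a lookup key; it is never passed to eval.
(def operations
  {'(fn add-links [old params] (merge-with set/union old params))
   (fn [old params] (merge-with set/union old params))})

(defn apply-transactions
  "Applies each [code params] pair to `value`, looking `code` up in `operations`."
  [operations value transactions]
  (reduce (fn [acc [code params]]
            (if-let [op (get operations code)]
              (op acc params)
              (throw (ex-info "Unknown transaction code." {:code code}))))
          value
          transactions))

(apply-transactions
 operations
 {:economy #{"http://www.economist.com/"}}
 [['(fn add-links [old params] (merge-with set/union old params))
   {:economy #{"http://opensourceecology.org/"}}]])
;; => {:economy #{"http://www.economist.com/" "http://opensourceecology.org/"}}
```

Since the literal form is compared by value, the application can later change how the operation is implemented without invalidating stored commits.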
Forking yields a copy (clone).
(test-env
#(cdvcs/fork {:commit-graph {1 []
3 [1]},
:heads #{3}}))
=>
{:state
#replikativ.crdt.CDVCS{:commit-graph {1 []
3 [1]},
:heads #{3}},
:prepared [],
:downstream
{:crdt :cdvcs
:op {:method :new-state
:commit-graph {1 []
3 [1]},
:heads #{3}
:version 1}}}
Pulling works much the same way.
(reset! meta/lca-cache {})
(test-env
#(cdvcs/pull {:state {:commit-graph {1 []},
:heads #{1}}
:prepared []}
{:commit-graph {1 []
3 [1]
4 [3]},
:heads #{4}}
4))
=>
{:downstream
{:crdt :cdvcs
:op {:commit-graph {4 [3], 3 [1], 1 []},
:method :pull
:heads #{4}
:version 1}},
:state
{:commit-graph {1 [], 3 [1], 4 [3]},
:heads #{4}},
:prepared []}
Commit to apply changes to a CDVCS.
(test-env
#(cdvcs/commit {:state {:commit-graph {10 []
30 [10]
40 [30]},
:heads #{40}}
:prepared [[{:economy
#{"http://opensourceecology.org/"}
:politics #{"http://www.economist.com/"}}
'(fn merge [old params] (merge-with set/union old params))]]}
"mail:author@host.org"))
=>
{:new-values
{3
{:transactions [[1 2]],
:ts #inst "1970-01-01T00:00:00.000-00:00",
:parents [40],
:crdt-refs #{}
:crdt :cdvcs
:version 1
:author "mail:author@host.org"},
2 '(fn merge [old params] (merge-with set/union old params)),
1
{:politics #{"http://www.economist.com/"},
:economy #{"http://opensourceecology.org/"}}},
:downstream
{:crdt :cdvcs
:op {:method :commit
:commit-graph {3 [40]},
:heads #{3}
:version 1}},
:state
{:commit-graph {3 [40], 10 [], 30 [10], 40 [30]},
:heads #{3}},
:prepared []}
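The shape of this result can be reproduced with a simplified model of a commit. The sketch below is an assumption, not the library's code: `commit-sketch` and `new-counter-id-fn` are hypothetical names, the id function is passed in explicitly instead of being rebound, and the :ts, :crdt, :version, :crdt-refs and :downstream entries are left out.

```clojure
(defn new-counter-id-fn
  "Returns a fresh id function that ignores its argument and counts upwards."
  []
  (let [counter (atom 0)]
    (fn [_] (swap! counter inc))))

(defn commit-sketch
  "Stores every part of each prepared transaction under its own id and
  adds a commit that references those ids and has the current heads as parents."
  [id-fn {:keys [state prepared]} author]
  (let [;; eager mapv keeps the order of id-fn calls predictable
        stored  (mapv (fn [tx] (mapv (fn [part] [(id-fn part) part]) tx)) prepared)
        commit  {:transactions (mapv (fn [tx] (mapv first tx)) stored)
                 :parents (vec (:heads state))
                 :author author}
        id      (id-fn commit)]
    {:state {:commit-graph (assoc (:commit-graph state) id (:parents commit))
             :heads #{id}}
     :prepared []
     :new-values (into {id commit} (apply concat stored))}))

(:state
 (commit-sketch (new-counter-id-fn)
                {:state {:commit-graph {10 [], 30 [10], 40 [30]}
                         :heads #{40}}
                 :prepared [[{:economy #{"http://opensourceecology.org/"}}
                             '(fn merge [old params] (merge-with set/union old params))]]}
                "mail:author@host.org"))
;; => {:commit-graph {10 [], 30 [10], 40 [30], 3 [40]}, :heads #{3}}
```

The parameter map receives id 1, the code id 2 and the commit itself id 3, which mirrors the numbering in the output above.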
You can check whether a merge is necessary (there are multiple heads):
(test-env
#(cdvcs/multiple-heads? {:commit-graph {10 []
30 [10]
40 [10]},
:heads #{40 30}}))
=> true
Merging is like pulling, but it resolves the commit-graph of the conflicting head commits with a new commit, which can apply further corrections atomically. You have to supply the remote metadata and a vector of parents, which are applied to the CDVCS value in order before the merge commit.
(test-env
#(cdvcs/merge {:state {:commit-graph {10 []
30 [10]
40 [10]},
:heads #{40}}
:prepared []}
"mail:author@host.org"
{:commit-graph {10 []
20 [10]},
:heads #{20}}
[40 20]
[]))
=>
{:new-values
{1
{:transactions [],
:ts #inst "1970-01-01T00:00:00.000-00:00",
:parents [40 20],
:crdt-refs #{}
:crdt :cdvcs
:version 1
:author "mail:author@host.org"}},
:downstream {:crdt :cdvcs
:op {:method :merge
:commit-graph {1 [40 20]},
:heads #{1}
:version 1}},
:state
{:commit-graph {1 [40 20], 20 [10], 10 [], 30 [10], 40 [10]},
:heads #{1}},
:prepared []}
When there are pending transactions, you need to commit them first; otherwise the merge throws an exception.
(test-env
#(try
(cdvcs/merge {:state {:commit-graph {10 []
30 [10]
40 [10]},
:heads #{40}}
:prepared [[{:economy #{"http://opensourceecology.org/"}
:politics #{"http://www.economist.com/"}}
'(fn merge-bookmarks [old params]
(merge-with set/union old params))]]}
"mail:author@host.org"
{:commit-graph {10 []
20 [10]},
:heads #{20}}
[40 20]
[])
(catch clojure.lang.ExceptionInfo e
(= (-> e ex-data :type) :transactions-pending-might-conflict))))
=> true
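Both merge examples can be summarised in one simplified model. As before, this is a sketch under stated assumptions rather than replikativ's implementation: `merge-sketch` is a hypothetical name, the id function is passed in explicitly, the final vector of correcting transactions is omitted, and the rule for computing the new heads (all known heads minus the merged parents, plus the merge commit) is an assumption that happens to match the output shown above.

```clojure
(require '[clojure.set :as set])

(defn merge-sketch
  "Joins local and remote commit-graphs with a new commit that has `parents`.
  Refuses to merge while prepared transactions are pending."
  [id-fn {:keys [state prepared]} author remote-meta parents]
  (when (seq prepared)
    (throw (ex-info "Commit pending transactions before merging."
                    {:type :transactions-pending-might-conflict})))
  (let [commit {:transactions [] :parents parents :author author}
        id     (id-fn commit)
        graph  (assoc (merge (:commit-graph remote-meta) (:commit-graph state))
                      id parents)
        heads  (-> (set/union (:heads state) (:heads remote-meta))
                   (set/difference (set parents))
                   (conj id))]
    {:state {:commit-graph graph :heads heads}
     :prepared []
     :new-values {id commit}}))

(:state
 (merge-sketch (constantly 1)
               {:state {:commit-graph {10 [], 30 [10], 40 [10]}
                        :heads #{40}}
                :prepared []}
               "mail:author@host.org"
               {:commit-graph {10 [], 20 [10]}
                :heads #{20}}
               [40 20]))
;; => {:commit-graph {10 [], 20 [10], 30 [10], 40 [10], 1 [40 20]}, :heads #{1}}
```

Calling the same function with a non-empty :prepared vector throws an exception whose ex-data carries the :transactions-pending-might-conflict type, as in the example above.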
Have a look at the replication API, the [stage API](http://whilo.github.io/replikativ/stage.html) and the [pull hooks](http://whilo.github.io/replikativ/hooks.html) as well. Further documentation will be added; until then, have a look at the test/replikativ/core_test.clj tests or the API docs for implementation details.