TLA+ Trace validation #113

Merged
merged 1 commit into from
Apr 18, 2024

joshuazh-x
Contributor

As we understand it, guaranteeing the correctness of a service built on this raft library requires at least two things: a correct consensus algorithm and a correct implementation of it.

The TLC model checker can verify the correctness of the algorithm by exploring the state space of a finite-state model. See #112 for the spec that aligns with the existing raft library implementation; that spec is also included in this PR.

This PR proposes a method to address the second part of the question: how to verify that the implementation aligns with the algorithm. It borrows the idea of TLA+ trace validation from Microsoft CCF. The basic idea is to constrain the state space to the states and transitions that the service actually walks through. We add special trace log points in the library to record algorithm-relevant states and transitions. A trace validation spec then walks the core algorithm state machine, following the transitions recorded in these traces. If it encounters a trace whose state or transition is not allowed by the core algorithm state machine, a mismatch between the implementation and the algorithm has been found.
Such a mismatch implies one of two things: an implementation bug or an out-of-date model. If the model is known to be correct, the implementation must be at fault. We believe this is especially useful for catching regressions when new features are implemented.
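To make the replay idea concrete, here is a minimal Go sketch under stated assumptions. It is not how this PR does it: the PR replays traces with TLC against Traceetcdraft.tla. The Event fields, the two action names ("Timeout", "Commit") and the toy transition rules are all hypothetical. The sketch walks a per-node model state forward one trace event at a time, checks that the state the implementation reported is one the model allows, and reports the first event that diverges.

```go
package main

import "fmt"

// Event is a hypothetical trace record: which node acted, what it did, and the
// state the implementation reported right after doing it.
type Event struct {
	Node   uint64
	Action string // only "Timeout" and "Commit" exist in this toy model
	Term   uint64 // term reported after the action
	Commit uint64 // commit index reported after the action
}

// nodeState is the toy model's view of one node.
type nodeState struct {
	term, commit uint64
}

// step advances the model by one traced action and checks that the state the
// implementation reported is one the model allows.
func step(s nodeState, e Event) (nodeState, error) {
	switch e.Action {
	case "Timeout":
		// Campaigning bumps the term by exactly one and leaves the commit index alone.
		want := nodeState{term: s.term + 1, commit: s.commit}
		if e.Term != want.term || e.Commit != want.commit {
			return s, fmt.Errorf("timeout: reported term=%d commit=%d, model expects term=%d commit=%d",
				e.Term, e.Commit, want.term, want.commit)
		}
		return want, nil
	case "Commit":
		// Committing keeps the term and never moves the commit index backwards.
		if e.Term != s.term || e.Commit < s.commit {
			return s, fmt.Errorf("commit: reported term=%d commit=%d after term=%d commit=%d",
				e.Term, e.Commit, s.term, s.commit)
		}
		return nodeState{term: s.term, commit: e.Commit}, nil
	default:
		return s, fmt.Errorf("unknown action %q", e.Action)
	}
}

// Validate replays the trace in order and returns the index of the first event
// the model rejects, or -1 if the whole trace is allowed.
func Validate(trace []Event) (int, error) {
	states := map[uint64]nodeState{} // every node starts at term 0, commit 0
	for i, e := range trace {
		next, err := step(states[e.Node], e)
		if err != nil {
			return i, err
		}
		states[e.Node] = next
	}
	return -1, nil
}

func main() {
	good := []Event{
		{Node: 1, Action: "Timeout", Term: 1},
		{Node: 1, Action: "Commit", Term: 1, Commit: 3},
	}
	// Same trace plus an event whose commit index goes backwards.
	bad := append(good, Event{Node: 1, Action: "Commit", Term: 1, Commit: 2})
	fmt.Println(Validate(good))
	fmt.Println(Validate(bad))
}
```

A real validator also has to cope with nondeterminism (several model transitions may explain a single traced event), which is one reason to let TLC search the model rather than hand-writing checks like these.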

@xiang90
Contributor

xiang90 commented Jan 22, 2024

@joshuazh-x

Out of curiosity, have you found any interesting bugs in the etcd/raft with this test?

Review thread: rafttest/network.go (outdated, resolved)
@joshuazh-x
Contributor Author

No, I have not found any bugs so far.
Usually, when starting a new project, the model is validated first, and trace validation is then used to check that the implementation aligns with the validated model. CCF follows this process.
For etcd it is a little different, because the implementation has been running well for a long time, so we worked in reverse. We built the model from etcd's implementation and used the TLC model checker to validate it (TLC found no violations of the Raft properties). With that done, we gain confidence that the current implementation matches a model that satisfies the Raft properties, and we can catch future implementation changes that might break Raft.

@joshuazh-x force-pushed the trace-validation branch 2 times, most recently from 7f95a88 to 1e9254f on January 29, 2024 08:00
@joshuazh-x
Contributor Author

Rebased onto etcd-io/raft main.

@serathius previously approved these changes on Jan 29, 2024
@serathius
Member

cc @ahrtr @pav-kv

Review thread: state_trace.go (outdated, resolved)
Review thread: state_trace.go (resolved)
@pav-kv
Contributor

pav-kv commented Feb 6, 2024

@joshuazh-x I'll give it a closer look this week.

@joshuazh-x force-pushed the trace-validation branch 2 times, most recently from 614c0dd to 166e002 on February 19, 2024 03:45
@joshuazh-x
Contributor Author

@pav-kv any comment?

@serathius
Member

ping @pav-kv

@serathius
Member

From the discussion on Slack: https://kubernetes.slack.com/archives/C3HD8ARJ5/p1711367871600289

@ahrtr asked whether we can confirm that TLA+ can reproduce the etcd durability issue in etcd-io/etcd#14370. Amazingly, @joshuazh-x has already provided a repro: joshuazh-x@6aefcc8.

I think this already confirms that having a TLA+ model can benefit etcd. We could keep iterating on this PR, but I don't think that's needed, as it already brings value and should not impact the raft codebase. There are many improvements we could implement:

  • Validate just the raft model, like in joshuazh-x@6aefcc8
  • Validate different raft configurations (1 node, 3 nodes, 5 nodes)
  • Integrate with CI

However, we don't need everything at once; I can work with @joshuazh-x to iterate on it.

Review thread: tla/README.md (outdated, resolved)
@serathius
Member

One thing: I would like to make sure that I can follow the instructions in the README and run the validation using the script.
It's important that people other than @joshuazh-x know how to run it.

Left a comment.

@serathius dismissed their stale review on March 26, 2024 13:33

Cannot run ./validate.sh; please provide instructions on how to get a trace.

Review thread: node.go (outdated, resolved)
Review thread: node.go (outdated, resolved)
Review thread: node.go (outdated, resolved)
@@ -761,6 +771,8 @@ func (r *raft) appliedSnap(snap *pb.Snapshot) {
// index changed (in which case the caller should call r.bcastAppend). This can
// only be called in StateLeader.
func (r *raft) maybeCommit() bool {
defer traceCommit(r)
Member

Note that it's maybeCommit, which means it may not commit. But you always trace the commit?

Contributor Author

The commit trace here makes the state machine step with the AdvanceCommitIndex action, which contains the same logic as maybeCommit. We expect the state machine to end up in the same state as after calling maybeCommit.

Member

  • I do not see the "same logic".
  • Do you mean that you always need to record an rsmCommit event in the trace, even if raft doesn't actually commit anything when (*raft) maybeCommit() is called?

Contributor Author

rsmCommit tells the trace validator to try advancing the commit index, as the AdvanceCommitIndex action in the spec does. As you can see, newCommitIndex stays equal to the current commitIndex unless the highest index acknowledged by a quorum holds an entry from the current term. This is the same behavior as maybeCommit.

AdvanceCommitIndex(i) ==
    /\ state[i] = Leader
    /\ LET \* The set of servers that agree up through index.
           Agree(index) == {k \in GetConfig(i) : matchIndex[i][k] >= index}
           logSize == Len(log[i])
           \* logSize == MaxLogLength
           \* The maximum indexes for which a quorum agrees
           agreeIndexes == {index \in 1..logSize :
                                Agree(index) \in Quorum(GetConfig(i))}
           \* New value for commitIndex'[i]
           newCommitIndex ==
              IF /\ agreeIndexes /= {}
                 /\ log[i][Max(agreeIndexes)].term = currentTerm[i]
              THEN
                  Max(agreeIndexes)
              ELSE
                  commitIndex[i]
       IN
        /\ CommitTo(i, newCommitIndex)
    /\ UNCHANGED <<messageVars, serverVars, candidateVars, leaderVars, log, configVars, durableState>>

Review thread: raft.go (outdated, resolved)
Review thread: raft.go (outdated, resolved)
Review thread: state_trace_nop.go (outdated, resolved)
@ahrtr
Member

ahrtr commented Mar 26, 2024

Is the example trace incorrect?

$ ./validate.sh -s ./Traceetcdraft.tla -c ./Traceetcdraft.cfg ./example.ndjson 
spec: ./Traceetcdraft.tla
config: ./Traceetcdraft.cfg
Downloading TLA+ tools ... done.
./example.ndjson - FAIL
0 of 1 trace(s) passed

@zeu5

zeu5 commented Mar 27, 2024

Quoting from @pav-kv's comment on #111:

Do I understand correctly that the actual code change should be minimal? My understanding is that it boils down to a dozen or so "tracing" calls in key places, and the whole thing is active only in tests. From this point of view, this work can be done with low risk (does this answer the "controlled" part?).

I believe that the code changes can be completely decoupled from the raft codebase. @joshuazh-x, did you explore other ways of obtaining the trace without changing the code? We explored a very similar approach, where we sample traces and simulate the TLA+ model using them. We go further and use coverage of the model to fuzz test the raft library. You can find our implementation here: https://github.com/zeu5/raft-fuzzing/blob/master/raft.go

There has also been contrasting work in which traces are generated from the model and run on the implementation. This requires more involved instrumentation that controls the execution of the Go code step by step: https://dl.acm.org/doi/abs/10.1145/3552326.3587442. More crucially, that work found certain issues with the original TLA+ spec used here, and unfortunately I observe that the same bugs exist in the TLA+ model in this PR.

@ahrtr
Member

ahrtr commented Mar 27, 2024

I saw your comment in the sig-etcd channel. I think the benefit of instrumenting/changing the raft code is that it can be more easily reused by all applications and test systems.

@zeu5

zeu5 commented Mar 27, 2024

To summarise my Slack discussion with @ahrtr: the instrumentation (light and safe) will be a feature of the raft library that users can use to collect traces and run the model checker. It is also useful in the etcd e2e tests. However, there is one main concern about the mechanism by which the traces are collected.

If traces are collected per node and used to check the correctness of the states observed on that node, the interleaving of messages is lost. We then risk missing a large class of bugs involving inconsistencies between the states of two or more nodes, such as a violation of the property that no two leaders exist in the same term.

Therefore, any trace collection mechanism should be central (shared by all nodes) and preserve the exact interleaving of the observed events. The example trace does contain interleaved events, but whether that interleaving is exactly what occurred during execution is unknown.

@joshuazh-x force-pushed the trace-validation branch 3 times, most recently from d31be33 to 64ced68 on April 6, 2024 13:47
@joshuazh-x
Contributor Author

Hey, just making sure we are proceeding with the PR.

Awesome to see such a detailed discussion. Please note that it is better to keep the discussion in this PR focused on blockers and feedback for the PR itself, so that we get closer to merging it. If you have high-level design proposals for what we can do with TLA+ that don't block this PR, I recommend moving them to #111 or creating a new issue.

@joshuazh-x, will you be able to address @ahrtr's suggestions in a reasonable time? I don't want to push too much work onto a single person, as we already see the merits of this approach. It's OK if you cannot address all of the issues in a single PR; we can delegate, as there now seem to be more people interested in helping.

@serathius @ahrtr Sorry, I have been swamped with something else lately. I updated the PR following my last catch-up with @ahrtr; hopefully I did not miss anything important.

@joshuazh-x
Contributor Author

Just had a Zoom call with @joshuazh-x.

We should make sure the README is clear and accurate.

The README should be clear and accurate enough that anyone can run any of the verifications just by following it. Please write the README from the user's perspective.

Where to generate the trace?

Currently this PR instruments/changes raft code to generate trace logs.

A comment above proposes obtaining the trace without changing the raft code. We agree that such an approach is more maintainable and verifies raft completely as a black box.

But the current approach (instrumenting the raft code directly) can be more easily reused in and integrated into applications' test suites (e.g. etcd's). Applications just need to provide a valid TraceLogger instance to raft; otherwise, each application might need to build its own utility to obtain the trace.

This is still open for discussion as a follow-up.

Merge all trace logs

I was thinking that each node could record and verify its trace logs separately, but that isn't correct: the trace logs generated by all nodes should be merged into one trace file.

Usually, in a GitHub workflow all etcd nodes run on the same host, so we may need to point all TraceLogger instances at the same file. This may complicate the test.

Potential problem when a node crashes and restarts

In etcd's test cases, we may randomly crash and restart any etcd member using failpoints. When an etcd node crashes, it may have only partially persisted some log entries (WAL records). But the trace logs are generated inside the raft library, so we may run into a situation where all trace logs are persisted while the data (WAL records) is only partially persisted, which eventually leads to a trace validation failure.

This needs more discussion.

@ahrtr @lemmy has a great idea for solving this issue. Basically, we will need to validate this in a non-deterministic way. I'll submit another PR after this one is merged.

@joshuazh-x
Contributor Author

@serathius @ahrtr #192 is the draft PR I mentioned to generate traces for test.

@MadhavJivrajani left a comment

This is really useful, thanks for your work!

I had a quick question to make sure I'm doing this right:
I tried running validate-model.sh and I get:

Starting... (2024-04-07 14:49:41)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-07 14:49:41.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 3.3E-19
7 states generated, 1 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 1.
The average outdegree of the complete state graph is 0 (minimum is 0, the maximum 0 and the 95th percentile is 0).

I see only one state and a graph depth of 1; is that right? Or is that because I didn't run it long enough? And how can I run it for longer?

Review thread: tla/validate-model.sh (outdated, resolved)
Review thread: tla/validate.sh (outdated, resolved)
@joshuazh-x
Contributor Author

That doesn't seem right. The output should look something like this:

Starting... (2024-04-09 03:49:09)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-09 03:49:10.
Progress(8) at 2024-04-09 03:49:13: 33,459 states generated (33,459 s/min), 7,725 distinct states found (7,725 ds/min), 5,643 states left on queue.

Could you share the full output?

@MadhavJivrajani

MadhavJivrajani commented Apr 9, 2024

@joshuazh-x thanks,

Full output here
❯ ./validate-model.sh -s etcdraft.tla -c etcdraft.cfg
spec: etcdraft.tla
config: etcdraft.cfg
Downloading TLA+ tools ... done.
TLC2 Version 2.18 of Day Month 20?? (rev: d1504b6)
Running breadth-first search Model-Checking with fp 3 and seed -3880762815027339908 with 1 worker on 16 cores with 7282MB heap and 64MB offheap memory [pid: 98807] (Mac OS X 13.6.5 x86_64, Homebrew 21.0.2 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /Users/mjivrajani/gocode/src/github.com/MadhavJivrajani/raft/tla/etcdraft.tla
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Naturals.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Integers.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/Integers.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Bags.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/Bags.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/FiniteSets.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/FiniteSets.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Sequences.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/Sequences.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/SequencesExt.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/CommunityModules-deps.jar!/SequencesExt.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/FiniteSetsExt.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/CommunityModules-deps.jar!/FiniteSetsExt.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/BagsExt.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/CommunityModules-deps.jar!/BagsExt.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/TLC.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/TLC.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/_TLCTrace.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/_TLCTrace.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Folds.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/CommunityModules-deps.jar!/Folds.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/Functions.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/CommunityModules-deps.jar!/Functions.tla)
Parsing file /private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tlc-281771803267607344/TLCExt.tla (jar:file:/private/var/folders/r7/f6kyp7vs0sv90105pjh1t8340000gp/T/tmp.RtLyjmUa/tool/tla2tools.jar!/tla2sany/StandardModules/TLCExt.tla)
Semantic processing of module Naturals
Semantic processing of module Integers
Semantic processing of module Sequences
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module Bags
Semantic processing of module Folds
Semantic processing of module Functions
Semantic processing of module FiniteSetsExt
Semantic processing of module SequencesExt
Semantic processing of module BagsExt
Semantic processing of module TLCExt
Semantic processing of module _TLCTrace
Semantic processing of module etcdraft
Starting... (2024-04-09 11:18:57)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-09 11:18:57.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 3.3E-19
7 states generated, 1 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 1.
The average outdegree of the complete state graph is 0 (minimum is 0, the maximum 0 and the 95th percentile is 0).
Finished in 01s at (2024-04-09 11:18:57)

@joshuazh-x
Contributor Author

I forgot to set its initial configuration; fixed in the new commit.

BTW, MCetcdraft.tla/cfg is recommended for model checking, as it sets reasonable bounds that limit the search to a small scope. It also includes the bootstrapping logic from node.go.

Review thread: state_trace.go (outdated, resolved)
Review thread: state_trace_nop.go (outdated, resolved)
Review thread: raft.go (outdated, resolved)
Review thread: tla/README.md (outdated, resolved)
Review thread: tla/README.md (outdated, resolved)
Review thread: tla/README.md (outdated, resolved)
Review thread: tla/README.md (outdated, resolved)
@joshuazh-x force-pushed the trace-validation branch 2 times, most recently from 6fde2d7 to 12b5a35 on April 11, 2024 06:05
Review thread: tla/etcdraft.tla (outdated, resolved)
Review thread: tla/README.md (outdated, resolved)
@ahrtr
Member

ahrtr commented Apr 17, 2024

Overall looks good to me.

I am OK to merge this PR if the other maintainers have no objection.

After we merge this PR, I suggest moving all the TLA+ specs into a dedicated separate repo, something like raft-spec or raft-tla, so that we can delegate it to a list of dedicated maintainers. Based on the discussion above, some people have expertise in TLA+. Please let us know if you are interested in maintaining the new repo. At a minimum, we should onboard @joshuazh-x and @lemmy.

@serathius
Member

LGTM

Please note that directories can also have dedicated maintainers. Let's keep the TLA+ spec in raft for as long as we are iterating on the test, to avoid the friction of coordinating between two repos. We can decide later.

@ahrtr
Member

ahrtr commented Apr 18, 2024

OK, we can revisit this later. Hopefully, the TLA+ spec maintainers will eventually have permission to merge changes to the spec independently.

@ahrtr
Member

ahrtr commented Apr 18, 2024

LGTM

Thanks for the great work!

@ahrtr merged commit 5a610f6 into etcd-io:main on Apr 18, 2024
10 checks passed
@MadhavJivrajani
So happy to see this merged, thank you @joshuazh-x for your work!

@felipecrv

This is a great demonstration of TLA+ "real world usage". I hope it gets more publicized and mentioned.
