RFC: Prop API and testable values #397

kurtschelfthout · 2017-09-24T20:23:42Z

I propose to significantly simplify the Prop API and what FsCheck considers to be testable values.

For one, FsCheck can currently check functions that are "testable", which is to say they return unit or void, bool, Property, Lazy<T>, Func<TArb,T> where T is testable. On the fscheck3 branch we added Task<T>. (There are also a bunch of other types that are really only there to make the implementation work.)

The API on Prop is basically composed of:

forAll functions and overloads that take an Arbitrary and a function to test.
value distribution functions like trivial, collect which show how values are distributed on successful test
assertion functions like within
combination functions like .&. (and) and .|. (or)
label function to show a string on a failing test.

In short, my suggestion would be to simplify this API away to almost nothing - basically to just Prop.forAll with different functions/overloads for a number of arities and return types.Failure is signaled only via throwing exceptions and so functions returning bool etc are not allowed. Even Property would be disallowed.

Some reasoning and ramifications:

Everything would be clear from the signature and type safe - no more magical Testable generic type.
- Prop.forAll: Arbitrary<T> -> (T -> unit) -> Property
- Prop.forAllAsync: Arbitrary<T> -> (T -> Task) -> Property
- Prop.forAllFsAsync: Arbitrary<T> -> (T -> Async<unit>) -> Property
All assertion frameworks already signal errors via exceptions, so unit is natural type to have as a return type. Also the advantage of signalling failure via an exception is manifold:
- As opposed to an assertion like a = b where the result is false, with exceptions you can add an error message like "Expected a, got b". So this essentially makes label superfluous.
- Combining multiple assertions is easy - just write one after the other - and again the context is easy to derive from the exception's stacktrace. This make .&. and .|. operators superfluous.
Remove ==> because it is confusing - because of eager evaluation the RHS gets evaluated regardless of whether the LHS succeeds or not. It is also the main reason why Lazy is a supported testable type - so you can use ... ==> lazy (...). Instead, filtering of the values to be tested should happen on the Arbitrary side which perhaps looks slightly less nice - and I'm not even sure about that - but works much better in practice. Moving discarded values - i.e. that don't match the filter - to earlier in the chain like in Gen.where and Gen.filter would also allow FsCheck to warn in all cases when a particular filter is too strict. Right now this warning only works if you use ==>, not with the methods on Gen.
within and throws are really not FsCheck's core business - let's point users to a better alternative in other libraries like Unquote, xUnit.NET etc.
trivial, classify and collect should be replaced by making the full list of generated objects available as the output of an FsCheck test run, so that any analysis of the generated values can happen afterwards, and perhaps using more powerful tools than just text output (e.g. draw a histogram). This also allows other scenarios like storing the values somewhere for later reproduction/analysis etc.
Removing Property as an allowed return type essentially removes the ability to write nested Prop.forAll calls. I think this imposes a stricter separation of value generation and testing, which doesn't take away actual expressive power, but imposes some syntactic limitations that could be seen as a nuisance but are long-term and overall beneficial. It also makes setting up a test very uniform: a Prop.forAll call taking one or more Arbitrary instances, a test function, and optionally some post processing and analysis of the generated values. However this is the most controversial aspect of the suggestion. I have a feeling it actually does take away expressive power, will have to try.
I think we'd keep discard. Even though it is not used often, it represents a legitimate use case.

The text was updated successfully, but these errors were encountered:

Rattenkrieg · 2017-09-24T20:57:37Z

... and so functions returning bool etc are not allowed

IMHO stuff like this is a show stoper (first example from docs)

let revRevIsOrig (xs:list<int>) = List.rev(List.rev xs) = xs
Check.Quick revRevIsOrig

having it like that

let revRevIsOrig (xs:list<int>) = if List.rev(List.rev xs) = xs then () else failwith "blahblah"
Check.Quick revRevIsOrig

would be harder to sell it to new users.
I understand that concern is to keep API surface manageable and portable (like burden of interop with test frameworks). But that magic of defining properties in terms of boolean functions is the essence of the property based testing.
I believe none of QuickChecks ports have dropped T -> bool API.

kurtschelfthout · 2017-09-24T22:20:06Z

Could be:

let revRevIsOrig (xs:list<int>) = if List.rev(List.rev xs) <> xs then failwith "blahblah"

Or with an assertion framework like Unquote you get brevity and the property back, and better error messages:

let revRevIsOrig (xs:list<int>) = List.rev(List.rev xs) =! xs
let revRevIsOrig (xs:list<int>) = Assert.Equals(xs, List.rev(List.rev xs))

Overall I've come to dislike returning bools since it doesn't scale beyond this very simple example. When a test that returns bool fails, it gives me zero helpful information. Especially since I tend to combine using && and || and so have no clue even which assertion actually fails.

EDIT: And btw, it's literally a one liner to add "support" for bool return types back:

let assert p = if not p then failwith "false"

Note I've exactly simulated the assertion failure message as well ;)

Alxandr · 2017-09-25T06:43:41Z

Yep, or with unqote:

let revRevIsOrig (xs: int list) = test <@ List.rev(List.rev xs) = xs @>

On a side note. I've had nested properties before (translated a library from haskell that used quickcheck), which I don't know how I would have gotten around. I don't remember the exact usecase, but it was something to the tune of this (note this is a trivial example which somewhat illustrates the need):

type Vector3D = { x: int; y: int; z: int }
type Operation = Add | Multiply

let test (op: Operation) =
  match op with
  | Add -> Prop.forAll(fun (v1: Vector3D) (v2: Vector3D) -> check stuffs)
  | Multiply -> Prop.forAll(fun (v: Vector3D) (n: int) -> check other stuffs)

Now, in this trivial example, it would be easy to split them apart, but I remember not being able to figure a clean way to do that with the test I had back then. Sorry it's vague, but it's been a while :-/.

That being said, if you used a state monad for the tests, you could make sure that nested forAll calls does not make the test a n*n, but just n, by having nested forAll calls just do a single itteration. Cause it will always come back again, based on the configuration of number of iteration at the top level. So in my case it would generate Add and Multiply 500 times each (roughly), and generate the sets of vectors or vector + scalar 500 times each if I set 1000 iterations per test.

kurtschelfthout · 2017-09-25T10:38:21Z

Now, in this trivial example, it would be easy to split them apart, but I remember not being able to figure a clean way to do that with the test I had back then. Sorry it's vague, but it's been a while

Usually it's used as a somewhat convenient way to parametrize an Arbitrary instance according to the result of an outer Arbitrary or some calculation in the test (or even some earlier test result). Note it also allows recursive properties. So yes agree there are some usages here.

That being said, if you used a state monad for the tests, you could make sure that nested forAll calls does not make the test a n*n, but just n

That is already the case.

ploeh · 2017-10-08T19:18:56Z

Some of this sounds fine to me; some of it concerns me a bit. I'm not going to explicitly call out places where I agree, because I agree with much of it, and there's no reason to reiterate it all. On the other hand, I have some concerns:

On simpler return types

Using only exceptions to signal failures doesn't sound functional to me. I agree that currently, all .NET unit testing assertion libraries tend to rely on exceptions, but that's mostly for lack of better options - historically.

Now that the F# core library has an official Result (Either) type, I'd consider it a better, and more functional, design, if libraries like e.g. Unquote started offering an API based on that instead of exceptions.

So, according to that, I think that Result<unit, string>, at least, should be a valid return type, but then that clearly also includes Async<Result<unit, string>>, as well as Task<Result<unit, string>>...

On removing `==>`

I agree that too much reliance on ==> is a smell, and I wouldn't be surprised if people constantly run into trouble with it. It took me some time, as well, before I understand why lazy was required.

Occasionally, I run into a proper use case for the operator. Here's an example: http://blog.ploeh.dk/2016/01/18/make-pre-conditions-explicit-in-property-based-tests

Still, I run into such scenarios rarely, so I could probably live without the operator...

Nesting

Don't the glue libraries (at least, FsCheck.Xunit) make use of nested properties? For example, if you consider this property:

[<Property(QuietOnSuccess = true)>]
let ``Any live cell with > 3 live neighbors dies`` (cell : int * int) =
    Gen.elements [4..8]
    |> Arb.fromGen
    |> Prop.forAll <| fun neighborCount ->
        let liveNeighbors =
            cell
            |> findNeighbors
            |> shuffle
            |> Seq.take neighborCount
            |> Seq.toList
        
        let actual : State =
            calculateNextState (cell :: liveNeighbors |> shuffle |> set) cell
 
        Dead =! actual

Notice that the cell comes from the test function argument, since any combination of two integers is useful. On the other hand, it composes this with another property that constrains how many neighbours to generate.

I do know that I could pull cell down into the body of the function as well, but that would require more plumbing to compose two generators - particularly if I'd want to retain shrinking of the cell.

kurtschelfthout · 2017-10-09T10:09:07Z

Using only exceptions to signal failures doesn't sound functional to me. I agree that currently, all .NET unit testing assertion libraries tend to rely on exceptions, but that's mostly for lack of better options - historically.

I know where you're coming from with the functional comment. It's exactly what I believed. But now I believe that actually, exceptions are the best option, and not even in a "least worst" kind of way. Exceptions are a good way to implement assertions because they allow you to break early, they add context (stacktrace and message), and compose easily (no workflow or monad needed).

Then there is overwhelming ecosystem support in the CLR obviously, but also in test runners (e.g. click on stacktrace to go to code).

See also: https://eiriktsarpalis.wordpress.com/2017/02/19/youre-better-off-using-exceptions/

kurtschelfthout · 2017-10-09T10:20:25Z

Occasionally, I run into a proper use case for the operator. Here's an example: http://blog.ploeh.dk/2016/01/18/make-pre-conditions-explicit-in-property-based-tests

Yes... there would have to be an Arb.where which is pretty trivial to add - and example would end up looking like.

let ``Validate.reservation returns right result on invalid date``() =
    Prop.forAll (Arb.from<ReservationRendition> |> Arb.where (fun r -> not(fst(DateTimeOffset.TryParse rendition.Date)))) (fun rendition ->

    let actual = Validate.reservation rendition 
    let expected : Result<Reservation, Error> =
        Failure(ValidationError("Invalid date."))
    test <@ expected = actual @>)

Which is somehow somewhat less pleasing to the eye, but overall seems to move things more where they belong.

This would have the advantage that discarding values gets pushed into Gen (where it belongs really) so that combinators like Gen.where can also count and warn on too many discarded values if they wish.

One risk here is that people write things like the below in their actual test

if <precondition> then
    <actual test>

which would carry the risk that the test maybe becomes slow for unclear reasons because too many discarded values, and there is no way for FsCheck to detect this.

kurtschelfthout · 2017-10-09T10:22:44Z

I do know that I could pull cell down into the body of the function as well, but that would require more plumbing to compose two generators - particularly if I'd want to retain shrinking of the cell.

Yes, this is a major downside. The change to go from a simple test (maybe a first version without the nested property) to a more "advanced" one would be very discontinuous - i.e. a relatively simple addition with nested properties, would involve a more significant refactoring, with more plumbing as you describe.

ploeh · 2017-10-09T11:48:09Z

See also: https://eiriktsarpalis.wordpress.com/2017/02/19/youre-better-off-using-exceptions/

Yes, I know of that blog post, and I also left a comment there stating my position. It hasn't changed since then.

.NET exceptions are good for some scenarios, but I think it'd be a shame to write off the ability to use Either.

kurtschelfthout · 2017-10-09T20:55:42Z

I personally don't think that using bool or Result is a good idea except in very specific circumstances, of which tests isn't one. BUT, one of my (few) principles in developing FsCheck: don't be (too) opinionated. (I figure there are already enough people with opinions shrug ) In other words, if people like this, then why not?

So, I hereby tone down this proposal to:

take out testability of tuples and lists (doubt many people know you can actually do this anyway...)
take out within and throws.

I hope this is uncontroversial 😄

For ==> another idea to deal with the need for lazy on the RHS is to automatically add an explanatory text when using it with a non-lazy RHS, if the test fails. Something like "We noticed that you're using ==> with a right hand side that is not lazy. Note that the right hand side is still run, even if the precondition does not hold. This can cause your test to fail for values for which the precondition does not hold. Consider using <pre> ==> lazy( <property> ) instead." (and I don't even need to wait for v3 to do this...)

Alxandr · 2017-10-10T09:22:20Z

How about just having it be ==> fun () -> true = true. That's mostly in line with what similar operators does, no?

[Edit]
Basically, type it as bool ==> (unit -> 'testable)

kurtschelfthout · 2017-10-12T19:44:59Z

type it as bool ==> (unit -> 'testable)

Hmm, yeah. Is it worth inflicting the conservative choice on everyone to save some confusion...too close to call?

See #397.

kurtschelfthout · 2017-10-13T08:52:23Z

I am going to close this, as it's been decided that this will stay more or less as is and I've applied the smaller changes that were going through. There are some question marks around ==> but I think those are relatively minor. Also, given #403 the API here might change - e.g. it's not inconceivable that ==> and Gen.where and Arb.filter would become one and the same thing, which would be nice.

Any further thoughts on this feel free to keep commenting or open a separate issue.

kurtschelfthout added the vNext label Sep 24, 2017

kurtschelfthout mentioned this issue Sep 24, 2017

FsCheck turns 3.0? #198

Closed

8 tasks

kurtschelfthout added a commit that referenced this issue Oct 12, 2017

Remove Prop.within and Timeout outcome.

9c2dbda

See #397.

kurtschelfthout added a commit that referenced this issue Oct 12, 2017

Remove tuple and list testables.

b5f1063

See #397.

kurtschelfthout closed this as completed Oct 13, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Prop API and testable values #397

RFC: Prop API and testable values #397

kurtschelfthout commented Sep 24, 2017

Rattenkrieg commented Sep 24, 2017

kurtschelfthout commented Sep 24, 2017 •

edited

Loading

Alxandr commented Sep 25, 2017 •

edited

Loading

kurtschelfthout commented Sep 25, 2017 •

edited

Loading

ploeh commented Oct 8, 2017

kurtschelfthout commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017 •

edited

Loading

ploeh commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017

Alxandr commented Oct 10, 2017 •

edited

Loading

kurtschelfthout commented Oct 12, 2017

kurtschelfthout commented Oct 13, 2017

RFC: Prop API and testable values #397

RFC: Prop API and testable values #397

Comments

kurtschelfthout commented Sep 24, 2017

Rattenkrieg commented Sep 24, 2017

kurtschelfthout commented Sep 24, 2017 • edited Loading

Alxandr commented Sep 25, 2017 • edited Loading

kurtschelfthout commented Sep 25, 2017 • edited Loading

ploeh commented Oct 8, 2017

On simpler return types

On removing ==>

Nesting

kurtschelfthout commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017 • edited Loading

ploeh commented Oct 9, 2017

kurtschelfthout commented Oct 9, 2017

Alxandr commented Oct 10, 2017 • edited Loading

kurtschelfthout commented Oct 12, 2017

kurtschelfthout commented Oct 13, 2017

kurtschelfthout commented Sep 24, 2017 •

edited

Loading

Alxandr commented Sep 25, 2017 •

edited

Loading

kurtschelfthout commented Sep 25, 2017 •

edited

Loading

On removing `==>`

kurtschelfthout commented Oct 9, 2017 •

edited

Loading

Alxandr commented Oct 10, 2017 •

edited

Loading