Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Prop API and testable values #397

Closed
kurtschelfthout opened this issue Sep 24, 2017 · 13 comments
Closed

RFC: Prop API and testable values #397

kurtschelfthout opened this issue Sep 24, 2017 · 13 comments
Labels

Comments

@kurtschelfthout
Copy link
Member

I propose to significantly simplify the Prop API and what FsCheck considers to be testable values.

For one, FsCheck can currently check functions that are "testable", which is to say they return unit or void, bool, Property, Lazy<T>, Func<TArb,T> where T is testable. On the fscheck3 branch we added Task<T>. (There are also a bunch of other types that are really only there to make the implementation work.)

The API on Prop is basically composed of:

  • forAll functions and overloads that take an Arbitrary and a function to test.
  • value distribution functions like trivial, collect which show how values are distributed on successful test
  • assertion functions like within
  • combination functions like .&. (and) and .|. (or)
  • label function to show a string on a failing test.

In short, my suggestion would be to simplify this API away to almost nothing - basically to just Prop.forAll with different functions/overloads for a number of arities and return types.Failure is signaled only via throwing exceptions and so functions returning bool etc are not allowed. Even Property would be disallowed.

Some reasoning and ramifications:

  • Everything would be clear from the signature and type safe - no more magical Testable generic type.
    • Prop.forAll: Arbitrary<T> -> (T -> unit) -> Property
    • Prop.forAllAsync: Arbitrary<T> -> (T -> Task) -> Property
    • Prop.forAllFsAsync: Arbitrary<T> -> (T -> Async<unit>) -> Property
  • All assertion frameworks already signal errors via exceptions, so unit is natural type to have as a return type. Also the advantage of signalling failure via an exception is manifold:
    • As opposed to an assertion like a = b where the result is false, with exceptions you can add an error message like "Expected a, got b". So this essentially makes label superfluous.
    • Combining multiple assertions is easy - just write one after the other - and again the context is easy to derive from the exception's stacktrace. This make .&. and .|. operators superfluous.
  • Remove ==> because it is confusing - because of eager evaluation the RHS gets evaluated regardless of whether the LHS succeeds or not. It is also the main reason why Lazy is a supported testable type - so you can use ... ==> lazy (...). Instead, filtering of the values to be tested should happen on the Arbitrary side which perhaps looks slightly less nice - and I'm not even sure about that - but works much better in practice. Moving discarded values - i.e. that don't match the filter - to earlier in the chain like in Gen.where and Gen.filter would also allow FsCheck to warn in all cases when a particular filter is too strict. Right now this warning only works if you use ==>, not with the methods on Gen.
  • within and throws are really not FsCheck's core business - let's point users to a better alternative in other libraries like Unquote, xUnit.NET etc.
  • trivial, classify and collect should be replaced by making the full list of generated objects available as the output of an FsCheck test run, so that any analysis of the generated values can happen afterwards, and perhaps using more powerful tools than just text output (e.g. draw a histogram). This also allows other scenarios like storing the values somewhere for later reproduction/analysis etc.
  • Removing Property as an allowed return type essentially removes the ability to write nested Prop.forAll calls. I think this imposes a stricter separation of value generation and testing, which doesn't take away actual expressive power, but imposes some syntactic limitations that could be seen as a nuisance but are long-term and overall beneficial. It also makes setting up a test very uniform: a Prop.forAll call taking one or more Arbitrary instances, a test function, and optionally some post processing and analysis of the generated values. However this is the most controversial aspect of the suggestion. I have a feeling it actually does take away expressive power, will have to try.
  • I think we'd keep discard. Even though it is not used often, it represents a legitimate use case.
@Rattenkrieg
Copy link
Contributor

... and so functions returning bool etc are not allowed

IMHO stuff like this is a show stoper (first example from docs)

let revRevIsOrig (xs:list<int>) = List.rev(List.rev xs) = xs
Check.Quick revRevIsOrig

having it like that

let revRevIsOrig (xs:list<int>) = if List.rev(List.rev xs) = xs then () else failwith "blahblah"
Check.Quick revRevIsOrig

would be harder to sell it to new users.
I understand that concern is to keep API surface manageable and portable (like burden of interop with test frameworks). But that magic of defining properties in terms of boolean functions is the essence of the property based testing.
I believe none of QuickChecks ports have dropped T -> bool API.

@kurtschelfthout
Copy link
Member Author

kurtschelfthout commented Sep 24, 2017

Could be:

let revRevIsOrig (xs:list<int>) = if List.rev(List.rev xs) <> xs then failwith "blahblah"

Or with an assertion framework like Unquote you get brevity and the property back, and better error messages:

let revRevIsOrig (xs:list<int>) = List.rev(List.rev xs) =! xs
let revRevIsOrig (xs:list<int>) = Assert.Equals(xs, List.rev(List.rev xs))

Overall I've come to dislike returning bools since it doesn't scale beyond this very simple example. When a test that returns bool fails, it gives me zero helpful information. Especially since I tend to combine using && and || and so have no clue even which assertion actually fails.

EDIT: And btw, it's literally a one liner to add "support" for bool return types back:

let assert p = if not p then failwith "false"

Note I've exactly simulated the assertion failure message as well ;)

@Alxandr
Copy link

Alxandr commented Sep 25, 2017

Yep, or with unqote:

let revRevIsOrig (xs: int list) = test <@ List.rev(List.rev xs) = xs @>

On a side note. I've had nested properties before (translated a library from haskell that used quickcheck), which I don't know how I would have gotten around. I don't remember the exact usecase, but it was something to the tune of this (note this is a trivial example which somewhat illustrates the need):

type Vector3D = { x: int; y: int; z: int }
type Operation = Add | Multiply

let test (op: Operation) =
  match op with
  | Add -> Prop.forAll(fun (v1: Vector3D) (v2: Vector3D) -> check stuffs)
  | Multiply -> Prop.forAll(fun (v: Vector3D) (n: int) -> check other stuffs)

Now, in this trivial example, it would be easy to split them apart, but I remember not being able to figure a clean way to do that with the test I had back then. Sorry it's vague, but it's been a while :-/.

That being said, if you used a state monad for the tests, you could make sure that nested forAll calls does not make the test a n*n, but just n, by having nested forAll calls just do a single itteration. Cause it will always come back again, based on the configuration of number of iteration at the top level. So in my case it would generate Add and Multiply 500 times each (roughly), and generate the sets of vectors or vector + scalar 500 times each if I set 1000 iterations per test.

@kurtschelfthout
Copy link
Member Author

kurtschelfthout commented Sep 25, 2017

Now, in this trivial example, it would be easy to split them apart, but I remember not being able to figure a clean way to do that with the test I had back then. Sorry it's vague, but it's been a while

Usually it's used as a somewhat convenient way to parametrize an Arbitrary instance according to the result of an outer Arbitrary or some calculation in the test (or even some earlier test result). Note it also allows recursive properties. So yes agree there are some usages here.

That being said, if you used a state monad for the tests, you could make sure that nested forAll calls does not make the test a n*n, but just n

That is already the case.

@ploeh
Copy link
Member

ploeh commented Oct 8, 2017

Some of this sounds fine to me; some of it concerns me a bit. I'm not going to explicitly call out places where I agree, because I agree with much of it, and there's no reason to reiterate it all. On the other hand, I have some concerns:

On simpler return types

Using only exceptions to signal failures doesn't sound functional to me. I agree that currently, all .NET unit testing assertion libraries tend to rely on exceptions, but that's mostly for lack of better options - historically.

Now that the F# core library has an official Result (Either) type, I'd consider it a better, and more functional, design, if libraries like e.g. Unquote started offering an API based on that instead of exceptions.

So, according to that, I think that Result<unit, string>, at least, should be a valid return type, but then that clearly also includes Async<Result<unit, string>>, as well as Task<Result<unit, string>>...

On removing ==>

I agree that too much reliance on ==> is a smell, and I wouldn't be surprised if people constantly run into trouble with it. It took me some time, as well, before I understand why lazy was required.

Occasionally, I run into a proper use case for the operator. Here's an example: http://blog.ploeh.dk/2016/01/18/make-pre-conditions-explicit-in-property-based-tests

Still, I run into such scenarios rarely, so I could probably live without the operator...

Nesting

Don't the glue libraries (at least, FsCheck.Xunit) make use of nested properties? For example, if you consider this property:

[<Property(QuietOnSuccess = true)>]
let ``Any live cell with > 3 live neighbors dies`` (cell : int * int) =
    Gen.elements [4..8]
    |> Arb.fromGen
    |> Prop.forAll <| fun neighborCount ->
        let liveNeighbors =
            cell
            |> findNeighbors
            |> shuffle
            |> Seq.take neighborCount
            |> Seq.toList
        
        let actual : State =
            calculateNextState (cell :: liveNeighbors |> shuffle |> set) cell
 
        Dead =! actual

Notice that the cell comes from the test function argument, since any combination of two integers is useful. On the other hand, it composes this with another property that constrains how many neighbours to generate.

I do know that I could pull cell down into the body of the function as well, but that would require more plumbing to compose two generators - particularly if I'd want to retain shrinking of the cell.

@kurtschelfthout
Copy link
Member Author

Using only exceptions to signal failures doesn't sound functional to me. I agree that currently, all .NET unit testing assertion libraries tend to rely on exceptions, but that's mostly for lack of better options - historically.

I know where you're coming from with the functional comment. It's exactly what I believed. But now I believe that actually, exceptions are the best option, and not even in a "least worst" kind of way. Exceptions are a good way to implement assertions because they allow you to break early, they add context (stacktrace and message), and compose easily (no workflow or monad needed).

Then there is overwhelming ecosystem support in the CLR obviously, but also in test runners (e.g. click on stacktrace to go to code).

See also: https://eiriktsarpalis.wordpress.com/2017/02/19/youre-better-off-using-exceptions/

@kurtschelfthout
Copy link
Member Author

Occasionally, I run into a proper use case for the operator. Here's an example: http://blog.ploeh.dk/2016/01/18/make-pre-conditions-explicit-in-property-based-tests

Yes... there would have to be an Arb.where which is pretty trivial to add - and example would end up looking like.

let ``Validate.reservation returns right result on invalid date``() =
    Prop.forAll (Arb.from<ReservationRendition> |> Arb.where (fun r -> not(fst(DateTimeOffset.TryParse rendition.Date)))) (fun rendition ->

    let actual = Validate.reservation rendition 
    let expected : Result<Reservation, Error> =
        Failure(ValidationError("Invalid date."))
    test <@ expected = actual @>)

Which is somehow somewhat less pleasing to the eye, but overall seems to move things more where they belong.

This would have the advantage that discarding values gets pushed into Gen (where it belongs really) so that combinators like Gen.where can also count and warn on too many discarded values if they wish.

One risk here is that people write things like the below in their actual test

if <precondition> then
    <actual test>

which would carry the risk that the test maybe becomes slow for unclear reasons because too many discarded values, and there is no way for FsCheck to detect this.

@kurtschelfthout
Copy link
Member Author

kurtschelfthout commented Oct 9, 2017

I do know that I could pull cell down into the body of the function as well, but that would require more plumbing to compose two generators - particularly if I'd want to retain shrinking of the cell.

Yes, this is a major downside. The change to go from a simple test (maybe a first version without the nested property) to a more "advanced" one would be very discontinuous - i.e. a relatively simple addition with nested properties, would involve a more significant refactoring, with more plumbing as you describe.

@ploeh
Copy link
Member

ploeh commented Oct 9, 2017

See also: https://eiriktsarpalis.wordpress.com/2017/02/19/youre-better-off-using-exceptions/

Yes, I know of that blog post, and I also left a comment there stating my position. It hasn't changed since then.

.NET exceptions are good for some scenarios, but I think it'd be a shame to write off the ability to use Either.

@kurtschelfthout
Copy link
Member Author

I personally don't think that using bool or Result is a good idea except in very specific circumstances, of which tests isn't one. BUT, one of my (few) principles in developing FsCheck: don't be (too) opinionated. (I figure there are already enough people with opinions shrug ) In other words, if people like this, then why not?

So, I hereby tone down this proposal to:

  • take out testability of tuples and lists (doubt many people know you can actually do this anyway...)
  • take out within and throws.

I hope this is uncontroversial 😄

For ==> another idea to deal with the need for lazy on the RHS is to automatically add an explanatory text when using it with a non-lazy RHS, if the test fails. Something like "We noticed that you're using ==> with a right hand side that is not lazy. Note that the right hand side is still run, even if the precondition does not hold. This can cause your test to fail for values for which the precondition does not hold. Consider using <pre> ==> lazy( <property> ) instead." (and I don't even need to wait for v3 to do this...)

@Alxandr
Copy link

Alxandr commented Oct 10, 2017

How about just having it be ==> fun () -> true = true. That's mostly in line with what similar operators does, no?

[Edit]
Basically, type it as bool ==> (unit -> 'testable)

@kurtschelfthout
Copy link
Member Author

type it as bool ==> (unit -> 'testable)

Hmm, yeah. Is it worth inflicting the conservative choice on everyone to save some confusion...too close to call?

kurtschelfthout added a commit that referenced this issue Oct 12, 2017
kurtschelfthout added a commit that referenced this issue Oct 12, 2017
@kurtschelfthout
Copy link
Member Author

I am going to close this, as it's been decided that this will stay more or less as is and I've applied the smaller changes that were going through. There are some question marks around ==> but I think those are relatively minor. Also, given #403 the API here might change - e.g. it's not inconceivable that ==> and Gen.where and Arb.filter would become one and the same thing, which would be nice.

Any further thoughts on this feel free to keep commenting or open a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants