Add the Transformer#apply for intuitive implicit summoning #594

base: master
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master     #594      +/-  ##
==========================================
- Coverage   86.57%   86.56%   -0.01%
==========================================
  Files         154      154
  Lines        5995     6007      +12
  Branches      544      543       -1
==========================================
+ Hits         5190     5200      +10
- Misses        805      807       +2
```

☔ View full report in Codecov by Sentry.
```scala
 *
 * @since 1.5.0
 */
def apply[From, To](implicit t: Transformer[From, To]): Transformer[From, To] = t
```
It's obvious that having a counterpart method for `PartialTransformer` would be beneficial. However, since there is already a defined method `PartialTransformer#apply`, adding the counterpart method leads to a rather strange issue in Scala 2 (although it works well in Scala 3). See the minimized repro in Scastie — https://scastie.scala-lang.org/danicheg/eM7sBvzAS0q66azXmvayfg/6. Unfortunately, I haven't come up with a workaround yet, so I've left it as is.
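For illustration, a minimal, hypothetical sketch of the kind of clash described (the trait and names below are simplified stand-ins, not Chimney's actual sources; see the linked Scastie for the real repro):

```scala
trait PT[From, To] {
  def transform(src: From): Either[String, To]
}

object PT {
  // pre-existing constructor-style apply: lifts a function into the type class
  def apply[From, To](f: From => Either[String, To]): PT[From, To] =
    (src: From) => f(src)

  // the proposed summoning apply: returns the implicit instance in scope
  def apply[From, To](implicit t: PT[From, To]): PT[From, To] = t
}

implicit val intToString: PT[Int, String] = (i: Int) => Right(i.toString)

// Scala 3 resolves this to the summoning overload; Scala 2's overload
// resolution trips over the two apply methods here
val summoned = PT[Int, String]
```

Since `Transformer`'s companion has no such pre-existing function-lifting `apply`, only the `PartialTransformer` side hits this.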
Well, to be honest, we didn't add this. Chimney does the opposite. Chimney's philosophy is also more on the side of Jsoniter Scala than Circe, so any need to do…
In cases like the example it would make more sense to introduce some "higher-kinded Transformer" (something necessary for e.g. #569), which would handle the external transformation while the macro handles the internal transformation. So something like:

```scala
package integrations

// this could support e.g. `withFieldConst(_.everyItem, ...)`
trait TotallyMappable[From, To, InnerFrom, InnerTo] {
  def transform(src: From, f: InnerFrom => InnerTo): To
}

// would allow withFieldConst(_.everyItem, 0) if necessary
given TotallyMappable[ID, Int, String, Int] with {
  def transform(src: ID, f: String => Int): Int = f(src.value)
}
```

would potentially work much more reliably. I'd be reluctant to promote this summon/map/contramap style of defining transformers. I would probably let it pass if it was consistent: if `PartialTransformer`…
I kind of hope for more reasoning like this in Scala's ecosystems - stability and consistency first. 👏
Wow, I didn't expect such a strong reaction to that inoffensive PR. Let me clarify my reasoning on this matter:
Going further, I'd like to discuss the approach of having defined common instances of…
I understand this might be considered an "argumentum ad verecundiam," but with over eight years of experience working with Scala, I have optimized compilation times for various codebases ranging from 50k to 1M LOC across different companies many times. I've even contributed to the Scala ecosystem by reviving a tool for profiling the compilation times of Scala 2 projects — https://github.com/scalacenter/scalac-profiling/.

Given all of my experience with Scala macros and automated derivation, I have a STRONG OPINION that there is no worse advice from upstream libraries than to use auto-derivation to the maximum extent without paying any attention to the consequences. The negative impact on DX in projects larger than 10 KLOC, plagued by automated recursive derivation of type class instances by macros without consideration of the underlying costs, is significant and frustrating. This approach heavily contributes to the perception that "Scala is slow" or that "compilation in Scala is slow" in the broader community.

Therefore, I would really like to hear @lbialy's perspective on this matter. What are we aiming to achieve? Higher velocity in writing code, or ensuring that large enterprise projects can thrive over the long term?

To elaborate a bit: earlier this year, I worked on optimizing the compilation times of a quite mature and large (> 0.5 MLoC) monorepo at $work, which employed automated recursive derivation of instances of certain type classes. I'm not sure if you've encountered something similar, but I was genuinely surprised to find that the derivation of an instance of…
Honestly, I'm not seeing how that point relates to the

```scala
object CommonTransformers extends ProtobufTransformerImplicits {
  implicit val foo: Transformer[Domain, Duration] =
    Transformer[protobuf.Duration, scala.Duration].contramap(_.value)
}
```

vs

```scala
object CommonTransformers extends ProtobufTransformerImplicits {
  implicit val foo: Transformer[Domain, Duration] =
    (duration: Domain) => totalTransformerFromDurationToJavaDurationInstance.transform(duration.value)
}
```
If I could ask, what did you find to be a strong reaction in Mateusz's post? I feel it would help us understand each other without unnecessary tensions.

I do understand your point @danicheg, but I think the core of Mateusz's reasoning is that library authors want to manage how the library is being used through the incentives and ergonomics of the API. The missing summoning apply is such a hint - it's meant to steer the user towards the recursively deriving macro. This in turn is done because Chimney is usually used in hot paths and is expected to have performance comparable with hand-written, manual data transfer functions. Recursive derivation in the macro allows Chimney to select speedier implementations that are invalidated by pre-existing instances of `Transformer`.

I do indeed think of the DX in larger projects, given that I have mostly worked on larger projects. Maybe the Chimney docs should have a section regarding what to do when the impact on compile times is noticeable (which is probably to move towards semi-auto in large datatypes; I'll leave this to Mateusz, who probably understands the problem better).

My comment regarding stability and consistency was not in regard to the wider ecosystem but to the fact that Chimney has had a 1.0 release and is now considered to be a stable library with a well thought-out approach to doing things. I do admire the resolve not to change things for the end user after the 1.0 release, as I feel we have had and still have enough breaking changes (even if they are just new ways of doing things!) in our community, wouldn't you agree?

I've checked whether your PR was preceded by a discussion on whether it's something that the library maintainers would want, and I can't see anything. Did you hold a conversation about this on some Discord, perchance?
I am perfectly aware why recursive autoderivation is bad - I advised against it in the past myself! It spawns thousands of intermediate type classes, which need to be instantiated, which kills the runtime, explodes the number of classes that need to be read by the classloader, and all of them go into implicit scope, expanding the time needed for compilation, etc. It's a mess for the final performance of the generated code and is a heavy workload for the compiler.

It is also absolutely not how Chimney works. The best way I can show it is through this Scastie example (please hover over a line to see how we derived the code and what was the result):

```scala
case class FooInner(a: Int, b: String)
case class FooOuter(inner: FooInner)
case class BarInner(a: Int, b: String)
case class BarOuter(inner: BarInner)

import io.scalaland.chimney.Transformer
import io.scalaland.chimney.dsl.*
import scala.compiletime.codeOf

transparent inline given TransformerConfiguration[?] =
  TransformerConfiguration.default.enableMacrosLogging

// 1 new Transformer
FooOuter(FooInner(10, "bb")).transformInto[BarOuter]

// 0 new Transformers
FooOuter(FooInner(10, "bb")).into[BarOuter].transform

locally {
  given Transformer[FooInner, BarInner] = Transformer.derive

  // 2 new Transformers
  FooOuter(FooInner(10, "bb")).transformInto[BarOuter]

  // 1 new Transformer
  FooOuter(FooInner(10, "bb")).into[BarOuter].transform
}
```

One of our goals ever since we rewrote the code from Shapeless into macros was performance. It was mostly about runtime performance, but in 0.8.0 we changed how the implicit search works, which should also improve compilation times:
I can confidently say that, aside from Jsoniter Scala, Chimney is the only library in the Scala ecosystem which tries so hard to avoid allocations and intermediate type classes. No internal anonymous type classes that nobody asked for (only the ones you asked for via…).

(There is only one overhead that Chimney has that we are aware of - the DSL used as a builder does have to collect the overrides as runtime values stored in a Vector, and we have to build it and extract from it.)

There is only one thing that kills this inlining and forces the macro engine to call…

```scala
object Foo {
  implicit val fooAB: Transformer[A, B] = Transformer.derive[A, B]
  implicit val fooA1B1: Transformer[A1, B1] = Transformer.derive[A1, B1]
  implicit val fooA2B2: Transformer[A2, B2] = Transformer.derive[A2, B2]
}
```

is the best way to kill the optimizations. We put a lot of work in so that the issues present with Circe autoderivation were not carried over into Chimney.
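For contrast, a hedged sketch of the pattern the above argues for: keep an implicit only for the outermost pair you actually reuse, and let the macro inline the nested conversions (the type names here are illustrative, not from the PR):

```scala
import io.scalaland.chimney.Transformer

case class Inner(a: Int, b: String)
case class Outer(inner: Inner)
case class InnerB(a: Int, b: String)
case class OuterB(inner: InnerB)

object GoodTransformers {
  // a single implicit for the outermost pair; the Inner -> InnerB
  // conversion is derived and inlined inside this one transformer,
  // instead of going through a separately registered instance
  implicit val outer: Transformer[Outer, OuterB] =
    Transformer.derive[Outer, OuterB]
}
```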
The initial version of Chimney was developed using Shapeless. I had an example where… In virtually every larger code base, I was able to improve things tremendously by replacing Shapeless-based derivation with Magnolia-based derivation; it helped much more than replacing autoderivation with semiautomatic derivation everywhere (that helped more with making sure that there were no two derivations for the same types with different behavior).

Every performance-related issue with Shapeless-based derivation originated in the fact that the derivation implementer has to convert between a case class/sealed hierarchy and…

None of that is true if recursion is performed within the macro, with the macro actively trying to avoid calling itself. But it's probably done this way by only a handful of libraries, since the Shapeless-based approach was way too easy compared to complex macro maintenance.
is perfectly fine. However, introducing API changes which would make people think that Chimney works like Circe - when it does not - would make them swamp us with issues like:
which cannot be avoided by writing more docs, because nobody reads the docs; instead, the API encourages or discourages them from certain things. This is what we do - we hint that the proper way to use Chimney is to let the macro deal with 99% of cases. And we want to discourage people from using…
It is our opinion as well. However, it will be our problem to explain to every user:
because blindly translating conventions from Circe or other Shapeless/Magnolia/Mirror-based libraries which borrowed Circe's approach (which are very convenient but have also tripped up countless people since their inception) might easily backfire.
Well put together. 👏 Indeed, the "prejudice" Chimney had to address in the past was:
Additionally, as far as I can say, quite a lot of our users are on Spark (this I know through private conversations), and very few use the Cats module. These stats don't show it perfectly, but one can still see that while the Cats integration is the most popular one... it's used by less than 10% of our users. Most of them need only very basic features.
The scraps of information I could get my hands on paint a picture of a userbase which uses Chimney for ad-hoc transformations, where they only provide…
We have a section about the performance impact of the default way of derivation with…
Wow, it was a great decision to open this pull request — so many interesting insights into Chimney's development and its evolution. Thanks for writing all of this. I'll do my best to keep up with the growing context.
Sure, it wasn't disputed.

@MateuszKubuszok so, to sum up your position: you want to promote auto-derivation because it works faster than providing custom intermediate instances of…

```scala
case class FooInner(a: Int, b: String)
case class FooOuter(inner: FooInner, qux: QuxInner)
case class BarInner(a: Int, b: String)
case class BarOuter(inner: BarInner, qux: QuxOuter)
case class QuxInner(qux: Int)
case class QuxOuter(qux: Int)
case class QuuxInner(qux: QuxInner)
case class QuuxOuter(qux: QuxOuter)

import io.scalaland.chimney.Transformer
import io.scalaland.chimney.dsl.*
import scala.compiletime.codeOf

transparent inline given TransformerConfiguration[?] =
  TransformerConfiguration.default.enableMacrosLogging

locally {
  FooOuter(FooInner(10, "bb"), QuxInner(42)).transformInto[BarOuter]
  QuuxInner(QuxInner(42)).transformInto[QuuxOuter]
}
```

Does the fact that…
I didn't introduce any new way of resolving implicits, so much as I sat down to understand what the compiler actually does and what the consequences of the other libraries' design choices are, and made different choices.

Implicit search starts by looking at definitions which are available in the current scope: via…

Quite a lot of the (Scala 2) libraries are based on Shapeless. As I wrote elsewhere, it:
Of course, for that to work, these implicits have to be available when we need them. Autoderivation just imports them all into the scope… This might be a problem, because how would you then do something like…? So there was another invention for when you needed a separate type class:

```scala
trait TypeClass[A]
trait DerivedTypeClass[A]

implicit def generic[A, AGen](implicit
    gen: Generic.Aux[A, AGen],
    hlist: DerivedTypeClass[AGen]
): DerivedTypeClass[A] = ...

implicit def hcons[Head, Tail <: HList](implicit
    head: TypeClass[Head],
    tail: DerivedTypeClass[Tail]
): DerivedTypeClass[Head :: Tail] = ...

implicit val hnil: DerivedTypeClass[HNil] = ...

def deriveSemi[A](implicit derived: DerivedTypeClass[A]): TypeClass[A] =
  convertDerivedToNormal(derived)
```

Automatic derivation needed something like 2n+1 implicits for a flat case class/sealed trait, and more if it was…

If you have semiautomatic derivation... it does not solve that issue. It solves the issue that the compiler does not have to cache the intermediate implicits (it might try to, but it is not guaranteed), so derivation might easily take exponential time. Semiautomatic derivation for intermediate types is basically the user caching intermediate results, so that the compiler spends CPU on lookup rather than on generating code it already generated a moment ago. It generates a lot of overhead. Magnolia tries to address it by having one macro expansion for every level, so that even if you are not caching the instances, at least you are only looking for…

And it is still not what Chimney or Jsoniter do. From the POV of the approach described above, Chimney/Jsoniter with recursion enabled have neither true autoderivation nor true semiautomatic derivation. Chimney separates…

But this is where things are different. OOTB, there is no implicit…

So it's basically recursive semiautomatic derivation. Automatic derivation is achieved through a fallback: if there is no…

Jsoniter has a similar approach, but it removed automatic derivation completely, so it is only semiautomatic derivation, with optional recursion, which would pick up implicits if available, or generate inlined expressions if not.

TL;DR: it isn't a reimplementation of how implicits work, more of a divorce from the convention of how they were used by virtually every library which was based on Shapeless and drew its design from Circe (because Circe is responsible for promoting these easy but suboptimal patterns). And it was really tricky to come up with something that works this way but is easy to use 90% of the time.
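The fallback mechanism described above can be sketched roughly like this (simplified, illustrative names; Chimney's real `Transformer.AutoDerived` fallback is backed by the derivation macro rather than a stub):

```scala
// the supertype the DSL actually summons
trait AutoDerived[From, To] { def transform(src: From): To }

// the user-facing type class; it extends AutoDerived, so a user-provided
// instance in scope always wins the implicit search
trait Transformer[From, To] extends AutoDerived[From, To]

object AutoDerived extends LowPriorityAutoDerived
trait LowPriorityAutoDerived {
  // low-priority fallback: in Chimney this is the derivation macro;
  // here a stub marks where the generated code would go
  implicit def derive[From, To]: AutoDerived[From, To] =
    (src: From) => sys.error("derivation would be generated here")
}

// the extension method summons AutoDerived, not Transformer, so it accepts
// both user-provided instances and the derived fallback
def transformInto[From, To](src: From)(implicit t: AutoDerived[From, To]): To =
  t.transform(src)
```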
If you:
then semiautomatic derivation with…
so the macro loses the ability to:
If we don't have to reuse a transformation, we might just use automatic derivation or do `into.transform`, which inlines the transformation (but it has the builder DSL overhead of creating a Vector for possible overrides). I'd say that if one wanted to really increase performance (and compilation times) as much as possible, then something like:

```scala
object ModuleForDerivation {
  // No implicits: each macro derives the transformation only for the outer
  // types, which would be actually transformed, not for inner types whose
  // transformation would be inlined.
  val transformer1 = Transformer.derive[A1, B1]
  val transformer2 = Transformer.derive[A2, B2]
  val transformer3 = Transformer.derive[A3, B3]
}

object ModuleForImporting {
  // exposing the results of the derivation as implicits
  implicit val transformer1 = ModuleForDerivation.transformer1
  implicit val transformer2 = ModuleForDerivation.transformer2
  implicit val transformer3 = ModuleForDerivation.transformer3
}
```

combined with

```scala
import ModuleForImporting._ // transformers
import io.scalaland.chimney.syntax._
// only summons Transformers, see no Transformer.AutoDerived,
// so every transformation has to be manually derived and stored somewhere
```

would work best - it would make sure that each…

Of course, one has to use benchmarks to be sure - at some point inlining is detrimental to the JVM's performance rather than helping it, so there would be cases when one should avoid inlining.
To keep things brief and statement-like: even with Chimney's approach of building an accumulated…

Anyway, even with your detailed explanation, I still miss the reasoning behind why manually written instances are worse (in terms of compilation time) than using…
Hey, sorry for the late reply, I didn't forget about your question. I just had to attend to other things the last several days.
I cannot say anything for every case. I started doing benchmarks to be able to answer such questions and, as always: it depends. I will show my findings at some conference soon.
Basically, this jsoniter-like approach... well, it doesn't necessarily play along with patterns that work in Circe or other "standard auto vs semiauto" libraries:
Long story short…
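A compact sketch of the contrast being drawn (the Circe lines are for comparison only and are commented out; the placeholder case classes are made up):

```scala
// Circe-style convention (for contrast):
//   import io.circe.generic.auto._                     // auto: via implicit scope
//   implicit val d: Decoder[Foo] = deriveDecoder[Foo]  // semiauto: cached per type

// Chimney: one derivation of the outer transformer handles the nested
// types recursively inside a single macro expansion; implicits are
// primarily a customization hook, not a required cache per type.
import io.scalaland.chimney.Transformer
import io.scalaland.chimney.dsl._

case class In(a: Int)
case class Out(in: In)
case class InB(a: Int)
case class OutB(in: InB)

val t: Transformer[Out, OutB] = Transformer.derive[Out, OutB]
val converted: OutB = Out(In(1)).transformInto[OutB]
```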
That way of summoning is commonly used in the projects I used to work on. In conjunction with #591, it might give a great experience of writing Transformers: