Refactor to clean up unnecessary Serialization/Deserialization to/from Matter (and ducktyped) subclasses (Indexer, Counter) #753

Open
SmithSamuelM opened this issue Apr 12, 2024 · 0 comments

Collaborator

SmithSamuelM commented Apr 12, 2024

Matter subclasses support round-trippable transformation between the three CESR domains: Text (qb64), Binary (qb2), and Raw, a (code, raw) tuple. The Matter primitive holds the value as a (code, raw) tuple and transforms it to Text or Binary as needed. The raw value is needed to perform cryptographic operations on the primitive or, in other cases, to use the primitive as some other type such as a number (int) or a string.
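To make the three domains concrete, here is a minimal sketch of the round trip using only the standard library. It is not keripy's Matter class: the one-entry `RAW_SIZES` table and the helper names (`to_qb64`, `from_qb64`, `to_qb2`, `from_qb2`) are assumptions for illustration, and the encoder only handles a one-character code with a 32-byte raw value.

```python
import base64

# Toy code table: code -> raw size in bytes. An assumption for this sketch,
# not keripy's codex.
RAW_SIZES = {"D": 32}


def to_qb64(code: str, raw: bytes) -> str:
    """(code, raw) -> Text domain."""
    if len(raw) != RAW_SIZES[code]:
        raise ValueError("raw size does not match code")
    ps = (3 - len(raw) % 3) % 3  # lead pad bytes to reach a 24-bit boundary
    if len(code) != ps:
        raise ValueError("toy encoder only handles codes that fill the pad")
    padded = base64.urlsafe_b64encode(bytes(ps) + raw).decode()
    return code + padded[ps:]  # the code replaces the all-zero pad characters


def from_qb64(qb64: str) -> tuple:
    """Text domain -> (code, raw)."""
    code = qb64[:1]
    rs = RAW_SIZES[code]
    ps = (3 - rs % 3) % 3
    # Put the zero pad characters back, decode, then drop the pad bytes.
    raw = base64.urlsafe_b64decode("A" * ps + qb64[len(code):])[ps:]
    if len(raw) != rs:
        raise ValueError("unexpected raw size")
    return code, raw


def to_qb2(code: str, raw: bytes) -> bytes:
    """(code, raw) -> Binary domain: the Base64 decoding of the Text form."""
    return base64.urlsafe_b64decode(to_qb64(code, raw))


def from_qb2(qb2: bytes) -> tuple:
    """Binary domain -> (code, raw), going through the Text form."""
    return from_qb64(base64.urlsafe_b64encode(qb2).decode())
```

In this sketch the Text form of a 32-byte value is 44 characters and the Binary form is 33 bytes, and each conversion can be undone by its counterpart.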

In many cases a Matter subclass is used for a spurious non-transform: the value is transformed from Text to a primitive instance (code, raw) and back to Text without ever being needed in (code, raw) form. This is unnecessary. Just keep it in Text form.

There are three reasons to use a Matter instance to convert from Text or Binary to (code, raw) form. (Note that these are in addition to the major use case for a Matter instance, which is to generate the Text or Binary from (code, raw) form in the first place.)

  1. Parsing a concatenated (composed) stream of primitives. Each primitive is self-framing, and the primitive instance knows how to extract its Text or Binary from the stream, which leaves it in (code, raw) form. So even if the only use is to re-serialize, the instance was required in order to deserialize in a framed way. If the Text or Binary value is not concatenated, then there is no reason to convert to the instance (code, raw) form if it is only ever going to be used in Text or Binary form.
  2. When an operation is to be performed on the raw value after deserialization from Text or Binary.
  3. When the Text or Binary value is to be stored in a database in concatenated form and the database doesn't know what instance type to use to deconcatenate it later. In other words, the stored value is not sniffable, so the database must be preconfigured with the list of instance types so that it can deconcatenate later.
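Reason 1 can be sketched as follows. This is a toy framer, not keripy's parser: the `SIZES` table, the rule that a leading `0` selects a two-character code, and the name `split_stream` are assumptions for illustration. It shows why a concatenated stream needs framing knowledge, whereas a value that arrives on its own can simply be kept as a string.

```python
# Toy table: code -> full size of the Text form in characters (assumed values).
SIZES = {"D": 44, "0B": 88}


def split_stream(stream: str) -> list:
    """Split a concatenated Text stream into its self-framed primitives."""
    out = []
    while stream:
        # Assumption for this sketch: a leading "0" selects a two-character code.
        code = stream[:2] if stream[0] == "0" else stream[:1]
        size = SIZES[code]
        if len(stream) < size:
            raise ValueError("stream ends in the middle of a primitive")
        out.append(stream[:size])   # the code alone tells us how much to pull
        stream = stream[size:]
    return out


# A standalone value needs none of this: it is already framed by its container.
key = "D" + "k" * 43
sig = "0B" + "s" * 86
parts = split_stream(key + sig + key)
```

Here `parts` holds the three original strings in order. Nothing in the loop touches the raw value; only the leading code and the size table are used.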

We can largely remove 3. above by changing the interface to CesrSuber and CatCesrSuber. What has happened is that the end state of storing has forced the creation of instances everywhere instead of only at the interface to the database. The backpressure of the database interface is driving the interface of everything upstream, which is an antipattern. We need to make the database interface smarter and/or better decouple it from its upstream.
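One possible shape for such an interface is sketched below. It is not the existing CesrSuber or CatCesrSuber API and not a proposed design from this issue: `CatTextStore`, the framer functions, and the size tables are all hypothetical, and an in-memory dict stands in for the database. The point is only that the framing knowledge lives at the database boundary while callers hand over and receive plain qb64 strings.

```python
from typing import Callable, Sequence, Tuple

# Toy size tables (assumed values). The same leading character maps to
# different sizes in the two tables, which is why the stored value cannot be
# sniffed and the store must be told which framer applies to each position.
MATTER_SIZES = {"A": 44, "D": 44, "E": 44}
INDEXER_SIZES = {"A": 88}


def matter_size(text: str) -> int:
    return MATTER_SIZES[text[0]]


def indexer_size(text: str) -> int:
    return INDEXER_SIZES[text[0]]


class CatTextStore:
    """Hypothetical store of concatenated qb64 strings, framed only on read."""

    def __init__(self, framers: Sequence[Callable[[str], int]]):
        self.framers = tuple(framers)   # one framer per position in the value
        self._db = {}                   # stand-in for the real database

    def put(self, key: str, vals: Sequence[str]) -> None:
        if len(vals) != len(self.framers):
            raise ValueError("wrong number of values")
        for framer, val in zip(self.framers, vals):
            if framer(val) != len(val):
                raise ValueError("value is not a single framed primitive")
        self._db[key] = "".join(vals).encode()

    def get(self, key: str) -> Tuple[str, ...]:
        text = self._db[key].decode()
        out = []
        for framer in self.framers:
            size = framer(text)
            out.append(text[:size])
            text = text[size:]
        return tuple(out)


store = CatTextStore((matter_size, indexer_size))
pre = "D" + "x" * 43
isig = "A" + "y" * 87
store.put("evt", (pre, isig))
```

With this arrangement `store.get("evt")` returns the two strings that were put, and no upstream code had to create an instance just to satisfy the store.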

Looking at this a different way: the convenience of the fully qualified Text domain representation (qb64) is that we have a human-readable ASCII (Base64) representation of a primitive that we can use as an identifier. When using it as an identifier we don't need the Matter instance that generated it. We just need the string. In many cases we are better off just passing the string between functions and methods and only instancing as Matter at the point of need for a cryptographic operation, instead of passing around the instance and then deserializing over and over everywhere it's used as an identifier. This is especially so when the latter is the predominant use case and the former is comparatively rare.
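A small sketch of this pattern, under stated assumptions: the helpers `digest_qb64`, `raw_of`, `verify`, `remember`, and `lookup` are hypothetical, the code character `I` is used here simply as a label for a 32-byte SHA-256 digest, and a digest comparison stands in for the cryptographic operation. The string is used directly as a dictionary key, and the raw bytes are recovered only inside `verify`.

```python
import base64
import hashlib


def digest_qb64(data: bytes) -> str:
    """Build a 44-character Text-form digest string (toy encoding)."""
    raw = hashlib.sha256(data).digest()  # 32 bytes
    return "I" + base64.urlsafe_b64encode(b"\x00" + raw).decode()[1:]


def raw_of(qb64: str) -> bytes:
    """Recover the raw bytes; only needed at the point of a crypto check."""
    return base64.urlsafe_b64decode("A" + qb64[1:])[1:]


def verify(said: str, data: bytes) -> bool:
    """The one place where the raw value is actually required."""
    return raw_of(said) == hashlib.sha256(data).digest()


# Everywhere else the identifier is just a string.
records = {}


def remember(said: str, data: bytes) -> None:
    records[said] = data


def lookup(said: str) -> bytes:
    return records[said]


said = digest_qb64(b"hello")
remember(said, b"hello")
ok = verify(said, lookup(said))
```

Here `remember` and `lookup` never decode anything; only `verify` pays for the conversion to raw.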

Matter instances (and their duck types Indexer and Counter) don't store the qb64 and qb2 representations. They generate them as properties when referenced. This avoids always doing what may be an unnecessary extra serialization. An instance may be created from (code, raw), from some other input that is used to compute (code, raw), or by parsing a qb64 or qb2 stream, in which case the parse determines how many bytes to pull from the stream (self-framing). The instance does not distinguish between a stream and already framed input for qb2 or qb64, so it always parses. When it parses, it pulls off parts of the serialization one at a time in order to compute the (code, raw). It doesn't actually have the full qb2 or qb64 once it's done, so it would then have to reserialize the (code, raw). If it doesn't parse, it can't ensure that it computes the (code, raw) correctly. Consequently, it must first compute (code, raw), and only then can the corresponding qb2 or qb64 be recomputed from the (code, raw). Recall that it may have been given a qb2 stream or a qb64 stream, not both. So storing all three means that in most cases at least one or two spurious reserializations are required.
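The behavior described above can be sketched with a toy class. `LazyPrimitive` is a hypothetical stand-in, not keripy's Matter: it supports a single assumed code `D` with a 32-byte raw value, and the `serializations` counter exists only to make the cost visible.

```python
import base64


class LazyPrimitive:
    """Toy primitive that holds only (code, raw) and serializes on demand."""

    RAW_SIZES = {"D": 32}  # assumed one-entry table: code -> raw size in bytes

    def __init__(self, code=None, raw=None, qb64=None):
        if qb64 is not None:
            code, raw = self._parse(qb64)  # the input text itself is not kept
        self.code = code
        self.raw = raw
        self.serializations = 0  # instrumentation for this sketch only

    @classmethod
    def _parse(cls, text):
        """Pull exactly one primitive off the front of text (stream or not)."""
        code = text[:1]
        rs = cls.RAW_SIZES[code]
        ps = (3 - rs % 3) % 3          # pad size; equals len(code) in this toy
        fs = (rs + ps) * 4 // 3        # full size of the Text form
        body = text[len(code):fs]
        raw = base64.urlsafe_b64decode("A" * ps + body)[ps:]
        return code, raw

    def _serialize(self):
        self.serializations += 1
        ps = (3 - len(self.raw) % 3) % 3
        padded = base64.urlsafe_b64encode(bytes(ps) + self.raw).decode()
        return self.code + padded[ps:]

    @property
    def qb64(self):
        return self._serialize()       # recomputed from (code, raw) each time

    @property
    def qb2(self):
        return base64.urlsafe_b64decode(self._serialize())


made = LazyPrimitive(code="D", raw=bytes(32))
text = made.qb64
parsed = LazyPrimitive(qb64=text + "rest-of-stream")
```

After parsing, `parsed` has the same (code, raw) as `made` but has performed no serialization yet. Each later access to `parsed.qb64` recomputes the string that the caller already had in hand before parsing.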

This means that, in general, if the use case is not to use (code, raw) but to use the qb64 extracted from a qb64, qb2, or (code, raw) input, then just pass around the qb64. If one needs to convert to one of the other forms sometime later, then and only then re-instance in order to do the conversion.
