[RMP] Support Offline Batch processing of Recs Generation Pipelines #419
I would like to be able to convert data between frameworks without having to worry about how to transfer from one underlying framework to another. The process should be nearly seamless and chainable: I would like to move data from one framework to another, and then possibly to yet another, in the most efficient (speed/memory) way possible.
Goal:
Create a class that developers can leverage to easily transfer data between frameworks.
Users should be able to concatenate columns of data and transform to a target framework.
Users should not need to know anything about the underlying frameworks or how the data is moved between them (e.g., DLPack, the CUDA Array Interface, the NumPy array interface).
Users should be able to transform data from the current framework to the target framework in one function call.
Transforms should be zero-copy, or minimal-copy, whenever possible.
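The one-call, zero-copy goal above could be sketched with the standard exchange protocols. This is a hypothetical illustration, not the Merlin implementation: the function name `to_numpy` is invented here, and it uses NumPy's `np.from_dlpack` (NumPy >= 1.22) to consume any object exposing the DLPack protocol without copying.

```python
import numpy as np


def to_numpy(column):
    """Convert a framework tensor/array to NumPy, zero-copy when possible.

    Hypothetical sketch: real framework columns (PyTorch, CuPy, JAX, ...)
    expose ``__dlpack__``, which lets NumPy wrap the same memory directly.
    """
    if hasattr(column, "__dlpack__"):
        # DLPack path: shares the underlying buffer, no copy.
        return np.from_dlpack(column)
    # Fallback: ``np.asarray`` may or may not copy depending on the input.
    return np.asarray(column)
```

The same pattern generalizes to other targets (e.g., `torch.from_dlpack`), which is what makes conversions chainable without each framework knowing about every other one.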
Constraints:
Must support dynamic environments; not all libraries are guaranteed to be present at all times.
Interfaces for installed libraries should be available automatically.
Interfaces for unavailable libraries should inform the user, on use, that the target packages are not installed.
Must be easily extensible to leverage new data types as they become supported.
Must be usable in Merlin Systems (operator input transformations) and Merlin Models (to replace FeatureCollection).
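The dynamic-environment constraints above could look something like the following sketch. All names here (`register_column`, `column_class`, `framework_available`) are hypothetical, not actual Merlin APIs: installed frameworks register a real column class, while a lookup for a missing framework returns a stub that raises `ImportError` only when used.

```python
import importlib.util

_registry = {}


def register_column(name, cls):
    """Register a column class for an installed framework (hypothetical)."""
    _registry[name] = cls


def framework_available(name):
    """True if the package can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def column_class(framework):
    """Return the column class for a framework, or a stub that raises on use."""
    if framework in _registry:
        return _registry[framework]

    class _Unavailable:
        # Deferred failure: importing this module never errors, only
        # actually constructing a column for a missing framework does.
        def __init__(self, *args, **kwargs):
            raise ImportError(
                f"The '{framework}' package is not installed; "
                f"install it to use this column type."
            )

    return _Unavailable
```

Deferring the error to use-time keeps imports cheap and lets the same codebase run in environments with any subset of the supported frameworks.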
Starting Point:
- Create a base class to house data at the column level
- Create subclasses of the base class to support each framework
- Classes should self-register and be available only if the target framework imports successfully.
- Create a class to support multiple columns of different subclasses (e.g., concatenating predictions from multiple models: XGBoost, TensorFlow, PyTorch)
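The starting-point structure above could be sketched as follows. This is a minimal illustration under assumed names (`Column`, `NumpyColumn`, `TensorTable` are invented for this sketch), using `__init_subclass__` for self-registration; it routes conversions through NumPy for brevity, whereas a real implementation would use DLPack or the CUDA Array Interface to stay zero-copy.

```python
import numpy as np


class Column:
    """Base class for a single column of data in some framework."""

    _registry = {}
    framework = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Self-registration: a subclass is only defined (and registered)
        # if its framework imported successfully.
        Column._registry[cls.framework] = cls

    def __init__(self, values):
        self.values = values

    def to_numpy(self):
        raise NotImplementedError

    def to(self, framework):
        """Convert to the target framework's column type in one call."""
        cls = Column._registry[framework]
        return cls.from_numpy(self.to_numpy())


class NumpyColumn(Column):
    framework = "numpy"

    def to_numpy(self):
        return np.asarray(self.values)

    @classmethod
    def from_numpy(cls, arr):
        return cls(arr)


class TensorTable:
    """Container for named columns that may come from different frameworks."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def to(self, framework):
        # Converts every column, so e.g. XGBoost, TensorFlow, and PyTorch
        # predictions can be gathered into a single target framework.
        return TensorTable(
            {name: col.to(framework) for name, col in self.columns.items()}
        )
```

Adding a new framework then only requires defining one subclass in a module guarded by a try/except import; the registry and `TensorTable` pick it up automatically.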
It is critical that this ticket not only be created, but also kept up to date. As you work constraints are going to be discovered and should be added to the above list. Tasks required to complete this project may change. The goal of the work may even change. Without a commitment to keeping this ticket up to date the work shouldn't be undertaken.
This issue, while containing a great description of a piece of technical work that's relevant to customer issues, doesn't itself reflect a specific customer issue—and therefore should probably not be a roadmap ticket. How we build cross-framework evaluation or batch processing (as described by the issues linked in the problem statement section) is an implementation detail below the level of description the roadmap issue template requests.
I agree it shouldn't be a roadmap ticket, it's a subset of a few other roadmap tickets.
I still think the format of problem, goal, constraint and starting point is valuable outside the scope of roadmap tickets in order to ensure that we understand what it is we're solving for.
Problem:
As a developer, to satisfy the requirements for both:
I would like to be able to convert data between frameworks without having to worry about how to transfer from one underlying framework to another. The process should be nearly seamless and chainable: I would like to move data from one framework to another, and then possibly to yet another, in the most efficient (speed/memory) way possible.
Goal:
Constraints:
Starting Point:
It is critical that this ticket not only be created, but also kept up to date. As you work constraints are going to be discovered and should be added to the above list. Tasks required to complete this project may change. The goal of the work may even change. Without a commitment to keeping this ticket up to date the work shouldn't be undertaken.