Merlin Data Conversion Support #420

jperez999 · 2022-06-28T01:57:25Z

Problem:

As a developer, to satisfy the requirements for both:

- [RMP] Create a standard set of cross-framework evaluation metrics models#450
- [RMP] Support Offline Batch processing of Recs Generation Pipelines #419

I would like to convert data between frameworks without having to worry about how to transfer from one underlying framework to another. Conversion should be almost seamless and chainable: I would like to turn data from one framework to another, and then possibly to a third. I would like to do this in the most efficient way possible in terms of speed and memory.

Goal:

- Create a class that developers can leverage to easily transfer data between frameworks.
- Users should be able to concatenate columns of data and transform them to a target framework.
- Users should not have to know anything about the underlying framework or how the data needs to be moved from one framework to another (e.g. DLPack, CUDA array interface, NumPy array interface).
- Users should be able to transform data from the current framework to the target framework in one function call.
- Transforms should be zero-copy or minimal-copy when possible.
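To make the "one function call" and "zero copy" goals concrete, here is a minimal sketch using only the Python standard library. The `convert` function and its target names (`"view"`, `"list"`) are hypothetical and not part of any Merlin API; the buffer protocol stands in for mechanisms such as DLPack or the array interfaces mentioned above.

```python
from array import array


def convert(column, target):
    """Hypothetical one-call conversion; the target names are illustrative only."""
    if target == "view":
        # memoryview shares the array's memory through the buffer protocol (zero copy).
        return memoryview(column)
    if target == "list":
        # tolist() builds new Python objects, so this path copies the data.
        return column.tolist()
    raise ValueError(f"unsupported target: {target!r}")


source = array("d", [1.0, 2.0, 3.0])

view = convert(source, "view")
copied = convert(source, "list")

# A write through the view is visible in the source because the memory is shared.
view[0] = 10.0
shared = source[0] == 10.0

# The copy was taken before the write, so it still holds the old value.
independent = copied[0] == 1.0
```

The caller only names the target; whether the conversion copies is decided inside `convert`, which is the property the goals ask for.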

Constraints:

- Must support dynamic environments: not all libraries are guaranteed to be present at all times.
- Libraries that are installed should be made available automatically.
- Interfaces for unavailable libraries should inform the user, on use, that the target packages are not installed.
- Must be easily extensible to leverage new data types as they become supported.
- Must be usable in Merlin Systems (operator input transformations) and Merlin Models (replace FeatureCollection).
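One possible way to meet the dynamic-environment constraints is to detect packages without importing them and to fail only when a missing package is actually used. The sketch below relies on the standard library's `importlib`; the helper names `is_available` and `require` are assumptions made for illustration, not existing Merlin functions.

```python
import importlib
import importlib.util


def is_available(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name):
    """Import and return the module, or raise an error naming the missing package."""
    if not is_available(module_name):
        raise ImportError(
            f"Converting to '{module_name}' requires the '{module_name}' package, "
            "which is not installed in this environment."
        )
    return importlib.import_module(module_name)


# An installed module is picked up automatically.
json_module = require("json")

# A missing module only produces an error at the point of use.
try:
    require("hypothetical_backend_not_installed")
    message = ""
except ImportError as exc:
    message = str(exc)
```

Deferring the error to `require` keeps imports of the conversion layer itself cheap and safe in environments where only some frameworks are present.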

Starting Point:

- Create a base class to house data at the column level
- Create subclasses of the base class to support each framework
- Classes should self-register and be available if the target framework is imported successfully.
- Create a class to support multiple columns of different subclasses (e.g. concatenating predictions from multiple models: XGBoost, TensorFlow, PyTorch)
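The starting point above could be sketched roughly as follows. All class names (`Column`, `ListColumn`, `TupleColumn`, `ColumnSet`) are hypothetical, and plain lists and tuples stand in for real frameworks. Self-registration uses `__init_subclass__`; in a real implementation each subclass definition would presumably sit behind a guarded import of its framework. For simplicity this sketch converts through a list, so it copies rather than achieving zero copy.

```python
class Column:
    """Hypothetical base class that holds one column of data."""

    registry = {}  # maps framework name -> Column subclass

    def __init_subclass__(cls, framework=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if framework is not None:
            # Subclasses register themselves as soon as they are defined.
            Column.registry[framework] = cls

    def __init__(self, values):
        self.values = values

    def to_list(self):
        """Intermediate form used by this sketch; it copies the data."""
        raise NotImplementedError

    @classmethod
    def from_list(cls, values):
        raise NotImplementedError

    def to(self, framework):
        """Convert to the column type registered under `framework`."""
        try:
            target = Column.registry[framework]
        except KeyError:
            raise ValueError(f"no column type registered for {framework!r}") from None
        return target.from_list(self.to_list())


class ListColumn(Column, framework="list"):
    def to_list(self):
        return list(self.values)

    @classmethod
    def from_list(cls, values):
        return cls(list(values))


class TupleColumn(Column, framework="tuple"):
    def to_list(self):
        return list(self.values)

    @classmethod
    def from_list(cls, values):
        return cls(tuple(values))


class ColumnSet:
    """Hypothetical container for named columns of mixed subclasses."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def to(self, framework):
        converted = {name: col.to(framework) for name, col in self.columns.items()}
        return ColumnSet(converted)


# Predictions held in different column types, converted to one target in one call.
preds = ColumnSet({"model_a": ListColumn([0.1, 0.9]), "model_b": TupleColumn((0.3, 0.7))})
as_tuples = preds.to("tuple")

# Conversions return columns, so they can be chained.
chained = ListColumn([1, 2]).to("tuple").to("list")
```

Because `to` looks up the target in the registry, adding support for a new framework only requires defining one more subclass.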

It is critical that this ticket not only be created, but also kept up to date. As you work, constraints are going to be discovered and should be added to the above list. Tasks required to complete this project may change. The goal of the work may even change. Without a commitment to keeping this ticket up to date, the work shouldn't be undertaken.

karlhigley · 2022-06-29T19:55:31Z

This issue, while containing a great description of a piece of technical work that's relevant to customer issues, doesn't itself reflect a specific customer issue—and therefore should probably not be a roadmap ticket. How we build cross-framework evaluation or batch processing (as described by the issues linked in the problem statement section) is an implementation detail below the level of description the roadmap issue template requests.

viswa-nvidia · 2022-07-01T17:22:05Z

Moving this to arbitration based on Karl's comment. Let's discuss this in the next grooming meeting.

EvenOldridge · 2022-07-04T22:18:56Z

I agree it shouldn't be a roadmap ticket; it's a subset of a few other roadmap tickets.

I still think the format of problem, goal, constraint and starting point is valuable outside the scope of roadmap tickets in order to ensure that we understand what it is we're solving for.

jperez999 added the roadmap label Jun 28, 2022

jperez999 assigned EvenOldridge Jun 28, 2022

EvenOldridge removed the roadmap label Jul 4, 2022

EvenOldridge changed the title ~~[RMP] Merlin Data Conversion Support~~ Merlin Data Conversion Support Jul 4, 2022

EvenOldridge modified the milestones: Merlin 22.12, Merlin 22.11 Oct 12, 2022

jperez999 commented Jun 28, 2022 • edited by viswa-nvidia
