Merlin Data Conversion Support #420

Open
6 tasks
jperez999 opened this issue Jun 28, 2022 · 3 comments
Comments

@jperez999
Collaborator

jperez999 commented Jun 28, 2022

Problem:

As a developer, to satisfy the requirements for both:

Goal:

  • Create a class that developers can leverage to easily transfer data between frameworks.
  • Users should be able to concatenate columns of data and transform them to a target framework.
  • Users should not have to know anything about the underlying framework or how data is moved from one framework to another (e.g. DLPack, the CUDA array interface, the NumPy array interface).
  • Users should be able to transform data from the current framework to a target framework in one function call.
  • Transforms should be zero-copy or minimal-copy when possible.
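
To make "zero-copy" concrete: a zero-copy conversion hands the target a view over the same memory instead of duplicating it. The sketch below uses Python's built-in buffer protocol (`array.array` and `memoryview`) purely as a stand-in for DLPack or the CUDA/NumPy array interfaces, which rest on the same idea. The helper name `to_view` is hypothetical and not part of any Merlin API.

```python
from array import array


def to_view(column: array) -> memoryview:
    """Return a zero-copy view over the column's buffer (hypothetical helper)."""
    return memoryview(column)


source = array("d", [1.0, 2.0, 3.0])
view = to_view(source)

# Writing through the view changes the source: both share one buffer.
view[0] = 10.0
shares_memory = source[0] == 10.0

# A copy, by contrast, is unaffected by later writes to the source.
copied = array("d", source)
source[1] = 20.0
copy_is_independent = copied[1] == 2.0
```

The same check (write through one handle, read through the other) is a simple way to confirm whether a given framework-to-framework path really avoids a copy.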

Constraints:

  • Must support dynamic environments: not all libraries are guaranteed to be present at all times.
  • Interfaces for installed libraries should be available automatically.
  • Interfaces for unavailable libraries should inform the user, on use, that the target packages are not installed.
  • Must be easily extensible to leverage new data types as they become supported.
  • Must be usable in Merlin Systems (operator input transformations) and Merlin Models (replacing FeatureCollection).
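
One possible way to meet the "dynamic environments" constraints is to attempt each import once and, on failure, substitute a placeholder that raises a descriptive error only when it is actually used. This is a minimal sketch under that assumption; `optional_import` and `MissingFramework` are hypothetical names, and `json` merely stands in for an installed framework.

```python
import importlib


class MissingFramework:
    """Placeholder that raises a helpful error the first time it is used."""

    def __init__(self, package: str):
        self._package = package

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. any real use.
        raise ImportError(
            f"'{self._package}' is not installed; "
            f"install it to use this conversion target."
        )


def optional_import(package: str):
    """Return the module if it is importable, else a MissingFramework."""
    try:
        return importlib.import_module(package)
    except ImportError:
        return MissingFramework(package)


json_mod = optional_import("json")  # present: behaves like the real module
missing = optional_import("not_a_real_framework_xyz")  # absent: fails on use
```

Importing the conversion module therefore never fails; the error surfaces only when a user asks for a target whose package is absent.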

Starting Point:

  • Create a base class to house data at the column level.
  • Create subclasses of the base class to support each framework.
  • Classes should self-register and be available if the target framework is imported successfully.
  • Create a class to support multiple columns of different subclasses (e.g. concatenating predictions from multiple models: XGBoost, TensorFlow, PyTorch).
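
The starting point above could be sketched roughly as follows. All class names (`Column`, `ListColumn`, `ArrayColumn`, `ColumnSet`) are hypothetical, and plain lists and `array.array` stand in for real frameworks such as TensorFlow or PyTorch. Self-registration is done here with `__init_subclass__`, which is one option among several. Note that this toy conversion copies the data; a real implementation would presumably go through DLPack or an array interface instead.

```python
from array import array
from typing import ClassVar, Dict, Type


class Column:
    """Base class holding one column of data (hypothetical design)."""

    registry: ClassVar[Dict[str, Type["Column"]]] = {}
    framework: ClassVar[str] = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.framework:
            # Subclasses register themselves under their framework name.
            Column.registry[cls.framework] = cls

    def __init__(self, values):
        self.values = self._wrap(values)

    @staticmethod
    def _wrap(values):
        raise NotImplementedError

    def to(self, target: str) -> "Column":
        """Convert this column to the target framework in one call."""
        if target not in Column.registry:
            raise ImportError(
                f"No column type registered for '{target}'; "
                f"is the package installed?"
            )
        return Column.registry[target](self.values)


class ListColumn(Column):
    framework = "list"

    @staticmethod
    def _wrap(values):
        return list(values)


class ArrayColumn(Column):
    framework = "array"

    @staticmethod
    def _wrap(values):
        return array("d", values)


class ColumnSet:
    """Holds named columns that may come from different frameworks."""

    def __init__(self, **columns: Column):
        self.columns = dict(columns)

    def to(self, target: str) -> "ColumnSet":
        converted = {name: col.to(target) for name, col in self.columns.items()}
        return ColumnSet(**converted)


# Predictions from two "models" in different containers, unified in one call.
preds = ColumnSet(model_a=ListColumn([0.1, 0.9]), model_b=ArrayColumn([0.4, 0.6]))
unified = preds.to("array")
```

In a real package, each subclass definition could sit behind a guarded import of its framework, so that a class only registers when that import succeeds.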

It is critical that this ticket not only be created, but also kept up to date. As you work, constraints will be discovered and should be added to the list above. The tasks required to complete this project may change. The goal of the work may even change. Without a commitment to keeping this ticket up to date, the work shouldn't be undertaken.

@karlhigley
Contributor

This issue, while containing a great description of a piece of technical work that's relevant to customer issues, doesn't itself reflect a specific customer issue—and therefore should probably not be a roadmap ticket. How we build cross-framework evaluation or batch processing (as described by the issues linked in the problem statement section) is an implementation detail below the level of description the roadmap issue template requests.

@viswa-nvidia

Moving this to arbitration based on Karl's comment. Let's discuss this in the next grooming meeting.

@EvenOldridge
Member

I agree it shouldn't be a roadmap ticket; it's a subset of a few other roadmap tickets.

I still think the format of problem, goal, constraint and starting point is valuable outside the scope of roadmap tickets in order to ensure that we understand what it is we're solving for.

EvenOldridge changed the title from "[RMP] Merlin Data Conversion Support" to "Merlin Data Conversion Support" on Jul 4, 2022