RFC.6

Problem Statement

RP supports remote and local data staging. Remote data staging refers to the transfer of data between resources. Local data staging refers to copy, move, and link operations on a local, shared file system. Data staging directives can use two types of source and target URLs. The first type is regular URLs, which reference a data item by schema, host, path, and so on. The second type is custom URLs with the following schemas:

client:/// - refers to the application's working directory (pwd)
resource:/// - refers to the RP resource sandbox
pilot:/// - refers to the pilot sandbox
unit:/// - refers to the task sandbox

As more use cases involve tightly and dynamically coupled tasks, these schemas become limiting. For one task to access data items of another task, the first task needs to stage its data out to a shared sandbox, such as the pilot sandbox. The second task then needs to stage that data into its own task sandbox. For large numbers of tasks, this two-step staging is inefficient. It also requires global coordination of file names to avoid conflicts. In addition, URL normalization makes it impossible to use relative path elements (such as ..) in URLs, so a task also cannot reach another task's sandbox through a relative path.
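The sketch below writes out the current two-step pattern as plain Python dicts. The source/target/action keys are loosely modeled on RP staging directives. The action strings, task IDs, file name, and the shared/ directory in the pilot sandbox are all hypothetical. The shared file name has to embed the producer's ID so that concurrent tasks do not overwrite each other, which is the global naming coordination described above.

```python
# Illustrative only: staging directives as plain dicts, loosely modeled on
# RP's source/target/action directives. IDs, paths and the 'copy' action
# string are hypothetical placeholders, not RP constants.

def two_step_staging(producer_uid, fname):
    """Return (producer output staging, consumer input staging) needed to
    pass `fname` from one task to another via the pilot sandbox."""
    # The intermediate name must be unique across all tasks on the pilot,
    # so it embeds the producer's uid (global file name coordination).
    shared = 'pilot:///shared/%s.%s' % (producer_uid, fname)

    # Step 1: the producer stages its output out of its own task sandbox.
    producer_out = [{'source': 'unit:///%s' % fname,
                     'target': shared,
                     'action': 'copy'}]

    # Step 2: the consumer stages the file into *its own* task sandbox.
    consumer_in = [{'source': shared,
                    'target': 'unit:///%s' % fname,
                    'action': 'copy'}]
    return producer_out, consumer_in


out, inp = two_step_staging('unit.000001', 'traj.dcd')
print(out[0]['target'])   # pilot:///shared/unit.000001.traj.dcd
```

Both directives refer to the same intermediate file. Because unit:/// always means "the sandbox of the task being staged for", neither task can name the other's sandbox directly.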

Proposal

RP should introduce explicit references to task sandboxes. To simplify the naming scheme, we propose to use

sandbox://<entity_id>/

as a uniform URL schema. For example

sandbox://client/ - refers to the application's pwd
sandbox://pilot.0000/ - refers to the pilot sandbox for pilot.0000
sandbox://ornl.summit/ - refers to the resource sandbox for ornl.summit
sandbox://unit.123456/ - refers to the task sandbox of unit.123456
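The following is a minimal sketch of how a sandbox:// URL could be resolved. The entity ID is the URL's host (netloc) part, and the URL path is interpreted relative to that entity's sandbox. The function resolve, the lookup table SANDBOXES, and every path in it are hypothetical. The table stands in for whatever registry RP would maintain, and it would only be complete once the relevant sandboxes have been assigned.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical entity-id -> sandbox-path table; all paths are made up.
SANDBOXES = {
    'client':      '/home/user/app',
    'ornl.summit': '/gpfs/rp/sandbox',
    'pilot.0000':  '/gpfs/rp/sandbox/rp.session.x/pilot.0000',
    'unit.123456': '/gpfs/rp/sandbox/rp.session.x/pilot.0000/unit.123456',
}


def resolve(url, sandboxes=SANDBOXES):
    """Map sandbox://<entity_id>/<path> to an absolute file system path."""
    parts = urlsplit(url)
    if parts.scheme != 'sandbox':
        raise ValueError('not a sandbox URL: %s' % url)
    entity_id = parts.netloc
    if entity_id not in sandboxes:
        # e.g. a task whose sandbox has not been assigned yet
        raise KeyError('unknown sandbox: %s' % entity_id)
    # The URL path is taken relative to the entity's sandbox root.
    return posixpath.join(sandboxes[entity_id], parts.path.lstrip('/'))


print(resolve('sandbox://unit.123456/output/traj.dcd'))
# /gpfs/rp/sandbox/rp.session.x/pilot.0000/unit.123456/output/traj.dcd
```

With this schema, any task can name another task's sandbox directly. A single directive with a source such as sandbox://unit.123456/output/traj.dcd replaces the two-step detour through a shared sandbox.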

Impact

The translation from old URLs to new ones is straightforward, so backward compatibility can be maintained for a transition period.
Additional use cases become possible, and others become much simpler. For example, the Repex layer would no longer need two-step data staging, which greatly simplifies replica orchestration.
Sandboxes are currently assigned in the UMGR scheduler. After UMGR scheduling, all sandbox paths are resolvable, and all data staging happens after that state. The same approach should work for the new schema if an additional cache for task sandboxes is introduced (a prototype exists).
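The translation from legacy URLs can be sketched as a plain string rewrite. It needs the IDs of the entities that the URL is evaluated for: the task being staged, its pilot, and the pilot's resource. The function name translate and its parameters are hypothetical. The sketch also assumes that legacy URLs have an empty host part, as in unit:///traj.dcd.

```python
from urllib.parse import urlsplit


def translate(url, task_id, pilot_id, resource):
    """Rewrite a legacy client/resource/pilot/unit URL to sandbox://<id>/...

    Hypothetical helper; query and fragment parts are dropped."""
    parts = urlsplit(url)
    # Legacy schemas are relative to the context the URL is used in,
    # so the context IDs must be supplied by the caller.
    entity = {'client':   'client',
              'resource': resource,
              'pilot':    pilot_id,
              'unit':     task_id}.get(parts.scheme)
    if entity is None:
        return url            # ordinary URLs (file://, sftp://, ...) unchanged
    return 'sandbox://%s%s' % (entity, parts.path or '/')


print(translate('unit:///traj.dcd', 'unit.123456', 'pilot.0000', 'ornl.summit'))
# sandbox://unit.123456/traj.dcd
```

A legacy URL only has a meaning relative to the task being staged, while the translated URL names its sandbox explicitly. Once translated, a URL can therefore be handed to any other task unchanged, which is what makes cross-task references possible.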
