Nathan Watson edited this page Jan 12, 2018 · 27 revisions

Summary

Registering meta-data encompasses both creating new objects and updating existing ones, such as libraries, biosamples, and experiments. This repository contains two tools for registering object meta-data. The first is a more user-friendly route that entails filling out an Excel sheet. The second is a client library that you access programmatically. Let's call these methods Excel and Client, respectively.

Installation

See the Installation wiki page.

Method 1: Excel

It's simple: you fill out a tab-delimited file (I recommend using Excel) and then run the script register_meta.py. You don't use Excel directly with this script; instead, export the Excel sheet to a tab-delimited file and use that as the script's input file. The tab-delimited file must have a header line in the first row, and the field names must exactly match the corresponding names in the appropriate schema on the ENCODE Portal. For example, if you are registering new biosamples, then you can use field names that appear in the JSON profile/schema at https://www.encodeproject.org/profiles/biosample.json.
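If you want to catch misspelled column names before running the script, a check along the following lines may help. This is only a minimal sketch and not part of register_meta.py: the helper names read_rows and unknown_fields are hypothetical, and schema_properties is a small hand-written subset standing in for the field names you would take from the profile JSON (assumed here to list them under its "properties" key).

```python
import csv
import io


def read_rows(handle):
    """Read a tab-delimited file whose first row is the header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


def unknown_fields(fieldnames, schema_properties):
    """Return the header names that do not appear in the schema."""
    return [name for name in fieldnames if name not in schema_properties]


# Illustrative subset only; in practice, take the names from the profile JSON.
schema_properties = {"aliases", "description", "donor"}

# In-memory stand-in for an exported sheet; note the misspelled "donr" column.
sample = "aliases\tdescription\tdonr\nalias1, alias2\tliver sample\tdonor-1\n"

fieldnames, rows = read_rows(io.StringIO(sample))
print(unknown_fields(fieldnames, schema_properties))
```

With a real file you would pass an open file handle (opened with newline="") instead of the io.StringIO object.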

There are a few subtleties to keep in mind regarding the formatting of values when it comes to arrays or objects. The values of some fields in a given schema are simple strings or integers, but some fields take an array of values. Consider the 'aliases' field that is present in all schemas, which takes an array of strings as its value, as indicated in the schemas. Let's say that you have two aliases for a given biosample, simply named "alias1" and "alias2". How can you enter array values in your tab-delimited file? You can use JSON array syntax, which looks like this:

["alias1", "alias2"]

But you are also permitted to enter just alias1, alias2, or even "alias1", "alias2". The choice is yours.
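To make the three accepted spellings concrete, here is a minimal sketch of how such a cell could be normalized into a list of strings. It is an illustration under stated assumptions, not the parsing code that register_meta.py actually uses, and parse_array_cell is a hypothetical name.

```python
import json


def parse_array_cell(cell):
    """Turn a spreadsheet cell into a list of strings.

    Accepts JSON array syntax, bare comma-separated values, or
    comma-separated values wrapped in double quotes.
    """
    text = cell.strip()
    if text.startswith("["):
        # Full JSON array syntax: let the JSON parser handle it.
        return json.loads(text)
    # Otherwise split on commas and drop whitespace and surrounding quotes.
    return [item.strip().strip('"') for item in text.split(",") if item.strip()]


for cell in ('["alias1", "alias2"]', 'alias1, alias2', '"alias1", "alias2"'):
    print(parse_array_cell(cell))
```

All three inputs yield the same two-element list. Note that this simple split would break a value that itself contains a comma; in that case the JSON array form is the safer choice.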

Things are a bit more complex if you have a JSON object as a value, or an array of objects. For example, take a look at the "introduced_tags" field in the genetic_modifications profile. This is an array field whose individual items are JSON objects. A JSON object is a collection of keys and values, comparable to a dictionary in Python or a hash in Ruby and Perl. Because objects are difficult to serialize into a simpler structure for the purpose of this script, you must enter objects in valid JSON format. Let's take a look at an example where you are registering a genetic_modification and you have two introduced_tags you want to specify. The value of the introduced_tags key is an array of objects, where each object has the following fields:

Field       Value
name        string
location    enum: "C-terminal", "internal", "N-terminal", "other", "unknown"

Let's say that you have two tags, one named "tag1" that is N-terminal, and the other named "tag2" that is C-terminal. You can enter this in your spreadsheet in the introduced_tags column as an array of JSON objects as follows:

[ {"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"} ]

You can omit the enclosing square brackets, "[" and "]", if you prefer, but everything else here must be valid JSON.
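The rule for object values can be sketched as follows. Again, this is a hedged illustration rather than the script's real implementation: parse_object_array_cell and ALLOWED_LOCATIONS are hypothetical names, and the allowed locations are copied from the field listing above.

```python
import json

# Enum values for "location", as listed in the field description above.
ALLOWED_LOCATIONS = {"C-terminal", "internal", "N-terminal", "other", "unknown"}


def parse_object_array_cell(cell):
    """Parse a cell holding JSON objects, with or without the outer brackets."""
    text = cell.strip()
    if not text.startswith("["):
        # The brackets were omitted, so add them back before parsing.
        text = "[" + text + "]"
    tags = json.loads(text)
    for tag in tags:
        if tag.get("location") not in ALLOWED_LOCATIONS:
            raise ValueError("unexpected location: %r" % tag.get("location"))
    return tags


with_brackets = '[ {"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"} ]'
without_brackets = '{"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"}'

print(parse_object_array_cell(with_brackets) == parse_object_array_cell(without_brackets))
```

Anything that is not valid JSON inside the objects, such as single-quoted keys, makes json.loads raise an error, which mirrors the requirement that the objects themselves be valid JSON.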

Method 2: Client Library

Documentation pending.
