Nathan Watson edited this page Jan 25, 2018 · 27 revisions

Summary

Registering metadata with the ENCODE Portal encompasses both creating new objects and updating existing ones, such as libraries, biosamples, and experiments. This repository contains a Python client API and a companion tool for registering object metadata, as well as some handy scripts. The companion tool is a more user-friendly route to submitting objects: you fill out an Excel sheet.

Installation and Configuration

See installation details here. Once you have this installed, be sure to see the configuration page in order to properly set up a few expected environment variables.

Client API

Incorporate the API into your own submission clients. For example, you may have your own LIMS and need to write client software to package objects up and submit them to the ENCODE Portal. Your client software can then use this API to handle communication with the ENCODE Portal; you only need to package the objects to be submitted into appropriate JSON structures.
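As a rough illustration of the packaging step, the sketch below maps a record from a LIMS to a JSON structure using only the Python standard library. The LIMS column names (`sample_name`, `notes`), the helper `to_payload`, and the choice of the `description` field are assumptions made for this example; check the relevant schema on the ENCODE Portal for the real field names, and see the API documentation for how to submit the result.

```python
import json

# Hypothetical LIMS record; these column names are invented for illustration.
lims_record = {"sample_name": "lab:sample-001", "notes": "example biosample"}


def to_payload(record):
    """Map a LIMS record to schema-style field names (illustrative mapping only)."""
    return {
        # 'aliases' takes an array of strings, so wrap the single name in a list.
        "aliases": [record["sample_name"]],
        # 'description' is assumed here; confirm it against the target schema.
        "description": record["notes"],
    }


# Serialize the packaged object to a JSON string.
payload_json = json.dumps(to_payload(lims_record), indent=2)
print(payload_json)
```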

See official API documentation on Read The Docs.

Companion Registration Tool

It's simple: you fill out a tab-delimited file (I recommend using Excel) and then run the script register.py. The script does not read Excel files directly; instead, export the Excel sheet to a tab-delimited file and use that as the script's input file. The tab-delimited file must have a header line in the first row, and the field names must exactly match the corresponding names in the appropriate schema on the ENCODE Portal. For example, if you are registering new biosamples, use the field names that appear in the biosample JSON schema.
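To make the expected file layout concrete, here is a minimal sketch of reading such a tab-delimited file with Python's standard `csv` module. It is not how register.py is implemented; the helper name `read_rows` and the sample column names are assumptions for illustration.

```python
import csv
import io


def read_rows(handle):
    """Yield one dict per data row, keyed by the field names in the header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield dict(row)


# In-memory stand-in for an exported sheet: header line first, then one data row.
sample = "aliases\tdescription\nlab:sample-001\tfirst biosample\n"
rows = list(read_rows(io.StringIO(sample)))
print(rows)
```

Each row comes back as a dictionary whose keys are the header names, which is why those names need to match the schema exactly.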

There are a few subtleties to keep in mind regarding the formatting of values when it comes to arrays or objects. The values of some fields in a given schema are simple strings or integers, but some fields take an array of values. Consider the 'aliases' field that is present in all schemas, which takes an array of strings as its value, as indicated in the schemas. Let's say that you have two aliases for a given biosample, simply named "alias1" and "alias2". How can you enter array values in your tab-delimited file? You can use JSON array syntax, which looks like this:

["alias1", "alias2"]

But you are also permitted to enter just alias1, alias2, or even "alias1", "alias2". The choice is yours.
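The three accepted spellings can be reduced to the same list with a few lines of Python. This is only a sketch of the idea, not the tool's actual parser; `parse_array_cell` is a hypothetical name, and the simple comma split assumes that no individual value contains a comma.

```python
import json


def parse_array_cell(cell):
    """Turn a spreadsheet cell into a list of strings (illustrative only)."""
    text = cell.strip()
    if text.startswith("["):
        # Full JSON array syntax: let the JSON parser handle it.
        return json.loads(text)
    # Otherwise split on commas, then strip whitespace and optional double quotes.
    return [item.strip().strip('"') for item in text.split(",") if item.strip()]


for cell in ('["alias1", "alias2"]', 'alias1, alias2', '"alias1", "alias2"'):
    print(parse_array_cell(cell))
```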

Things are a bit more complex if you have a JSON object as a value, or an array of objects. For example, take a look at the "introduced_tags" field in the genetic_modifications profile. This is an array field whose individual items are JSON objects. A JSON object is a collection of keys and values, comparable to a dictionary in Python or a hash in Ruby and Perl. Because objects are difficult to serialize into a simpler structure for the purpose of this script, you must enter objects in valid JSON format. Let's take a look at an example where you are registering a genetic_modification and you have two introduced_tags you want to specify. The value of the introduced_tags key is an array of objects, where each object has the following fields:

| Field | Value |
| --- | --- |
| name | string |
| location | enum: "C-terminal", "internal", "N-terminal", "other", "unknown" |

Let's say that you have two tags, one named "tag1" that is N-terminal, and the other named "tag2" that is C-terminal. You can enter this in your spreadsheet in the introduced_tags column as an array of JSON objects as follows:

[ {"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"} ]

You can drop the enclosing "[" and "]" array brackets if you prefer, but everything else here must be valid JSON.
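A cell written either way can be read back into the same list of objects. The sketch below shows one way to do that with the standard `json` module; it is an assumption about how such a cell could be handled, not the tool's own code, and `parse_object_array_cell` and `ALLOWED_LOCATIONS` are names made up for this example (the allowed values are copied from the table above).

```python
import json

# Enum values for 'location', copied from the field table above.
ALLOWED_LOCATIONS = {"C-terminal", "internal", "N-terminal", "other", "unknown"}


def parse_object_array_cell(cell):
    """Parse a cell holding JSON objects, with or without the enclosing brackets."""
    text = cell.strip()
    if not text.startswith("["):
        # Brackets were dropped: restore them so the text is a valid JSON array.
        text = "[" + text + "]"
    tags = json.loads(text)
    for tag in tags:
        if tag["location"] not in ALLOWED_LOCATIONS:
            raise ValueError("unexpected location: " + repr(tag["location"]))
    return tags


with_brackets = '[ {"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"} ]'
without_brackets = '{"name": "tag1", "location": "N-terminal"}, {"name": "tag2", "location": "C-terminal"}'
print(parse_object_array_cell(with_brackets) == parse_object_array_cell(without_brackets))
```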
