Update remote inputs documentation for 100k dataset
jameshadfield committed Jun 29, 2023
1 parent 2f9e380 commit 8a452a6
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/src/reference/remote_inputs.rst
@@ -45,6 +45,13 @@ Our GISAID and open profiles each define 7 builds (a Global build and one build
- ``{build_name}/{build_name}_tip-frequencies.json``
- ``{build_name}/{build_name}_root-sequence.json``

100k Subsamples
---------------

We also produce a subsample of around 100,000 samples drawn from the entire open dataset.
This is particularly useful for development or for running builds locally, as the files are typically around 10 MB (metadata) and 20 MB (sequences).
Samples are chosen by taking 50,000 from the previous 12 months and 50,000 from before that; within each of these two sets, samples are grouped by year, month and country so that sampling is as even as possible.
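
The sampling scheme described above can be sketched in Python. This is a minimal, hypothetical illustration, not the code the Nextstrain pipeline runs: it assumes each record is a dict with a ``datetime.date`` under ``"date"`` and a ``"country"`` string, approximates "the previous 12 months" as 365 days before a reference date, and interprets "even sampling" as round-robin selection across (year, month, country) groups. The names ``even_sample`` and ``sample_100k`` are invented for this example. Augur's ``filter`` command offers comparable grouped subsampling through its ``--group-by`` option.

```python
import random
from collections import defaultdict
from datetime import timedelta


def even_sample(records, target, rng):
    """Pick up to `target` records, spreading picks evenly across
    (year, month, country) groups by round-robin selection."""
    groups = defaultdict(list)
    for rec in records:
        d = rec["date"]
        groups[(d.year, d.month, rec["country"])].append(rec)

    # Shuffle each group so that picks within a group are random.
    queues = []
    for key in sorted(groups):
        members = list(groups[key])
        rng.shuffle(members)
        queues.append(members)

    # One pick per non-empty group per pass, until the target is reached
    # or every group is exhausted.
    chosen = []
    while len(chosen) < target and any(queues):
        for queue in queues:
            if queue and len(chosen) < target:
                chosen.append(queue.pop())
    return chosen


def sample_100k(records, today, per_window=50_000, seed=0):
    """Hypothetical two-window scheme: up to `per_window` records from the
    last 12 months (approximated as 365 days) plus up to `per_window` older ones."""
    rng = random.Random(seed)
    cutoff = today - timedelta(days=365)
    recent = [r for r in records if r["date"] > cutoff]
    older = [r for r in records if r["date"] <= cutoff]
    return even_sample(recent, per_window, rng) + even_sample(older, per_window, rng)
```

With round-robin selection, small groups are exhausted first and the remaining quota flows to larger groups, so no single country-month can dominate a window unless little else is available.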

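For local development, the 100k files listed in the open-files table can be fetched and inspected with the Python standard library alone. The sketch below is a hypothetical helper, not part of the Nextstrain tooling, and the function names are invented for illustration. It streams the xz-compressed files with ``lzma.open`` rather than decompressing them into memory, since the uncompressed sequences are far larger than the roughly 20 MB download.

```python
import csv
import lzma
import urllib.request

# URLs from the open-files table; everything else here is illustrative.
BASE_100K = "https://data.nextstrain.org/files/ncov/open/100k"
METADATA_URL = f"{BASE_100K}/metadata.tsv.xz"
SEQUENCES_URL = f"{BASE_100K}/sequences.fasta.xz"


def fetch(url, dest):
    """Download `url` to the local path `dest`, leaving it xz-compressed."""
    urllib.request.urlretrieve(url, dest)


def iter_metadata(source):
    """Yield one dict per metadata row, streaming from an .xz TSV.

    `source` may be a path or a binary file object; lzma.open accepts both.
    """
    with lzma.open(source, "rt", encoding="utf-8", newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")


def count_fasta_records(source):
    """Count FASTA records (header lines starting with '>') in an .xz file."""
    with lzma.open(source, "rt", encoding="utf-8") as fh:
        return sum(1 for line in fh if line.startswith(">"))
```

For example, ``fetch(METADATA_URL, "metadata.tsv.xz")`` followed by ``sum(1 for _ in iter_metadata("metadata.tsv.xz"))`` counts the rows in the downloaded metadata.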
--------------

.. _remote-inputs-open-files:
@@ -71,6 +78,10 @@ Each regional build (``global``, ``africa``, ``asia``, ``europe``, ``north-ameri
+-----------------------+-----------------------+------------------------------------------------------------------------------+
|                       | aligned (xz)          | https://data.nextstrain.org/files/ncov/open/aligned.fasta.xz                 |
+-----------------------+-----------------------+------------------------------------------------------------------------------+
| 100k sample           | metadata              | https://data.nextstrain.org/files/ncov/open/100k/metadata.tsv.xz             |
+-----------------------+-----------------------+------------------------------------------------------------------------------+
|                       | sequences             | https://data.nextstrain.org/files/ncov/open/100k/sequences.fasta.xz          |
+-----------------------+-----------------------+------------------------------------------------------------------------------+
| Global sample         | metadata              | https://data.nextstrain.org/files/ncov/open/global/metadata.tsv.xz           |
+-----------------------+-----------------------+------------------------------------------------------------------------------+
|                       | sequences             | https://data.nextstrain.org/files/ncov/open/global/sequences.fasta.xz        |