Foldseek 3-915ef7d

@milot-mirdita milot-mirdita released this 01 Aug 13:12
· 819 commits to master since this release

Features

Three variants of the AlphaFold UniProt database are available: Alphafold/UniProt, Alphafold/UniProt-NO-CA and Alphafold/UniProt50.

  • Alphafold/UniProt: contains all 214 million entries of the AlphaFold UniProt database, including C-alpha coordinates. The download is ~700GB and takes ~950GB after extraction.
  • Alphafold/UniProt-NO-CA: omits the C-alpha coordinates and is much smaller (~70GB download, ~170GB extracted). However, TM-align based alignments do not work with it (search --alignment-type 1, tmalign, and convertalis --format-output alntmscore,u,t).
  • Alphafold/UniProt50: Alphafold/UniProt clustered with MMseqs2 at 50% sequence identity and 80% bidirectional coverage (~190GB download). This is the database offered on the web server at https://search.foldseek.com.
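
As a rough sketch, the download command for one of these variants could be assembled as below. It assumes the variant names are passed to the databases module in the same way as other downloadable databases (database name, output database, temporary directory). The helper afdb_download_cmd and the names afdb50 and tmp are hypothetical and only serve as illustration. The helper prints the command instead of running it, since the downloads are large.

```shell
#!/bin/sh
# Hypothetical helper (not part of Foldseek): print the command that would
# download one of the AlphaFold database variants listed above.
afdb_download_cmd() {
    variant="$1"   # Alphafold/UniProt, Alphafold/UniProt-NO-CA or Alphafold/UniProt50
    out_db="$2"    # name of the output database (illustrative, your choice)
    tmp_dir="$3"   # scratch directory for intermediate files
    printf 'foldseek databases %s %s %s\n' "$variant" "$out_db" "$tmp_dir"
}

# Print the command for the clustered variant (~190GB download).
afdb_download_cmd Alphafold/UniProt50 afdb50 tmp
```

Piping the printed line to sh would start the actual download. Pick Alphafold/UniProt rather than Alphafold/UniProt-NO-CA if TM-align based alignments are needed.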

  • Added databases TSV output
  • createdb supports downloading structures from Google Cloud Storage. This is not enabled by default; see the user guide for how to compile Foldseek with GCS support
  • PDB offered through databases will be updated regularly. Thanks to @jaylee2000

Known issues

  • The prefilter against large databases, such as the AlphaFold UniProt Protein Structure Database, currently runs with 6-mers (-k 6), which is less efficient than 7-mers. We will optimize the 7-mer parameters in a future release and re-enable automatic k-mer size selection

Bug fixes

  • Fixed PDB download