
Releases: StanfordBioinformatics/encode_utils

2.11.0

20 Jan 18:02

What's Changed

  • EU-7 handle DCC urls with or without https schemes by @paul-sud in #25
  • EU-23 Fix bugs affecting prop removal by @jenjou in #38
  • EU-24-gc-upload by @jenjou in #39
  • EU-25-update-remove-duplicate-associations-call by @yunhailuo in #40
  • EU-27-eu-register-file-upload by @paul-sud in #43
  • Pass 'dry_run' cli argument value on Connection init when 'dcc_mode' … by @emi80 in #45
  • EU-28-add-more-attachment-props by @jenjou in #46
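EU-7 makes DCC URLs work whether or not they include a scheme. Below is a minimal sketch of that kind of normalization. It assumes `https` is the default scheme, and `normalize_dcc_url` is a hypothetical helper, not part of the encode_utils API:

```python
from urllib.parse import urlsplit


def normalize_dcc_url(host: str) -> str:
    """Return a URL with a scheme and no trailing slash (hypothetical helper)."""
    host = host.strip()
    # urlsplit only fills in netloc when '//' is present, so add a default scheme first.
    if "://" not in host:
        host = "https://" + host  # assumption: https is the default scheme
    parts = urlsplit(host)
    scheme = parts.scheme or "https"
    return f"{scheme}://{parts.netloc}{parts.path}".rstrip("/")
```

With this approach, `www.encodeproject.org` and `https://www.encodeproject.org/` both normalize to `https://www.encodeproject.org`. An explicit `http://` scheme, such as one for a local test server, is preserved.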

New Contributors

  • @emi80 made their first contribution in #45

Full Changelog: 2.10.0...2.11.0

2.10.0

06 Aug 21:50

Updates

  • Default requests timeout has been increased to 60s
  • autoSql attachments are now supported

Novelties

  • Added an option to disable file logging
  • Added an option to truncate long request payloads
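Truncating long request payloads is mainly useful for keeping log lines readable. The sketch below shows one way to do it, assuming payloads are JSON-serializable dicts. `truncate_payload` and its `max_chars` default are hypothetical and do not reflect the option's actual name or default:

```python
import json


def truncate_payload(payload: dict, max_chars: int = 200) -> str:
    """Serialize a payload for logging, cutting it off after max_chars characters."""
    text = json.dumps(payload, sort_keys=True)
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    # Keep the head of the payload and report how much was dropped.
    return f"{text[:max_chars]}... [{omitted} more characters]"
```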

Bugfixes

  • Fixed incorrect singularization of publication_data so that it matches the portal

2.9.0

12 May 23:45

Updates

  1. Posted aliases are now deduplicated
  2. profiles.Profile has been removed and the rest of the profiles module has been refactored to allow for connection.Connection to pull schemas from URLs other than encodeproject.org
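You can deduplicate posted aliases while keeping their original order by using a plain dict, because dicts preserve insertion order in Python 3.7+. `dedupe_aliases` is a hypothetical helper for illustration:

```python
from typing import List


def dedupe_aliases(aliases: List[str]) -> List[str]:
    """Remove repeated aliases, keeping each alias at its first position."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order.
    return list(dict.fromkeys(aliases))
```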

Novelties

  1. eu_register.py now exposes connection.Connection.remove_and_patch() to allow for removing and patching properties in one request using the --rm-patch flag
  2. File upload is now optional when posting file objects with connection.Connection.post: specify upload_file=False to skip it.
  3. connection.Connection.post can now optionally return the original response code of the POST in addition to the usual JSON by specifying return_original_status_code=True.

Bugfixes

  1. Fixed a bug in connection.Connection.extend_array_values that would cause spurious failures when extending with empty arrays
  2. Fixed a typo in the documentation

2.8.0

29 Oct 00:41

Updates

  1. Updated connection.Connection.patch() such that when the extend_array_values parameter is set to True, arrays with string or dictionary elements are extended and duplicates are removed.
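Dictionaries are unhashable, so extending arrays of strings or dictionaries without duplicates requires a comparison key that works for both types. The sketch below assumes canonical JSON serialization as that key. `extend_unique` is hypothetical and is not the library's implementation:

```python
import json
from typing import Any, List


def extend_unique(existing: List[Any], new: List[Any]) -> List[Any]:
    """Return existing extended with new, keeping only the first occurrence of each element."""
    result = []
    seen = set()
    for item in existing + new:
        # Dicts can't go in a set, so key every element by its sorted-key JSON text.
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
```

This approach treats `{"x": 1, "y": 2}` and `{"y": 2, "x": 1}` as the same element. Extending with an empty array simply returns the existing elements.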

Novelties

  1. Added a new method connection.Connection.remove_and_patch() that allows for removing and patching properties in one request. Please see the documentation for more details.
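One way to picture removing and patching in a single step is to compute the final document locally: drop the properties being removed, then overlay the patch. The sketch below assumes a flat dict record. `remove_then_patch` is hypothetical and does not reflect the signature of `remove_and_patch()`:

```python
import copy
from typing import Any, Dict, Iterable


def remove_then_patch(
    record: Dict[str, Any], remove: Iterable[str], patch: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the document that results from removing properties, then applying a patch."""
    updated = copy.deepcopy(record)  # leave the caller's record untouched
    for prop in remove:
        updated.pop(prop, None)  # ignore properties that are already absent
    updated.update(patch)
    return updated
```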

Bugfixes

  1. Fixed a bug in eu_register.check_valid_json() that would allow attempted submission of JSON arrays of mixed types.
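You can detect a JSON array of mixed types, such as a string alongside an object, by mapping each element to its JSON type. The sketch below uses the hypothetical helpers `json_type` and `is_homogeneous_array`. It treats Python `int` and `float` as the single JSON type "number":

```python
from typing import Any, List


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON type name."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Not a JSON value: {value!r}")


def is_homogeneous_array(values: List[Any]) -> bool:
    """Return True if every element of the array has the same JSON type."""
    return len({json_type(v) for v in values}) <= 1
```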

2.7.0

12 Apr 02:52

Updates:

  • Updated connection.Connection.before_post_file() so that it now calculates file_size in addition to md5sum, and whenever md5sum needs to be set, file_size is set as well.

Novelties:

  • Added connection.Connection.get_biosample_type() to aid in searching for a BiosampleType with a given classification and term_id or term_name.
  • Added function utils.orient_jpg() which fixes misoriented images, rotating them as necessary. connection.Connection.set_attachment() now calls this if the input file is a JPEG or TIFF.
  • Added aws_storage.py, which includes two classes: S3Upload, which simplifies uploading files to a specific location in a bucket with the specified ACL, and S3Object, which represents an object in an S3 bucket and is used internally to calculate the md5sum and file size when submitting S3 objects to the ENCODE Portal.
  • Added utils.url_join(), which is now used for properly joining URL paths instead of os.path.join, since the latter doesn't construct correct URL paths on Windows systems.
  • Added new function in profiles.py called remove_duplicate_associations(). This is called when patching a record so that EU can detect and remove duplicates in array values (when extending arrays).
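URL paths always use `/`, but `os.path.join` uses a backslash on Windows. The sketch below normalizes slashes at each boundary instead. This `url_join` is a hypothetical stand-in, and its signature may differ from the library's `utils.url_join()`:

```python
def url_join(*parts: str) -> str:
    """Join URL segments with '/', independent of the operating system."""
    if not parts:
        return ""
    # Keep the first segment's scheme and leading slashes; trim slashes at every join.
    cleaned = [parts[0].rstrip("/")] + [p.strip("/") for p in parts[1:]]
    return "/".join(p for p in cleaned if p)
```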

Bug fixes

  • Patched profiles.Profile._set_profile_id() to handle exceptional cases, such as antibody_lot records using @id values like '/antibodies/ENCAB719MQZ' instead of the expected '/antibody_lots/ENCAB719MQZ'.
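In cases like this, you can't always derive the profile by singularizing the collection segment of an `@id`. The sketch below uses an explicit exceptions table instead. `profile_id_from_at_id`, `ID_PREFIX_EXCEPTIONS`, and the naive singularization rules are all hypothetical. Per the 2.5.0 notes, the library itself singularizes with the inflection module:

```python
from typing import Dict

# Hypothetical table: collections whose @id prefix doesn't singularize to the profile name.
# Only the antibody case comes from the release notes.
ID_PREFIX_EXCEPTIONS: Dict[str, str] = {"antibodies": "antibody_lot"}


def profile_id_from_at_id(at_id: str) -> str:
    """Guess a profile ID from an @id such as '/biosamples/ENCBS.../'."""
    collection = at_id.strip("/").split("/")[0]
    if collection in ID_PREFIX_EXCEPTIONS:
        return ID_PREFIX_EXCEPTIONS[collection]
    # Naive singularization for illustration only.
    if collection.endswith("ies"):
        return collection[:-3] + "y"
    if collection.endswith("s"):
        return collection[:-1]
    return collection
```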

2.6.0

12 Jan 02:47

Updates:

  1. Updated documentation in transfer_to_gcp.py module.
  2. Updated utils.calculate_md5sum() such that it works with an S3 URI in addition to a local file path.
  3. Updated utils.calculate_md5sum() so that it is more memory efficient with large files by breaking it up into chunks.
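Chunked hashing keeps memory use constant because only one chunk is held in memory at a time. The sketch below handles local files using `hashlib`. `md5sum_chunked` and the 1 MiB default chunk size are assumptions, and S3 URIs are not handled here:

```python
import hashlib


def md5sum_chunked(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 of a local file, reading it chunk by chunk."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```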

Novelties:

  1. Added new script eu_patch_property.py, which is useful for patching a given property with the same value across multiple records, potentially records of different profiles. For example, I once used this to patch the award property of many different object types, as I had originally used the wrong award when submitting the records.
  2. New module replicate.py that makes working with replicates a whole lot easier. For example, you can search for a replicate by providing the biosample accession and the library accession. You can also let the module suggest which biosample_replicate_number and technical_replicate_number to use when submitting a new replicate object. Please see the well-documented source code for more details on how this works.
  3. Added method connection.Connection.get_experiments_with_biosample().
  4. Added class method profiles.Profile.profiles_with_property(), which returns a list of all profiles containing the given property name.
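A replicate-numbering suggestion could work along these lines. If the biosample is already present, it reuses the biological replicate number and increments the technical replicate number. Otherwise, it starts a new biological replicate. This rule is an assumption for illustration and is not necessarily what replicate.py does. `Rep` and `suggest_replicate_numbers` are hypothetical:

```python
from typing import List, NamedTuple, Tuple


class Rep(NamedTuple):
    """Hypothetical summary of an existing replicate in an experiment."""
    biosample: str
    bio_rep: int
    tech_rep: int


def suggest_replicate_numbers(existing: List[Rep], biosample: str) -> Tuple[int, int]:
    """Suggest (biosample_replicate_number, technical_replicate_number) for a new replicate."""
    same = [r for r in existing if r.biosample == biosample]
    if same:
        # Biosample already used: keep its biological number, add a technical replicate.
        return same[0].bio_rep, max(r.tech_rep for r in same) + 1
    # New biosample: next biological replicate number, first technical replicate.
    next_bio = max((r.bio_rep for r in existing), default=0) + 1
    return next_bio, 1
```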

2.5.0

02 Nov 08:33

Bug fixes in October:

  1. Updated the typecast() function in eu_register.py so that it can handle JSON Schema numbers. When an attribute is declared to be a number, this function now converts the string to either an int or a float based on its string representation. Thanks to khine for reporting.
  2. Fixed a bug reported by weiwei where the Profile.required_properties() method in profiles.py always expected the given profile to contain a top-level 'required' key, which isn't always the case. For example, the biosample profile has one, whereas the file profile has its 'required' key inside an anyOf subschema. In the latter case, the method now returns an empty array, since there is at present no attempt to determine what is conditionally required. The bug would have affected attempts to remove a property from such profiles, because the removal logic first prevents the user from removing required properties by popping them out of the payload.
  3. Fixed a bug reported by Jennifer Jou where the profiles.Profile._set_profile_id() method didn't properly singularize the profile ID in all cases. It now uses the inflection module's singularize function.
  4. Fixed a bug where eu_register.py's typecast() function didn't check for booleans to typecast to, so the registration script didn't always handle boolean fields properly. Thanks again to jjou for reporting.
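The sketch below illustrates the typecasting behavior described in items 1 and 4. It assumes string input from a TSV cell and a JSON Schema type name. `typecast_value` is hypothetical, and the accepted boolean spellings are assumptions rather than what eu_register.py accepts:

```python
from typing import Any


def typecast_value(value: str, schema_type: str) -> Any:
    """Convert a TSV string to the Python value for a JSON Schema type."""
    if schema_type == "integer":
        return int(value)
    if schema_type == "number":
        # A JSON Schema number may be integral or fractional; decide from the string.
        try:
            return int(value)
        except ValueError:
            return float(value)
    if schema_type == "boolean":
        lowered = value.strip().lower()
        if lowered in ("true", "1", "yes"):  # assumed spellings
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError(f"Not a boolean: {value!r}")
    return value  # strings and other types pass through unchanged
```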

New in Master:

10/23/2018

  1. Added script eu_get_accessions.py.
    Given an input list of record aliases, retrieves the DCC accession for each.
  2. Added script eu_create_gcp_url_list.py.

Updates in Master:

  1. Documentation in transfer_to_gcp.py has been updated.
  2. Renamed utils.clean_alias_name() to utils.clean_aliases(). This function now takes a list of aliases and either removes or replaces non-permitted characters, such as "/" and "#". It is called in the pre-submit hook before_submit_alias.
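The sketch below cleans aliases by replacing the characters `/` and `#` after the `lab:` prefix. Passing an empty replacement removes them instead. `clean_alias_list`, the `_` default replacement, and leaving the prefix untouched are assumptions for illustration:

```python
from typing import List

# Only '/' and '#' are named in the release notes; the full set may be larger.
FORBIDDEN = "/#"


def clean_alias_list(aliases: List[str], replacement: str = "_") -> List[str]:
    """Replace (or, with replacement='', remove) non-permitted characters in aliases."""
    cleaned = []
    for alias in aliases:
        lab, sep, name = alias.partition(":")
        if not sep:
            # No 'lab:' prefix: clean the whole alias.
            lab, name = "", alias
        for ch in FORBIDDEN:
            name = name.replace(ch, replacement)
        cleaned.append(lab + sep + name)
    return cleaned
```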

2.4.0

11 Oct 23:47

New:

  • Added script eu_generate_upload_creds.py.
  • Added support for creating a URL list file in transfer_to_gcp.py and via connection.Connection.gcp_transfer_urllist. This file can be used in GCP to copy files from ENCODE AWS buckets to GCP.
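This sketch assumes the URL list follows the Google Cloud Storage Transfer Service URL list format. In that format, a `TsvHttpData-1.0` header is followed by one tab-separated line per file, giving its URL, size in bytes, and base64-encoded MD5. The sketch builds that text from hex MD5 sums. `url_list_tsv` is hypothetical and is not `gcp_transfer_urllist`:

```python
import base64
from typing import Iterable, Tuple


def url_list_tsv(entries: Iterable[Tuple[str, int, str]]) -> str:
    """Build a URL list from (url, size_in_bytes, md5_hex) tuples."""
    lines = ["TsvHttpData-1.0"]
    for url, size, md5_hex in entries:
        # The format wants base64 of the raw 16-byte digest, not the hex string.
        md5_b64 = base64.b64encode(bytes.fromhex(md5_hex)).decode("ascii")
        lines.append(f"{url}\t{size}\t{md5_b64}")
    return "\n".join(lines) + "\n"
```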

Updates:

  • Added an option in transfer_to_gcp.py to allow overwrites when transferring files to GCP.
  • Fixed a bug in the connection.Connection.gcp_transfer method so that it correctly finds the S3 object path.

2.3.1

09 Aug 18:53

Known Bugs in this release

  • The S3 to GCP transfer mechanism won't work, since the S3 bucket name is hard-coded to the test value of pulsar-encode-assets. This is fixed in master.

Updates in this release

  • Added missing reference to dependency jsonschema in the setup.py file. No need to upgrade to this version if you already have release 2.3.0 and jsonschema in your Python packages list. Thanks @yunhailuo for adding the dependency fix.

S3 to GCP file transfer support

08 Aug 21:35

New:

  1. Added support to copy ENCODE files in AWS S3 to GCP; see the RTD documentation.
  2. Added script eu_search_results_json.py, which accepts an ENCODE Portal search URL and saves the results to a JSON file.
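The sketch below shows what saving search results might involve. It assumes the portal returns JSON when `format=json` is set, returns all hits when `limit=all` is set, and places the hits under `@graph`. `as_json_search_url` and `save_search_results` are hypothetical and are not eu_search_results_json.py itself:

```python
import json
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def as_json_search_url(search_url: str) -> str:
    """Rewrite a portal search URL to request all results as JSON (assumed parameters)."""
    parts = urlsplit(search_url)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ("format", "limit")
    ]
    params += [("format", "json"), ("limit", "all")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment))


def save_search_results(search_url: str, out_path: str) -> None:
    """Fetch the search results and write the hits to a JSON file."""
    req = urllib.request.Request(
        as_json_search_url(search_url), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        results = json.load(resp)
    with open(out_path, "w") as fh:
        json.dump(results.get("@graph", []), fh, indent=2)
```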

Updates:

  1. eu_register.py will now accept JSON in addition to TSV, thanks to the contribution from @yunhailuo.
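The sketch below reads submission records from either a JSON or a TSV file, choosing the parser by file extension. `load_records` is hypothetical, and it does not typecast TSV values:

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a .json file or, otherwise, from a tab-separated file."""
    p = Path(path)
    if p.suffix.lower() == ".json":
        with p.open() as fh:
            data = json.load(fh)
        # Accept either a single object or an array of objects.
        return data if isinstance(data, list) else [data]
    with p.open(newline="") as fh:
        # The header row supplies the property names; values stay as strings.
        return list(csv.DictReader(fh, delimiter="\t"))
```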

Bug fixes:

  1. See notes in 2.0.0-pre pre-release.