Add example of custom StacIO for Azure Blob storage #1372

Merged: 4 commits, Aug 2, 2024

bmcandr (Contributor) commented Jul 25, 2024

Description:
Adds an example of a custom StacIO for reading/writing STAC objects from/to Azure Blob storage. I organized the existing S3 example and new Blob storage example into tabs to avoid cluttering up the docs:

[Screenshot: docs with the S3 and Blob storage examples in tabs, 2024-07-25]

I implemented this custom StacIO and thought it might be nice to include an example for another cloud provider in the docs, but feel free to reject if this is unwanted.

PR Checklist:

  • pre-commit hooks pass locally
  • Tests pass (run scripts/test)
  • Documentation has been updated to reflect changes, if applicable
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

bmcandr (Contributor, Author) commented Jul 25, 2024

Full example code snippet:

import os
from pathlib import PurePosixPath
from typing import Any, Tuple, Union
from urllib.parse import urlparse

from azure.storage.blob import BlobClient, ContentSettings
from pystac import Link
from pystac.stac_io import DefaultStacIO, StacIO

class BlobStacIO(DefaultStacIO):
    """A custom StacIO class for reading and writing STAC objects
    from/to Azure Blob storage.
    """

    def _parse_blob_url(self, url: str) -> Tuple[str, str]:
        # The URL path is "/<container>/<blob>"; parts[0] is the root "/".
        path = PurePosixPath(urlparse(url).path)
        container = path.parts[1]
        blob = "/".join(path.parts[2:])
        return container, blob

    def _get_blob_client(self, container: str, blob: str) -> BlobClient:
        # Credentials are read from the AZURE_STORAGE_CONNECTION_STRING
        # environment variable.
        return BlobClient.from_connection_string(
            os.environ["AZURE_STORAGE_CONNECTION_STRING"],
            container_name=container,
            blob_name=blob,
        )

    def read_text(self, source: Union[str, Link], *args: Any, **kwargs: Any) -> str:
        if isinstance(source, Link):
            source = source.href
        if source.startswith("https"):
            # Every https href is treated as a Blob storage URL.
            container, blob = self._parse_blob_url(source)
            blob_client = self._get_blob_client(container, blob)
            obj = blob_client.download_blob().readall().decode()
            return obj
        else:
            # Anything else is handled by DefaultStacIO.
            return super().read_text(source, *args, **kwargs)

    def write_text(
        self, dest: Union[str, Link], txt: str, *args: Any, **kwargs: Any
    ) -> None:
        """Write STAC Objects to Blob storage. Note: overwrites by default."""
        if isinstance(dest, Link):
            dest = dest.href
        if dest.startswith("https"):
            container, blob = self._parse_blob_url(dest)
            blob_client = self._get_blob_client(container, blob)
            blob_client.upload_blob(
                txt,
                overwrite=True,
                content_settings=ContentSettings(content_type="application/json"),
            )
        else:
            super().write_text(dest, txt, *args, **kwargs)


# Register the class so pystac uses it for all subsequent reads and writes.
StacIO.set_default(BlobStacIO)
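
To make the URL handling in `_parse_blob_url` easier to follow, here is a minimal standalone sketch of the same splitting logic using only the standard library. The function name `parse_blob_url` and the example URL are hypothetical, chosen for illustration; the logic mirrors the method in the snippet above.

```python
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse


def parse_blob_url(url: str) -> Tuple[str, str]:
    """Split a Blob storage URL into (container, blob name)."""
    # For "https://<account>.blob.core.windows.net/<container>/<blob>",
    # the URL path is "/<container>/<blob>".
    path = PurePosixPath(urlparse(url).path)
    # parts[0] is the root "/", parts[1] is the container,
    # and everything after that is the blob name.
    container = path.parts[1]
    blob = "/".join(path.parts[2:])
    return container, blob


# Hypothetical URL, for illustration only.
example_url = "https://myaccount.blob.core.windows.net/my-container/catalog/item.json"
container, blob = parse_blob_url(example_url)
print(container)  # my-container
print(blob)       # catalog/item.json
```

One thing this makes visible: a URL with no container segment (for example, just the account root) has no `parts[1]`, so the function raises `IndexError` rather than returning an empty container name.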

codecov bot commented Jul 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.07%. Comparing base (65ea9a9) to head (b1452cc).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1372   +/-   ##
=======================================
  Coverage   91.07%   91.07%           
=======================================
  Files          51       51           
  Lines        7070     7070           
  Branches     1012     1012           
=======================================
  Hits         6439     6439           
  Misses        451      451           
  Partials      180      180           


gadomski (Member) left a comment

Thanks for the contribution — I love the use of tabs, very clean. Couple code-level nits but in general makes sense to me.

Three review comments on docs/concepts.rst (outdated, resolved)

bmcandr (Contributor, Author) commented Jul 30, 2024

Here's the snippet I used for testing that reads an Item I copied from Planetary Computer into our Blob storage:

import pystac
from pystac.stac_io import StacIO
from typing import cast
from azure.core.credentials import AzureSasCredential

from io_utils.stac.blob_stacio import BlobStacIO

BlobStacIO.account_url = "https://impactobservatory.blob.core.windows.net"
BlobStacIO.credential = AzureSasCredential(
    "sp=r&st=2024-07-29T23:22:42Z&se=2024-09-30T07:22:42Z&spr=https&sv=2022-11-02&sr=b&sig=vSnlRevk6ZPl0GjyV%2Fe9KoJ8jh5B60gU%2B%2BwYMRMkQsE%3D"
)
BlobStacIO.overwrite = False

StacIO.set_default(BlobStacIO)


item_href = "https://impactobservatory.blob.core.windows.net/io-brendan-dev/S2B_MSIL2A_20240729T144729_R139_T20QRF_20240729T170459.json"
item: pystac.Item = cast(pystac.Item, pystac.read_file(item_href))

print(item.assets)

(the SAS token is scoped, temporary, and read-only)
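
The testing snippet configures `BlobStacIO` by assigning class attributes (`account_url`, `credential`, `overwrite`) before calling `StacIO.set_default`. Presumably this is because `set_default` is handed the class rather than an instance, so the object is created later with no constructor arguments. The library-free sketch below illustrates that pattern only; `BaseIO`, `ConfiguredIO`, and `handles` are hypothetical stand-ins and are not pystac's API or the merged implementation.

```python
from typing import Callable, Optional


class BaseIO:
    """Stand-in for a StacIO-like base class (hypothetical, not pystac)."""

    _default_factory: Optional[Callable[[], "BaseIO"]] = None

    @classmethod
    def set_default(cls, factory: Callable[[], "BaseIO"]) -> None:
        # Store the class (or any zero-argument callable), not an instance.
        BaseIO._default_factory = factory

    @classmethod
    def default(cls) -> "BaseIO":
        if BaseIO._default_factory is None:
            return BaseIO()
        # Called with no arguments, so constructor parameters cannot be
        # used to pass configuration at this point.
        return BaseIO._default_factory()


class ConfiguredIO(BaseIO):
    # Class-level settings, analogous to account_url and overwrite above.
    account_url: Optional[str] = None
    overwrite: bool = True

    def handles(self, href: str) -> bool:
        """Return True if href points into the configured account (assumed check)."""
        return self.account_url is not None and href.startswith(self.account_url)


# Configure on the class, then register the class itself.
ConfiguredIO.account_url = "https://example.blob.core.windows.net"
ConfiguredIO.overwrite = False
BaseIO.set_default(ConfiguredIO)

stac_io = BaseIO.default()
print(type(stac_io).__name__)  # ConfiguredIO
print(stac_io.overwrite)       # False
print(stac_io.handles("https://example.blob.core.windows.net/container/item.json"))  # True
```

Because the settings live on the class, every instance created by the registry sees them without any extra wiring, at the cost of the configuration being global to the process.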

gadomski (Member) commented Aug 2, 2024

Thanks for the changes, LGTM! Appreciate it!

gadomski added this pull request to the merge queue on Aug 2, 2024
Merged via the queue into stac-utils:main with commit 522c1b9 Aug 2, 2024
24 checks passed