Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add collection search extension #136

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

hrodmn
Copy link

@hrodmn hrodmn commented Jul 26, 2024

Related Issue(s):

Description:
Adds the collection search extension by pointing the all_collections method at the pgstac function collection_search instead of all_collections.

I have not tested to see if paging is implemented yet, need to upload a bunch of collections to my local db I guess! 😅

PR Checklist:

  • pre-commit hooks pass locally
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable, and docs build successfully (run make docs)
  • Changes are added to the CHANGELOG.


# Do the request
try:
search_request = self.post_request_model(**clean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 😬 the issue is that here we don't have to the model used, we should add a collection_get_request_model attribute to the client or find a way to avoid this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the problem correctly, we are just trying to dump the args into a json that pgstac will accept, but we are using a model for item search that is defined in stac_pydantic. Does this mean we want to add a new attribute to the CoreCrudClient that is equivalent to CollectionSearchExtension().GET?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kinda weird because we will pass the Client to the CollectionSearch extension, which kinda result in a circular dependency, but that's how it's done for the search post model https://github.com/stac-utils/stac-fastapi/blob/main/stac_fastapi/types/stac_fastapi/types/core.py#L346

Does this mean we want to add a new attribute to the CoreCrudClient that is equivalent to CollectionSearchExtension().GET?

But yes I think we will have to do that!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding a collections_get_request_model attribute based on the BaseCollectionSearchGetRequest class but it doesn't work because that class is not a pydantic model. It looks like the post_request_model is a BaseSearchPostRequest which inherits from stac_pydantic.api.Search (which is based on a BaseModel), but BaseSearchGetRequest is based on the APIRequest class which does not inherit from a pydantic model and thus has no model_dump_json() method.

Here is what I tried:

@attr.s
class CoreCrudClient(AsyncBaseCoreClient):
    """Client for core endpoints defined by stac."""

    collections_get_request_model: BaseCollectionSearchGetRequest = attr.ib(
        default=BaseCollectionSearchGetRequest
    )

    async def all_collections(  # noqa: C901
        self,
        request: Request,
        # Extensions
        bbox: Optional[BBox] = None,
        datetime: Optional[DateTimeType] = None,
        limit: Optional[int] = None,
        query: Optional[str] = None,
        token: Optional[str] = None,
        fields: Optional[List[str]] = None,
        sortby: Optional[str] = None,
        filter: Optional[str] = None,
        filter_lang: Optional[str] = None,
        **kwargs,
    ) -> Collections:
        """Cross catalog search (GET).

        Called with `GET /collections`.

        Returns:
            Collections which match the search criteria, returns all
            collections by default.
        """

        # Parse request parameters
        base_args = {
            "bbox": bbox,
            "limit": limit,
            "token": token,
            "query": orjson.loads(unquote_plus(query)) if query else query,
        }

        clean = clean_search_args(
            base_args=base_args,
            datetime=datetime,
            fields=fields,
            sortby=sortby,
            filter=filter,
            filter_lang=filter_lang,
        )

        # Do the request
        try:
            search_request = self.collections_get_request_model(**clean)
        except ValidationError as e:
            raise HTTPException(
                status_code=400, detail=f"Invalid parameters provided {e}"
            ) from e

        return await self._collection_search_base(search_request, request=request)

So I think I will need to get the json for the pgstac query using a different approach.

Copy link
Author

@hrodmn hrodmn Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this working by using BaseCollectionSearchPostRequest (instead of BaseCollectionSearchGetRequest) which follows the pattern used in the item search. Does that approach make sense too you @vincentsarago?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see the path. I probably need to use create_request_model to make sure the appropriate args get added for the extensions. However, without creating a new pydantic model, I would still need to use the POST model because the search functionality relies on the pydantic model. That's probably something that belongs in stac-fastapi, right? Is there a particular reason the GET request models are a generic class rather than a pydantic model in stac-fastapi?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason the GET request models are a generic class rather than a pydantic model in stac-fastapi?

yes because pydantic model are interpreted as body parameter for endpoints

Copy link
Author

@hrodmn hrodmn Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PgstacSearch model is compatible with collection search, and it follows the pattern that we use in the get_search method which also uses self.post_request_model.

try:
search_request = self.post_request_model(**clean)
except ValidationError as e:
raise HTTPException(
status_code=400, detail=f"Invalid parameters provided {e}"
) from e

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience while I figure this out. I think I need to double back and make sure I understand how all of these models are getting used. Upon closer examination, it looks like stac-fastapi-pgstac is just using the AsyncBaseCoreClient.post_request_model for all of the POST requests:
e.g.

search_request = self.post_request_model(**clean)

We provide a search_post_request_model to StacAPI:

post_request_model = create_post_request_model(extensions, base_model=PgstacSearch)
get_request_model = create_get_request_model(extensions)
api = StacApi(
settings=settings,
extensions=extensions,
client=CoreCrudClient(post_request_model=post_request_model), # type: ignore
response_class=ORJSONResponse,
items_get_request_model=items_get_request_model,
search_get_request_model=get_request_model,
search_post_request_model=post_request_model,
)

But I can't see how search_post_request_model is getting used by anything right now.

Copy link
Author

@hrodmn hrodmn Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess search_get_request_model and search_post_request_model are a dead end. We pass the post_request_model that is based on PgstacSearch to the CoreCrudClient and that is what gets used to generate inputs for all of the pgstac search functions.

I am pretty sure we can use it for the /collections GET method like we do in the /collections/{collection_id}/items GET method, that is what my last commit implements.

@m-mohr
Copy link
Contributor

m-mohr commented Aug 1, 2024

Great work. A queryables link seems to be missing in GET /collections, right?

@vincentsarago
Copy link
Member

Great work. A queryables link seems to be missing in GET /collections, right?

@m-mohr correct, we will get the changes we just pushed to main from #89

@hrodmn hrodmn force-pushed the collection-search branch 2 times, most recently from dab6638 to 4ab53fa Compare August 2, 2024 12:05
@hrodmn hrodmn marked this pull request as ready for review August 2, 2024 12:09
@hrodmn
Copy link
Author

hrodmn commented Aug 2, 2024

Great work. A queryables link seems to be missing in GET /collections, right?

@m-mohr correct, we will get the changes we just pushed to main from #89

I rebased on main so this branch should be up-to-date with the queryables link now.

@hrodmn
Copy link
Author

hrodmn commented Aug 19, 2024

@vincentsarago are you worried about the possibility of inconsistencies between the arguments in PgstacSearch and the collection-search GET request model? I believe the collection-search GET request should have a subset of the parameters from item-level search and never any extra parameters. If I pass a collections_get_request_model that should limit the args that get passed to PgstacSearch in the all_collections method to the ones that are available in the collection-search endpoint with all of the enabled extensions.

I'm sorry about all of the force pushes, I had to clean some things up in order to make the changes mergeable with main 😅

stac_fastapi/pgstac/app.py Outdated Show resolved Hide resolved
Comment on lines +45 to +47
bbox: Optional[BBox] = None,
datetime: Optional[DateTimeType] = None,
limit: Optional[int] = None,
Copy link
Contributor

@alukach alukach Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts from discussion: consider adding id to the endpoint, despite it being omitted from the Collection Search Extension.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the same as just calling /collections/:id ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use-case for including an ids parameter would be to limit the scope of a search in the context of scoped authentication for a STAC API, but we discussed some more and it probably makes more sense to use the filter extension for injecting scope limits in a search request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ids would be a better parameter name than id to evoke the ability to provide multiple as a filter. And for our specific needs today, we will make use of filter as @hrodmn said. I believe it was @bitner who suggested adding the ability to filter by ID parameter, I'll let him weigh in on whether we should go without.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense just to be parallel to the items spec (and yes, "ids" plural which is how it works in items)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add it, I'd recommend to add a separate conformance class for it so that clients actually know whether it's supported or not.

PS: ids is not included in collection search as we just inherit from OGC API - Records, which doesn't have it. It's orthogonal to how ids is not part of OGC API - Features for items. ids is a STAC-specific thing.

@hrodmn
Copy link
Author

hrodmn commented Sep 12, 2024

@vincentsarago I think we need a new release of stac-fastapi with the changes from stac-utils/stac-fastapi#745 in order to properly configure the collection-search extension with the other configured extensions.

@vincentsarago
Copy link
Member

@hrodmn yes, I waited to see if we could also have stac-utils/stac-fastapi#744 merged but I haven't had a lot of time recently to work on this.
I'll target next week to make a new release 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants