set-archive-status and delete-data subcommand do not work on OpenSearch serverless Registry #78

Closed
tloubrieu-jpl opened this issue Jul 2, 2024 · 17 comments · Fixed by #81
Assignees
Labels
B15.0 bug Something isn't working s.critical

Comments

@tloubrieu-jpl
Member

Checked for duplicates

No - I haven't checked

🐛 Describe the bug

The following commands fail with a 404 error: the requested lidvid is not found when using an OpenSearch Serverless Registry.

./bin/registry-manager set-archive-status \
    -auth /Users/loubrieu/Documents/pds/registry/es-auth.txt \
    -es file:/Users/loubrieu/Documents/pds/registry/mcp_dev.xml \
    -lidvid urn:nasa:pds:insight_rad:data_derived::7.0 \
    -status archived
./bin/registry-manager delete-data \
    -auth /Users/loubrieu/Documents/pds/registry/es-auth.txt \
    -es file:/Users/loubrieu/Documents/pds/registry/mcp_dev.xml \
    -lidvid urn:nasa:pds:insight_rad:data_raw:hp3_rad_raw_00478_20200401_121608::1.0

🕵️ Expected behavior

I expected the commands to succeed.

📜 To Reproduce

Follow temporary manual https://docs.google.com/document/d/19FVATXpfbKxmoGjcq9Zxpuml4pX-nBZRlNzleGLttPA/edit

🖥 Environment Info

No response

📚 Version of Software Used

latest SNAPSHOT versions:
registry-common 1.6.0-SNAPSHOT
registry-mgr: 4.7.0-SNAPSHOT

as merged on main.

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

The error seems to come from how special characters are encoded by the AWS SDK used to query the OpenSearch database.
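To make the suspected encoding issue concrete, here is a minimal sketch of what the colons in a lidvid look like once percent-encoded. It uses the JDK's `URLEncoder`, which is a form-style encoder and only an assumption standing in for whatever the SDK does internally; the class name `LidvidEncoding` is hypothetical. It illustrates the character handling, not a confirmed diagnosis.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of registry-mgr.
final class LidvidEncoding {
    /** Percent-encodes a lidvid; every ':' becomes "%3A". */
    static String encode(String lidvid) {
        return URLEncoder.encode(lidvid, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Lidvid taken from the set-archive-status command above.
        String lidvid = "urn:nasa:pds:insight_rad:data_derived::7.0";
        System.out.println(encode(lidvid));
    }
}
```

If a client and a server disagree on whether the document ID in a URL path is encoded this way, a lookup by ID can miss an existing document, which would be consistent with the 404 reported here.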

🎉 Integration & Test

No response

@al-niessner
Contributor

@tloubrieu-jpl

Both of these commands require the -index option as well. Even with that set correctly, they fail for me on AOSS and pass on a local OpenSearch 2.13.0.

@al-niessner
Contributor

@tloubrieu-jpl

Since both of these use the lidvid, they will probably fail until opensearch-project/opensearch-java#1068 is resolved.

@al-niessner
Contributor

Move to search instead of get because of the ':'s in the opensearch document ID.

@al-niessner
Contributor

al-niessner commented Jul 24, 2024

Fixed by fix in opensearch. See NASA-PDS/registry-common#54

@tloubrieu-jpl
Member Author

Hi @al-niessner ,

The set-archive-status command works but delete-data fails.

My mvn dependency tree confirms I am using version 3.0.0-SNAPSHOT of opensearch-java:

[INFO] | +- org.opensearch.client:opensearch-java:jar:3.0.0-SNAPSHOT:compile

The command I am using on the MCP dev OpenSearch, with its logs:

% ./bin/registry-manager delete-data \
    -auth /Users/loubrieu/Documents/pds/registry/es-auth.txt \
    -es file:/Users/loubrieu/Documents/pds/registry/mcp_dev.xml \
    -lidvid urn:nasa:pds:insight_rad:data_raw:hp3_rad_raw_00478_20200401_121608::1.0
Elasticsearch URL: file:/Users/loubrieu/Documents/pds/registry/mcp_dev.xml
null

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[ERROR] Request failed: [http_exception] server returned 404

Did you test that yourself ?

Thanks

@al-niessner
Contributor

@tloubrieu-jpl yes, the delete is failing. Back into the opensearch internals.

@al-niessner
Contributor

al-niessner commented Jul 25, 2024

@tloubrieu-jpl @jordanpadams

Just heard back from opensearch (see ticket referenced above) and they say AOSS does not support DeleteByQuery. Can we ask AWS to verify please?

Sure enough, it does not show up in supported operations:
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-genref.html#serverless-operations

Here is a post suggesting to use delete collection:
https://repost.aws/questions/QUzT6LZt1qRGKNyPJQRV20PQ/does-opensearch-serverless-support-delete-by-query

@tloubrieu-jpl
Member Author

tloubrieu-jpl commented Jul 25, 2024

Thanks @al-niessner ,

Deleting a collection does not sound like the right approach for us, since the collection in our case is all the indices from all the nodes with all their data. We would need a concept of a cluster of documents around one _id, and that does not sound like what a collection is.
As a workaround we could use DeleteRequest, like this:

DeleteRequest deleteRequest = new DeleteRequest.Builder().index(this.index).id(doc_a.get("lidvid")).build();
DeleteResponse dr = client.delete(deleteRequest);

But this would be more work/code on our side.

We can ask AWS where they are with the DeleteByQuery support. @sjoshi-jpl could you create a ticket for that ?

@al-niessner
Contributor

@tloubrieu-jpl

If all you want to do is delete via ID, then, yes, DeleteRequest should work. However, this is a change of scope, albeit one that matches the limit of how we really use it. In other words, I do not remember any delete by query other than by lidvid. Since the lidvid is the same as the ID, they are equivalent. Maybe there was a problem with ':' in the URL back in the day when this was originally written. Anyway, if you want to change the use case from delete key == value to delete ID, then DeleteRequest is the logical solution.

If you want to keep the idea of delete key == value, then we can do a search and then delete the results via their returned IDs.
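The search-then-delete idea can be sketched as follows. This is a self-contained illustration against an in-memory map, not the opensearch-java API: the class `SearchThenDelete` and its methods are hypothetical stand-ins. In the real tool, the search step and the per-ID delete would presumably be issued through the OpenSearch client, as with the DeleteRequest shown earlier in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a registry index; for illustration only.
final class SearchThenDelete {
    /** Document ID -> field/value pairs. */
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    void put(String id, Map<String, String> fields) {
        docs.put(id, fields);
    }

    /** Step 1: find the IDs of all documents where field == value. */
    List<String> searchIds(String field, String value) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            if (value.equals(e.getValue().get(field))) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Step 2: delete one document by ID; returns true if it existed. */
    boolean deleteById(String id) {
        return docs.remove(id) != null;
    }

    /** Emulates delete-by-query: search first, then delete matches one by one. */
    int deleteWhere(String field, String value) {
        int deleted = 0;
        // searchIds returns a copy, so removing while iterating is safe.
        for (String id : searchIds(field, value)) {
            if (deleteById(id)) {
                deleted++;
            }
        }
        return deleted;
    }

    int size() {
        return docs.size();
    }
}
```

Against a real index, the search results would likely need to be paged, and a document could disappear between the search and the delete, so a "not found" on an individual delete should probably be tolerated rather than treated as a failure.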

@sjoshi-jpl

@tloubrieu-jpl @al-niessner do we still need a ticket with AWS or are we proceeding with DeleteRequest ?

@tloubrieu-jpl
Member Author

@sjoshi-jpl , we don't need the ticket to AWS thanks.

@jordanpadams from what I understand, we can delete documents one by one, knowing their IDs. But we cannot delete from a search query, as we used to do.

We will discuss our options during the breakout today.

@tloubrieu-jpl
Member Author

As discussed in the breakout, we want to delete the documents by applying a search query, then deleting the matching documents one by one.

The requirements are described in the users manual https://nasa-pds.github.io/registry/user/delete_data.html

Need details on:

  • lid: should we delete all matching lidvids?
  • all: should we give up this option? We believe so, because it is not a node user option anymore.
  • we should also clean the registry-refs so that the database stays consistent, e.g. remove a product from registry-refs documents or remove documents associated with a collection lidvid.
  • beware that a deleted product might still be referenced in other products (ref_lid...), but that is OK.

@tloubrieu-jpl
Member Author

@jordanpadams can you review the comment above ?

@tloubrieu-jpl
Member Author

We give up the -lid option for delete-data.

We will create a new ticket to delete all for the current node.

@jordanpadams
Member

@tloubrieu-jpl Do we have this ticket for delete-all? I think that is a common use case for end users during testing/dev.

@jordanpadams
Member

nvm see #80

Projects
Status: 🏁 Done

4 participants