Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

let big delete write s3 file before workspace commit #18804

Merged
merged 15 commits into from
Sep 16, 2024

Conversation

Wenbin1002
Copy link
Contributor

@Wenbin1002 Wenbin1002 commented Sep 14, 2024

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #18773

What this PR does / why we need it:

let big delete write s3 file before workspace commit


PR Type

Enhancement, Tests


Description

  • Enhanced transaction handling by adding logic to manage delete operations, including a new function dumpDeleteBatchLocked to handle delete batches.
  • Introduced a new field approximateInMemDeleteCnt in transactions to track the count of delete entries.
  • Updated the transaction write handling to support S3 operations, ensuring that delete operations are processed before workspace commits.
  • Added a new test Test_BigDeleteWriteS3 to verify the correct handling of delete operations and their interaction with S3 writes.

Changes walkthrough 📝

Relevant files
Enhancement
txn.go
Enhance transaction handling with delete batch management

pkg/vm/engine/disttae/txn.go

  • Added logic to handle delete operations in transactions.
  • Introduced dumpDeleteBatchLocked function for managing delete batches.
  • Modified dumpBatchLocked to incorporate delete count checks.
  • Updated transaction write handling to support S3 operations.
  • +149/-26
    types.go
    Add delete count tracking to transactions                               

    pkg/vm/engine/disttae/types.go

    • Added approximateInMemDeleteCnt to track delete entries.
    +2/-0     
    Tests
    workspace_test.go
    Add test for S3 write operations with deletes                       

    pkg/vm/engine/test/workspace_test.go

  • Added a new test Test_BigDeleteWriteS3 for testing delete operations.
  • Verified S3 write operations with delete scenarios.
  • +127/-0 

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Performance Concern
    The new dumpDeleteBatchLocked function may cause performance issues for large delete operations, as it processes all delete entries at once.

    Code Complexity
    The dumpBatchLocked function has become more complex with the addition of delete handling, which may make it harder to maintain and understand.

    @matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Sep 14, 2024
    @mergify mergify bot added the kind/feature label Sep 14, 2024
    Copy link

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Maintainability
    Extract condition for dumping delete batch into a separate function

    Consider extracting the condition for dumping delete batch into a separate function
    for better readability and maintainability.

    pkg/vm/engine/disttae/txn.go [455-459]

    -if txn.approximateInMemDeleteCnt >= txn.engine.insertEntryMaxCount {
    +if txn.shouldDumpDeleteBatch() {
     			if err := txn.dumpDeleteBatchLocked(offset); err != nil {
     				return err
     			}
     		}
     
    +// Add this method to the Transaction struct
    +func (txn *Transaction) shouldDumpDeleteBatch() bool {
    +    return txn.approximateInMemDeleteCnt >= txn.engine.insertEntryMaxCount
    +}
    +
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Extracting the condition into a separate function enhances code readability and maintainability by encapsulating logic and reducing complexity in the main function.

    8
    Use a named constant for the delete threshold value

    Consider using a constant for the threshold value '1' in the condition. This would
    improve code readability and maintainability.

    pkg/vm/engine/disttae/txn.go [414-415]

    +const deleteThreshold = 1
     if txn.workspaceSize < txn.engine.workspaceThreshold && txn.insertCount < txn.engine.insertEntryMaxCount &&
    -			txn.approximateInMemDeleteCnt < txn.engine.insertEntryMaxCount {
    +			txn.approximateInMemDeleteCnt < deleteThreshold {
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Using a named constant for the delete threshold value improves code readability and maintainability by making the code more self-explanatory and easier to update if the threshold changes.

    7
    Use a named constant for the minimum delete count threshold

    Consider using a constant or configuration value for the threshold '1' in the
    condition. This would improve code flexibility and maintainability.

    pkg/vm/engine/disttae/txn.go [630-632]

    -if deleteCnt < txn.engine.insertEntryMaxCount {
    +const minDeleteCountForDump = 1
    +if deleteCnt < minDeleteCountForDump {
     		return nil
     	}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Defining a named constant for the minimum delete count threshold enhances code flexibility and maintainability by making it easier to adjust the threshold value and understand its purpose in the code.

    7
    Best practice
    Use a more descriptive variable name in type assertion

    Consider using a more descriptive variable name instead of 'v' in the type
    assertion. This would improve code readability.

    pkg/vm/engine/disttae/txn.go [561-565]

    -if v, ok := tbl.(*txnTableDelegate); ok {
    -			table = v.origin
    +if delegate, ok := tbl.(*txnTableDelegate); ok {
    +			table = delegate.origin
     		} else {
     			table = tbl.(*txnTable)
     		}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Using a more descriptive variable name improves code readability and understanding, especially for those unfamiliar with the codebase.

    6

    pkg/vm/engine/disttae/txn.go Show resolved Hide resolved
    @mergify mergify bot merged commit 29b8da7 into matrixorigin:main Sep 16, 2024
    18 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants