
add backup and restore #121

Open · wants to merge 3 commits into master
Conversation

@eos175 commented Jan 21, 2023

Support dump to file and load from file.

cache_test.go Outdated
iters := 100
cache := NewCache(1024)

f, err := os.Open("/tmp/cache_backup.bin")
@coocood (Owner)
This test case depends on TestBackup having run first.
It would be better to merge them into one test case, TestBackupRestore.

Also, the ExpireAt-related logic is not covered in the test.

@eos175 (Author)

If they are better together, I can merge them and cover ExpireAt in the test, assuming that the backup and restore take no more than 1s:

totalExpire := 15

time.Sleep(time.Duration(minTimeExpire+totalExpire) * time.Second)

for i := 0; i < iters; i++ {
	key := mrand.Int()
	val := strconv.Itoa(i)
	val2, err := cache.GetInt(int64(key))

	if i < totalExpire {
		if err != nil && err != ErrNotFound {
			t.Errorf("err: %s", err)
		}
		continue
	}

	if err != nil {
		t.Errorf("err: %s", err)
	}

	if string(val2) != val {
		t.Errorf("err: %v %q==%q", key, val, val2)
	}

}

@coocood (Owner) commented Jan 25, 2023

For the persistent file format, we had better add a file-format version in case we change it later, and calculate a checksum of the content.

@eos175 (Author) commented Jan 25, 2023

If it were a database, yes, but since it is a cache, its data is volatile.
The use I am giving it is to avoid a cold cache.

@coocood (Owner) commented Jan 26, 2023

> If it were a database, yes, but since it is a cache, its data is volatile. The use I am giving it is to avoid a cold cache.

It's not about durability, it's about correctness. If the on-disk data is corrupted and we load it, we would return wrong values to the application.

@eos175 (Author) commented Jan 26, 2023

What do you think of this?

version 4 | create_at 4
...
crc 2 | key_size 2 | value_size 4 | expire_at 4 | key N | value N
...

@coocood (Owner) commented Jan 27, 2023

> What do you think of this?
>
> version 4 | create_at 4
> ...
> crc 2 | key_size 2 | value_size 4 | expire_at 4 | key N | value N
> ...

If we store the crc value at the end of the file, we can write the entries and compute the crc at the same time, so we don't need to iterate over the cache twice.

And crc32c (Castagnoli) is the fastest checksum algorithm, it takes 4 bytes.

@eos175 (Author) commented Jan 27, 2023

A single crc at the end of the file?

I propose that each entry has its own checksum; it could be xxhash, crc32, etc., and use only 2 bytes.

@coocood (Owner) commented Jan 27, 2023

> A single crc at the end of the file?
>
> I propose that each entry has its own checksum; it could be xxhash, crc32, etc., and use only 2 bytes.

On restore, we will load all the entries at once, so a single checksum will be sufficient.

@eos175 (Author) commented Jan 27, 2023

> On restore, we will load all the entries at once, so a single checksum will be sufficient.

The problem is that the data must be read twice: first to verify the checksum and then to store the entries.
If, for example, I am downloading the cache over the network and it is very large, I cannot store entries as they are being downloaded.

@coocood (Owner) commented Jan 28, 2023

> On restore, we will load all the entries at once, so a single checksum will be sufficient.
>
> The problem is that the data must be read twice: first to verify the checksum and then to store the entries. If, for example, I am downloading the cache over the network and it is very large, I cannot store entries as they are being downloaded.

Then we can clear the cache if the checksum mismatches at the end.

@eos175 (Author) commented Feb 3, 2023

> On restore, we will load all the entries at once, so a single checksum will be sufficient.
>
> The problem is that the data must be read twice: first to verify the checksum and then to store the entries. If, for example, I am downloading the cache over the network and it is very large, I cannot store entries as they are being downloaded.
>
> Then we can clear the cache if the checksum mismatches at the end.

If there is already data in the cache, we would fall back to a cold cache.

@coocood (Owner) commented Feb 3, 2023

> On restore, we will load all the entries at once, so a single checksum will be sufficient.
>
> The problem is that the data must be read twice: first to verify the checksum and then to store the entries. If, for example, I am downloading the cache over the network and it is very large, I cannot store entries as they are being downloaded.
>
> Then we can clear the cache if the checksum mismatches at the end.
>
> If there is already data in the cache, we would fall back to a cold cache.

It's hard to prove which way is better, but the Redis RDB file uses a single checksum, so let's just follow that.

@xiehui3651 (Contributor)

> Support dump to file and load from file.

What is the time cost for 1 million entries?
