Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Binary storage for player data #4126

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

WalshyDev
Copy link
Member

@WalshyDev WalshyDev commented Feb 11, 2024

PR Checklist

  • I have fully tested the proposed changes and promise that they will not break everything into chaos.
  • I have also tested the proposed changes in combination with various popular addons and can confirm my changes do not break them.
  • I have made sure that the proposed changes do not break compatibility across the supported Minecraft versions (1.16.* - 1.20.*).
  • I followed the existing code standards and didn't mess up the formatting.
  • I did my best to add documentation to any public classes or methods I added.
  • I have added Nonnull and Nullable annotations to my methods to indicate their behaviour for null values
  • I added sufficient Unit Tests to cover my code.

What changed

  • There's a new config property - storage.player-data
    • This accepts two values, legacy and binary
    • binary is marked as experimental
    • legacy is the default value for now
  • If binary (this PR) is selected in the config for player data we will show a large warning banner in the console:
    experimental warning
  • Created a new BinaryStorage to handle writing the files as binary (more on this later)
  • This BinaryStorage is changing the way researches are saved, they're now using their NamespacedKey's rather than incremental IDs. This has been a change we should have made forever ago and this is step 1 to removing these number IDs.

Storage choice

Ok so, as you can see from the PR, the storage is done with NBT. The question I'm sure you're asking is why NBT over xyz?
Well, there's a few reasons;

  1. Familiarity
    a. Lots of server owners and developers are familiar with NBT. This makes adoption internally and externally much easier (especially with how much MC has leaned into it over the years)
  2. Solid storage
    a. NBT is about as small as it can get. Some people think it's inflated, the format isn't. Usage however is. That isn't something any format can really prevent.
  3. Existing tooling
    a. Lots of existing tooling exists for server owners to read these files (a bit more on this later as well)

Note

This is a simple implementation which would have to have the whole file rewritten each time. I'll explore ways to avoid this but without limiting file size, there isn't a great way to do it.

Usage example

(backpack contains 3 Slimefun items; Energized Energy Capacitor, Energized Solar Generator and a Multi Tool VII)

Legacy file with all researches

-rw-r--r--   1 walshy  staff  3511 11 Feb 23:41 52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

Total: 3.5 KB

Legacy file with all researches and a waypoint

player file;

-rw-r--r--  1 walshy  staff  3511 12 Feb 00:42 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

3.5 KB

waypoint file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:42 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 3.6 KB

Legacy file with all researches, a backpack and a waypoint

player file:

-rw-r--r--  1 walshy  staff  8526 12 Feb 00:00 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

8.5 KB

waypoint player file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:00 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 8.6 KB

Binary file with all researches (no compression)

-rw-r--r--  1 walshy  staff  6875 12 Feb 00:08 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 6.8 KB

(this is so big because of using namespaces instead of int IDs, these take up way more space per research)

Binary file with all researches (gzip compression)

-rw-r--r--  1 walshy  staff  2308 12 Feb 00:06 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.3 KB

Binary file with all researches and a waypoint (gzip compression)

-rw-r--r--  1 walshy  staff  2457 12 Feb 00:37 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.4 KB

1,164 bytes smaller than legacy, 32.1% reduction in size
(reminder that most of this space will be taken by the namespace researches -- with int IDs this would be much smaller)

Tooling

I'm sure we've all used NBT programs in the past to view an NBT file, NBTExplorer is the biggest one, there's also NBT Studio. The goal is for these programs (and others) to be able to view the files. I haven't tested this yet but I think this would be ideal.

However, these only support the compression algorithms that MC support (gzip & zlib). I'd love to use lz4 or zstd. Luckily, MC recently announced lz4 in snapshot 24w04a which will be coming in 1.20.5. This hopefully means one or both of these tools update to support it.
That still doesn't mean they'd support zstd though so maybe we go lz4?

So as we stand today, we can't really go better than gzip/zlib but hopefully soon we can do lz4. Is this ok? How much should we care about existing tooling?

My goal is to avoid building our own 😅

For the reviewers

Open questions

  • Are we good with NBT? Any other format people really want?
    • If another format, why? What advantage does it give?
  • Do we want to save player data elsewhere?
    • We've been saving to data-storage/Slimefun/Players and data-storage/Slimefun/waypoints for a long time. If we want to change that, this is the time
    • One change at least I'd like to make is moving waypoints into the player files. I need to figure out exactly how the shared waypoints work though before I do this, they seem to not do anything different but then confused how they're shared.

Important

As we stand today (12th Feb), looking for a "yay" or "nay" on the PR approach and answers to the above questions rather than a "this code and data structure is perfect".

Testing

  • TODO

TODO

  • Sign off from reviewers we want to go this way
  • Finish up NBT side on dough
    • Write any more methods that'd be helpful
    • Document all functions (snooze)
    • Add more tests covering more complex cases + compression
  • Implement backpack support
  • Cleanup a bit
  • Do some thorough testing

After this PR

  • Work on getting to a more stable/finalized API for storage as a whole
  • Build migration from legacy -> binary
  • Work on moving binary storage to default

Census

Yay - 3
@J3fftw1
@ProfElements
@Sfiguz7

Nay - 0

@github-actions github-actions bot added the 🎈 Feature This Pull Request adds a new feature. label Feb 11, 2024
Copy link
Contributor

Your Pull Request was automatically labelled as: "🎈 Feature"
Thank you for contributing to this project! ❤️

@J3fftw1
Copy link
Contributor

J3fftw1 commented Feb 12, 2024

  • Are we good with NBT? Any other format people really want?
    * If another format, why? What advantage does it give?

I think NBT is the right way, we already discussed loads about this and about other storage options.
but the gist of our discussion was NBT is good since most of the MC community is known with this.
Even non admins/devs know about NBT this is widely used even by players using singeplayer.

  • Do we want to save player data elsewhere?
    * We've been saving to data-storage/Slimefun/Players and data-storage/Slimefun/waypoints for a long time. If we want to change that, this is the time
    * One change at least I'd like to make is moving waypoints into the player files. I need to figure out exactly how the shared waypoints work though before I do this, they seem to not do anything different but then confused how they're shared.

My vote would be to move this to ~/plugins/Slimefun/data-storage.
We dont get reports of this often but we do occasionally that people got rid of the data by moving servers.
Saving in ~/plugins/Slimefun/data-storage would make more sense to me.

My questions

How do we get people to use this, we couldn't answer this in our DMs either.
It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in.
I would love to see stats on this as well how many opt-in how many opt-out.
Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

Is there a big difference between gzip, zlib and lz4.
Can we move later on to lz4 if not lets just use what ever is supported now.
on the other hand people wont move to different standards if no one supports it.
Same discussion we had on to use postgres even though no one supports it.
Should we care about existing tooling, yes but no idc much. People need to adapt anyways but idk i will leave that to the others.

Note:

We should run these tests again on a bigger scale bigger server more player data.

Edit:
My answer would be yay for this btw.

@WalshyDev
Copy link
Member Author

WalshyDev commented Feb 12, 2024

How do we get people to use this, we couldn't answer this in our DMs either.
It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in.
I would love to see stats on this as well how many opt-in how many opt-out.
Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

The age old question of how to make people use the thing.
Not sure there's really an easy way sadly. We can enable by default for new servers but are we likely to find bugs... or just push people away? New users aren't usually the ones to report stuff.
However, this is how we rolled out item IDs back in the day. Compat mode was disabled by default for new servers and it slowly moved over to be everyone.

We'll definitely log to bStats too... though I can't make a new panel so gonna have to throw it somewhere else

Is there a big difference between gzip, zlib and lz4.
Can we move later on to lz4 if not lets just use what ever is supported now.

We can do whatever we want :p
As for difference, mostly comes down to compression ratio and speed. Lz4 doesn't compress as good as others but it is very fast.
Zstd has the best compression ratio but it's slower.
I'd say these days we value speed over ratio. Disk space is much easier to get than time.

We should run these tests again on a bigger scale bigger server more player data.

Agreed, I need the biggest player file we can find and a bunch of blockstorage later too.

@ProfElements
Copy link
Contributor

ProfElements commented Feb 12, 2024

I think NBT is a good format.

I think that binary storage in general is a good idea.

I think moving away from data-storage might be a misstep if taken due to data-storage being easily accessible instead of a multi-folder dive into plugins or something similar.

Moving way-points into players is probably going to be a big win unless it constantly duplicates data.

I think we should support any compression format Minecraft supports. Zlib/Gzip is a fine alternative until its better for us to break it for a faster or more efficient compression like lz4

Edit: This is a yay for me aswell. I will probably move to binary as soon as its available on dev

@WalshyDev WalshyDev added the 🔧 Technical Thread This Issue is a technical Thread where developers can discuss technical changes to Slimefun. label Feb 13, 2024
@Sfiguz7
Copy link
Member

Sfiguz7 commented Feb 14, 2024

I agree with most of what has been said above.

NBT sounds like a good format to use due to the accessibility - researches taking a bit more space is honestly not a problem.

As long as we compress I don't mind which compression method is used, it's going to be quite the jump compared to what we have had for years no matter what, and if anything speed is probably something that is more important when we have tens of thousands of blocks to take care of, so the fact it is already available just makes it even more welcome of a choice.

I agree with Jeff that moving the data folder inside Slimefun's folder would make sense, no other plugin is ever going to use that folder ever and as a user I had always found it weird how it is not a subfolder: /plugins/Slimefun is the first place I would check when looking for Slimefun related anything.

Overall, yay for me

@TheBusyBot TheBusyBot added the ⚡ Merge Conflicts This Pull Request has merged conflicts which need to be resolved! label Feb 16, 2024

long mostSig = uuid.getMostSignificantBits();
long leastSig = uuid.getLeastSignificantBits();
int[] ints = new int[] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NBT has a long array type, I suggest we use that instead of an int array

Comment on lines +13 to +16
public static final NamespacedKey WORLD = new NamespacedKey(Slimefun.instance(), "world");
public static final NamespacedKey POSITION = new NamespacedKey(Slimefun.instance(), "position");
public static final NamespacedKey PITCH = new NamespacedKey(Slimefun.instance(), "pitch");
public static final NamespacedKey YAW = new NamespacedKey(Slimefun.instance(), "yaw");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't these in LocationSerializer?

Comment on lines +135 to +139
waypointTag.putString(CommonKeys.ID, waypoint.getId());
waypointTag.putString(CommonKeys.NAME, waypoint.getName());
waypointTag.putCompound(CommonKeys.LOCATION, LocationSerializer.INSTANCE.serialize(waypoint.getLocation()));

waypointMap.putCompound(new NamespacedKey(Slimefun.instance(), waypoint.getId()), waypointTag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we use a BinarySerializer for this?

@Seggan
Copy link
Contributor

Seggan commented Apr 16, 2024

Has the NBT code for dough been PR'd yet? I don't see it so I assume not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🎈 Feature This Pull Request adds a new feature. ⚡ Merge Conflicts This Pull Request has merged conflicts which need to be resolved! 🔧 Technical Thread This Issue is a technical Thread where developers can discuss technical changes to Slimefun.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants