
memory allocation attack? #627

Open
Sneakometer opened this issue Apr 10, 2021 · 29 comments

Comments

@Sneakometer

Sneakometer commented Apr 10, 2021

Hello Waterfall community, I run a small Minecraft server with about 50 concurrent players at peak.
I have recently been facing bot attacks where many IPs (proxies?) connect to the Waterfall proxy, each causing 16 MB of direct memory to be allocated and rendering the server unusable within seconds.

I had allocated 512 MB of memory to Waterfall, which was plenty for the last 3 years. I have since doubled it to 1 GB for now, but the DoS "attack" still manages to fill the RAM in seconds.

This exception is spammed to the console during an attack:

[00:12:21] [Netty Worker IO Thread #9/ERROR]: [/36.94.40.114:53052] <-> InitialHandler - encountered exception
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 520093703, max: 536870912)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:775) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:730) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:288) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:307) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1105) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_281]

You could argue that this is a DDoS attack and can't be fixed by BungeeCord/Waterfall.
However, the host machine was using about 30% CPU and 10% of its network resources; it's really only BungeeCord that is struggling to keep up with that many requests.

I do use iptables to rate-limit new connections per IP to the proxy, but this does not really help, as the connections come from too many different IPs (a proxy list?). I have now added a global rate limit for SYN packets to BungeeCord, which somewhat mitigates the attack by keeping the server from crashing. However, no new players can join while an attack is running, so this is not a permanent option :/

I also don't make a profit from my server, so I can't afford professional layer-7 DDoS mitigation. Hoping to get help here is my only option.
Any help is appreciated.
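
For anyone hitting the same error: the "used/max" numbers in the OutOfDirectMemoryError come from Netty's own direct-memory accounting, which by default follows -XX:MaxDirectMemorySize and can be overridden with -Dio.netty.maxDirectMemory. A minimal diagnostic sketch, assuming Netty 4.1 on the classpath (PlatformDependent is an internal Netty class, so treat this as inspection only, not a supported API):

import io.netty.util.internal.PlatformDependent;

public class DirectMemoryCheck {
    public static void main(String[] args) {
        // Netty's view of the direct-memory budget; roughly -XX:MaxDirectMemorySize,
        // unless overridden with -Dio.netty.maxDirectMemory=<bytes>.
        System.out.println("max direct memory:  " + PlatformDependent.maxDirectMemory());
        // How much of that budget Netty has currently reserved
        // (this is the "used" figure in the OutOfDirectMemoryError above).
        System.out.println("used direct memory: " + PlatformDependent.usedDirectMemory());
    }
}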

@narumii

narumii commented Apr 10, 2021

You could argue that this is a DDoS attack and can't be fixed by BungeeCord/Waterfall.

Yes, that's true. BungeeCord/Waterfall has had many exploits, but md5 claims they don't even work. Waterfall does do something, but its "anti-DoS" is still weak.

However, the host machine was using about 30% CPU and 10% of its network resources; it's really only BungeeCord that is struggling to keep up with that many requests.

In my opinion BungeeCord's networking system is badly designed; even vanilla's is better.

So it's probably an encryption response or some other packet that can carry very large data; Waterfall and BungeeCord don't have a good limiter for it.

@antbig
Contributor

antbig commented Apr 10, 2021

Do you have the beginning of the attack (the logs before the java.lang.OutOfMemoryError: Direct buffer memory)?

@Janmm14
Contributor

Janmm14 commented Apr 10, 2021

@narumii
This offensive language will not get you anywhere. You did not provide any useful information.

I suggest that the maintainers of waterfall delete your comment.

I suggest that you try out #609, there's also a link to a test Waterfall jar in there. That might help with your problem.

@Sneakometer
Author

Do you have the beginning of the attack (the logs before the java.lang.OutOfMemoryError: Direct buffer memory)?

Here is the log from 5 minutes before the attack:
https://pastebin.com/FrMPjp4G

In short, this:
[22:30:00] [Netty Worker IO Thread #2/WARN]: [/CENSORED:57820] <-> InitialHandler - bad packet ID, are mods in use!? VarInt too big
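
For context on that warning: Minecraft packets are length-prefixed with a VarInt, and the proxy's frame decoder (Varint21FrameDecoder, discussed further below) only accepts a prefix of up to 21 bits, i.e. three bytes. A rough sketch of that bounded read, purely illustrative and not Waterfall's actual code; the exact limit and call site producing "VarInt too big" may differ:

import io.netty.buffer.ByteBuf;
import io.netty.handler.codec.CorruptedFrameException;

final class VarIntFrameLength {
    // Reads a VarInt length prefix of at most 3 bytes (21 bits).
    // Returns -1 if the prefix is not complete yet; a real decoder would
    // mark/reset the reader index around this call.
    static int readFrameLength(ByteBuf in) {
        int value = 0;
        for (int i = 0; i < 3; i++) {
            if (!in.isReadable()) {
                return -1; // wait for more data
            }
            byte b = in.readByte();
            value |= (b & 0x7F) << (i * 7);
            if ((b & 0x80) == 0) {
                return value; // last byte of the VarInt
            }
        }
        // A continuation bit on the third byte means the declared length
        // would need more than 21 bits -- the "VarInt too big" case.
        throw new CorruptedFrameException("VarInt too big");
    }
}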

@Janmm14
Contributor

Janmm14 commented Apr 10, 2021

Do you have the beginning of the attack (the logs before the java.lang.OutOfMemoryError: Direct buffer memory)?

Here is the log from 5 minutes before the attack:
https://pastebin.com/FrMPjp4G

In short, this:
[22:30:00] [Netty Worker IO Thread #2/WARN]: [/CENSORED:57820] <-> InitialHandler - bad packet ID, are mods in use!? VarInt too big

Is that amount of server list pings normal for your server?

@narumii

narumii commented Apr 10, 2021

@narumii
This offensive language will not get you anywhere. You did not provide any useful information.

I suggest that the maintainers of waterfall delete your comment.

I suggest that you try out #609, there's also a link to a test Waterfall jar in there. That might help with your problem.

Yeah, telling the truth is offensive :(
Big "DoS mitigations" that don't work; the "DoS mitigations" from Velocity don't work properly either, I don't know why /shrug

@electronicboy
Member

electronicboy commented Apr 10, 2021 via email

@Sneakometer
Author

Sneakometer commented Apr 10, 2021

Is that amount of server list pings normal for your server?

Yeah, pretty much. 2-3 pings per second is what I would consider normal for the server.

Many of these issues can be mitigated with basic configuration of a firewall to throttle connections in the event of an attack

As already mentioned, I am rate-limiting connections, filtering bad packets and limiting total connections per IP.
Can you please tell me more about that "basic firewall" setup so I can configure mine? Thanks.

@electronicboy
Member

This specific case doesn't look like one a basic firewall setup will help with. I think I know what they're doing, and it's unfortunately an artifact of a service exposed to the internet doing its job. I think I have a way to limit the damage, but it will impact performance for some people who already rely on certain aspects of how Netty works.

@Sneakometer
Author

I'm now running the test version from @Janmm14.
I will let you know if it helped or not should the attackers do their thing again.
I've noticed no issues so far.

@electronicboy
Member

#628

@Sneakometer
Author

@electronicboy Due to the recent Log4j exploits I had to switch to the latest official build. Since then, our servers have been attacked again with the same result. Your fix in #628 doesn't seem to be working, or it broke at some point in between. The attacker(s) were able to OOM BungeeCord within seconds using only about 40 requests.

[20:40:58] [Netty Worker IO Thread #8/ERROR]: [/X.X.X.X:54102] <-> InitialHandler - encountered exception
java.lang.OutOfMemoryError: Cannot reserve 16777216 bytes of direct buffer memory (allocated: 1061287832, limit: 1073741824)
	at java.nio.Bits.reserveMemory(Bits.java:178) ~[?:?]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121) ~[?:?]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332) ~[?:?]
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:648) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:623) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:186) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:136) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:286) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
	at java.lang.Thread.run(Thread.java:833) [?:?]

Running Waterfall build 473, Java 17, Debian

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

@Sneakometer That fix in #628 was intentionally reverted here: f17de74

@Xernium
Member

Xernium commented Dec 30, 2021

That was intentional indeed; there are now some packets that can easily exceed that size (up to 16 MB), which is already far too much.
Not sure how to fix this again.

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

That was intentional indeed; there are now some packets that can easily exceed that size (up to 16 MB), which is already far too much. Not sure how to fix this again.

Is this problem happening because the client sends overly large packets, or because every packet, even a very small one, gets 16 MiB of RAM reserved?

Can a legitimate client send such large packets (or only the server), or could we have a different memory pool with different settings for packets sent by the client?

@antbig
Contributor

antbig commented Dec 30, 2021

The client claims that the incoming packet size is very large, to force Netty to allocate the maximum amount of RAM (16 MiB), but then sends the packet very, very slowly. This way, with a very small number of clients, you can create an OOM.
Only the server sends very large packets (chunks).
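
If that is the pattern, one mitigation along these lines is to cut off connections that stop sending data mid-packet. A minimal pipeline sketch using Netty's stock ReadTimeoutHandler; the handler names and the 30-second value are illustrative assumptions, not Waterfall's actual pipeline:

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.ReadTimeoutHandler;
import java.util.concurrent.TimeUnit;

class FrontendInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // Close the connection if no bytes arrive for 30 seconds. This limits
        // how long a slow sender can pin a cumulation buffer, but it does not
        // stop many clients from holding buffers open in parallel.
        ch.pipeline().addLast("read-timeout", new ReadTimeoutHandler(30, TimeUnit.SECONDS));
        // ... frame decoder and protocol handlers would follow here ...
    }
}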

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

But in Varint21FrameDecoder we do not allocate a buffer of the declared size before the full packet has arrived.

So until then, the buffer we have is completely managed by Netty, and it should only grow as more data arrives?
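
For illustration, the general shape of such a frame decoder: read the VarInt prefix, reject absurd declared lengths immediately, and otherwise wait until the whole frame has been cumulated before slicing it out. This is a sketch of the idea only (reusing the readFrameLength helper sketched earlier in the thread), not the actual Varint21FrameDecoder, and as noted below it does not address the allocator's chunk-size behaviour:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import io.netty.handler.codec.CorruptedFrameException;
import java.util.List;

class LengthCappedFrameDecoder extends ByteToMessageDecoder {
    private final int maxFrameLength;

    LengthCappedFrameDecoder(int maxFrameLength) {
        this.maxFrameLength = maxFrameLength;
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        in.markReaderIndex();
        int length = VarIntFrameLength.readFrameLength(in);
        if (length == -1) {
            in.resetReaderIndex();
            return; // length prefix not complete yet
        }
        // Reject absurd declared lengths before waiting for (and cumulating)
        // megabytes of payload from this connection.
        if (length > maxFrameLength) {
            throw new CorruptedFrameException("frame length " + length + " exceeds " + maxFrameLength);
        }
        if (in.readableBytes() < length) {
            in.resetReaderIndex();
            return; // wait until the whole frame has arrived
        }
        out.add(in.readRetainedSlice(length));
    }
}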

@electronicboy
Member

To me this seems akin to the slowloris attacks done against Apache. Reducing the native buffer size was NOT a fix, in any form, shape, or capacity; it just did as much mitigation as possible against junk being allocated, which, pre-1.18, was a trivial way to at least limit this to some degree.

Netty has a buffer pool that allows these native, direct buffers to be pooled rather than paying for expensive allocations across the board. You can increase your direct memory limit, or use whatever system property it was, to avoid allocating these directly into direct memory; but these buffers fill up slowly and are shared across connections. Here there are just enough connections using those buffers that it tries to allocate a new one and fails.

The client isn't telling anything to allocate a huge buffer; the buffer size is generally fixed (resizing these is expensive, so you want to avoid that).
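
For reference, the 16777216 bytes in both stack traces matches Netty's default pooled-arena chunk size (pageSize << maxOrder = 8192 << 11 = 16 MiB in the Netty versions bundled at the time), not a per-packet size declared by the client. A small sketch to check what the running allocator actually uses; the system properties mentioned in the comments are standard Netty tuning knobs, but lowering them trades throughput for smaller chunks:

import io.netty.buffer.PooledByteBufAllocator;

public class AllocatorInfo {
    public static void main(String[] args) {
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
        // Chunk size is pageSize << maxOrder; with 8192 and 11 that is 16 MiB,
        // matching the 16777216-byte allocations in the logs above.
        System.out.println("chunk size:    " + alloc.metric().chunkSize());
        System.out.println("direct arenas: " + alloc.metric().numDirectArenas());
        // Related startup flags: -Dio.netty.allocator.maxOrder=<n> and
        // -Dio.netty.allocator.numDirectArenas=<n> shrink chunks / arenas,
        // at the cost of more frequent allocations under normal load.
    }
}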

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

@electronicboy
Since I moved the ReadTimeoutHandler after the Varint21FrameDecoder in Bungee to counter slowloris-style attacks, would this mean the attacker is able to do this in just 30 seconds?

@electronicboy
Member

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

That really does not sound like the right thing for Netty to do.

@electronicboy
Member

Resizing the buffers is stupidly expensive, so it is the right thing to do. The big issue here is that you need to drain them at a decent pace; IMHO this is basically a massive architectural issue across the board.
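
On the "drain them at a decent pace" point: the usual Netty pattern for a proxy is to stop reading from one side while the other side's outbound buffer is above its high-water mark. A minimal sketch of that relay/backpressure idea, similar in spirit to Netty's proxy examples and not Waterfall's actual handlers; it assumes the handler is installed on both the client and backend pipelines, with "other" pointing at the peer channel:

import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

class RelayHandler extends ChannelInboundHandlerAdapter {
    private final Channel other; // the peer connection this side forwards to

    RelayHandler(Channel other) {
        this.other = other;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        other.writeAndFlush(msg);
        // If the peer's outbound buffer is over its high-water mark,
        // stop reading from this side until it drains.
        if (!other.isWritable()) {
            ctx.channel().config().setAutoRead(false);
        }
    }

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) {
        // When this side becomes writable again, resume reading from the peer.
        if (ctx.channel().isWritable()) {
            other.config().setAutoRead(true);
        }
        ctx.fireChannelWritabilityChanged();
    }
}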

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

@Sneakometer Did you make any changes to the connection throttle configuration?

@Janmm14
Contributor

Janmm14 commented Dec 30, 2021

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

This really does not sound like it could be true.
It would mean that with 512 MiB of RAM there could only ever be 32 buffers allocated, for 32 connections.

@electronicboy
Member

It's not supposed to be a buffer per connection, basically; this all gets into the nuanced technicalities of Netty.

@Sneakometer
Author

@Sneakometer Did you make any changes to the connection throttle configuration?

Not sure what the default is, but it's set to connection_throttle: 8000 on my server.

@Janmm14
Contributor

Janmm14 commented Mar 10, 2022

Netty changed its default to what we had, so apparently this change does not affect the maximum size of the buffers.

@electronicboy
Member

I do not recall seeing any resizing logic for the buffers; AFAIK the idea is that they're fixed size to prevent constant rescaling. Most apps using Netty are designed to process the queue effectively so that backpressure doesn't occur, etc.

@electronicboy
Member

Ah, so we set the capacity of the buffers. Looking at it, the thing has logic to allocate less by default, but maybe it always tries to reserve the full capacity? Thus, too many buffers = hitting the limit.
Of course, the caveat here is that the entire system relies upon those buffers being drained effectively; this is basically an architectural issue.
