After the first fix to the struct package I found another similar
problem, which is fixed by this patch. It could be reproduced easily by
running the following script:
return struct.unpack('f', "xxxxxxxxxxxxx",-3)
The above will access bytes before the 'data' pointer.
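Just to make the problem more obvious, this is a minimal sketch (with
illustrative names, not the actual upstream patch) of the guard that is
needed: the optional position argument must be validated before it is
used as an offset into the data string.

    #include <lua.h>
    #include <lauxlib.h>

    /* Sketch of the kind of guard needed: the optional third argument
     * (the initial position, 1-based in Lua) must be validated before it
     * is used as an offset into 'data'. Using an unsigned position makes
     * a negative argument wrap to a huge value, which the single bound
     * check then rejects. Names are illustrative, not the upstream patch. */
    static int unpack_sketch(lua_State *L) {
        size_t ld;
        const char *data = luaL_checklstring(L, 2, &ld);
        size_t pos = (size_t)luaL_optinteger(L, 3, 1) - 1;
        luaL_argcheck(L, pos <= ld, 3, "initial position out of string");
        (void)data;
        /* ... decoding of the format string would start at data + pos ... */
        return 0;
    }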
@soloestoy sent me these additional fixes, after searching for similar
problems to the one reported in mp_pack(). I'm committing the changes
because it was not possible to make a public PR, in order to protect
Redis users and give Redis providers some time to patch their systems.
During an audit, Apple found that the "struct" Lua package
we ship with Redis (http://www.inf.puc-rio.br/~roberto/struct/) contains
a security problem: a bounds-checking statement fails because of integer
overflow. The bug has existed since we initially integrated this package
with Lua, when scripting was introduced, so every version of Redis with
EVAL/EVALSHA capabilities exposed is affected.
Instead of just fixing the bug, the library was updated to the latest
version shipped by the author.
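For reference, this is the classic shape of a bounds check defeated by
integer overflow, as a generic illustration and not the actual struct.c
code:

    #include <stddef.h>

    /* Generic illustration of a bounds check defeated by integer overflow:
     * if 'pos + size' wraps around, the unsafe form lets an out-of-range
     * access through, while rewriting the comparison so that no addition
     * can overflow avoids the problem. Function names are hypothetical. */
    static int in_bounds_unsafe(size_t pos, size_t size, size_t len) {
        return pos + size <= len;               /* pos + size may wrap */
    }

    static int in_bounds_safe(size_t pos, size_t size, size_t len) {
        return pos <= len && size <= len - pos; /* no overflow possible */
    }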
During an auditing effort, the Apple Vulnerability Research team
discovered a critical security issue affecting the Lua scripting part of
Redis.
-- Description of the problem
Several years ago I merged a pull request including many small changes to
the Lua MsgPack library (which I originally authored myself). The pull
request entered Redis in commit 90b6337c1, in 2014.
Unfortunately one of the changes included a variadic Lua function that
lacked the check for the available Lua C stack. As a result, calling the
"pack" MsgPack library function with a large number of arguments results
in pushing onto the Lua C stack a number of new values proportional to
the number of arguments the function was called with. The pushed values,
moreover, are controlled by untrusted user input.
This in turn causes stack smashing, which we believe to be exploitable:
the behavior is not very deterministic, but it is likely that an exploit
could be created targeting specific versions of the Redis executable. At
a minimum, the issue results in a DoS that crashes the Redis server.
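To show the class of bug more concretely, this is a minimal sketch of
the missing guard, with an illustrative function name rather than the
actual cmsgpack code:

    #include <lua.h>
    #include <lauxlib.h>

    /* Minimal sketch of the missing guard (illustrative function name, not
     * the actual cmsgpack code): a variadic C binding that pushes one value
     * per argument must first make sure the Lua C stack can grow by that
     * many slots, otherwise a call with enough arguments smashes the stack. */
    static int pack_sketch(lua_State *L) {
        int nargs = lua_gettop(L);
        /* This is the check that was missing. */
        luaL_checkstack(L, nargs, "too many arguments");
        for (int i = 1; i <= nargs; i++)
            lua_pushvalue(L, i);  /* stack usage grows with the arg count */
        /* ... the real binding serializes these values before popping ... */
        lua_pop(L, nargs);
        return 0;
    }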
-- Versions affected
Versions greater than or equal to Redis 2.8.18 are affected.
-- Reproducing
Reproduce with this (based on the original reproduction script by the
Apple security team):
https://gist.github.com/antirez/82445fcbea6d9b19f97014cc6cc79f8a
-- Verification of the fix
The fix was tested in the following way:
1) I checked that the problem is no longer observable running the trigger.
2) The Lua code was analyzed to understand the stack semantics, and to
verify that enough stack is actually allocated in all the cases where
mp_pack() is called.
3) The mp_pack() function was modified in order to show exactly what
items on the stack were being set, to make sure that there is no silent
overflow even after the fix.
-- Credits
Thank you to the Apple team and to the other people who helped me
check the patch and coordinate this communication.
A user with many connections (10 thousand) on a single Redis server
reports in issue #4983 that sometimes Redis is idle because at the same
time many clients need to resize their query buffer according to the old
policy.
It looks like this is caused by the fact that we normally allow the
query buffer to grow without problems up to PROTO_MBULK_BIG_ARG, but as
soon as the client is idle we immediately become more strict, and a
query buffer greater than 1024 bytes is already enough to trigger the
resize. So for instance, if most of the clients go idle at the same
time, this issue is easily triggered.
This behavior looks odd: there should be a single clear limit after
which we say, let's look at this query buffer and check whether it's
time to resize it. This commit puts that limit at PROTO_MBULK_BIG_ARG,
and the check is performed either when the current buffer size is too
big compared to the peak usage, or when the client is idle.
Then, once the check is performed, wasting just a few kbytes is
considered enough to proceed with the resize. This should fix the issue.
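In sketch form the new policy looks like the following (names,
thresholds and the PROTO_MBULK_BIG_ARG value here are illustrative, not
the literal Redis code):

    #include <stddef.h>
    #include <time.h>

    #define PROTO_MBULK_BIG_ARG (32*1024)  /* value assumed for illustration */

    /* Sketch of the policy described above (names and thresholds are
     * illustrative): consider a buffer for shrinking only past the single
     * clear limit, only when it is large compared to the recent peak usage
     * or the client is idle, and then shrink as soon as a few kbytes would
     * be reclaimed. */
    static int should_resize_querybuf(size_t buf_alloc, size_t peak_usage,
                                      time_t idle_seconds, size_t wasted_bytes)
    {
        if (buf_alloc <= PROTO_MBULK_BIG_ARG) return 0;   /* below the limit */
        int oversized = buf_alloc / (peak_usage + 1) > 2; /* big vs. peak */
        int idle = idle_seconds > 2;                      /* idle client */
        if (!oversized && !idle) return 0;
        return wasted_bytes > 1024;  /* a few kbytes wasted is enough */
    }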
We were unblocking the client too early: by the time XCLAIM was
propagated in streamPropagateXCLAIM(), the group name object in
client->bpop was no longer valid, so the function dereferenced a field
already set to NULL.
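The following is a self-contained sketch of the ordering constraint,
with hypothetical names rather than the actual Redis code: everything
reachable from the blocking state must remain valid until the
propagation step ran, so unblocking has to happen last.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Self-contained sketch of the ordering constraint (hypothetical names,
     * not the actual Redis code): the blocking state owns the group name
     * that the XCLAIM propagation step still needs, so the client must not
     * be unblocked, and the state cleared, before that step runs. */
    typedef struct {
        char *xread_group;       /* group name kept in the blocking state */
    } blocking_state;

    static void unblock_client(blocking_state *bpop) {
        free(bpop->xread_group); /* clearing the state invalidates the name */
        bpop->xread_group = NULL;
    }

    static void propagate_xclaim(blocking_state *bpop) {
        /* Dereferences the group name: crashes if unblocked first. */
        printf("propagating XCLAIM for group %s\n", bpop->xread_group);
    }

    int main(void) {
        blocking_state bpop = { strdup("mygroup") };
        /* Correct order: propagate while the state is still valid, then
         * unblock. Swapping these two calls reproduces the bug. */
        propagate_xclaim(&bpop);
        unblock_client(&bpop);
        return 0;
    }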
With the fix removed, the test fails to pass cleanly about 50% of the
times. It's very hard to write a test that always fails; or rather, it
is possible, but then it's likely that it would consistently pass again
after changing some random bit, so it's better to use randomization
here.
Now that we have SETID, the internals of consumer groups should be able
to handle the case of the same message being delivered multiple times
just as a side effect of calling XREADGROUP. Normally this should never
happen, but if the admin manually runs "XGROUP SETID mykey mygroup 0",
messages will get re-delivered to clients waiting for the ">" special
ID. The consumer group internals were not able to handle the case of a
message re-delivered in these circumstances that was already assigned
to another owner.
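The following self-contained sketch, with hypothetical data structures
rather than the actual t_stream.c code, shows the behavior that is now
expected when an entry is re-delivered:

    #include <stdio.h>
    #include <string.h>

    /* Self-contained sketch (hypothetical data structures, not the actual
     * t_stream.c code) of the behavior now expected from the consumer group
     * internals: when a ">" read re-delivers an entry that is already
     * pending, the existing pending entry is reused and its ownership is
     * transferred to the new consumer, instead of creating a duplicate. */
    #define MAX_PEL 16

    typedef struct {
        unsigned long long id;  /* stream entry ID, simplified to a number */
        char owner[32];         /* consumer currently owning the entry */
        int delivery_count;
        int used;
    } pending_entry;

    static pending_entry pel[MAX_PEL];  /* the group's pending entries list */

    static pending_entry *pel_lookup(unsigned long long id) {
        for (int i = 0; i < MAX_PEL; i++)
            if (pel[i].used && pel[i].id == id) return &pel[i];
        return NULL;
    }

    /* Deliver entry 'id' to 'consumer' as a result of reading the ">" ID. */
    static void deliver(unsigned long long id, const char *consumer) {
        pending_entry *pe = pel_lookup(id);
        if (pe == NULL) {
            /* Normal case: first delivery, create a new pending entry. */
            for (int i = 0; i < MAX_PEL; i++) {
                if (pel[i].used) continue;
                pel[i].id = id;
                snprintf(pel[i].owner, sizeof(pel[i].owner), "%s", consumer);
                pel[i].delivery_count = 1;
                pel[i].used = 1;
                return;
            }
        } else {
            /* Re-delivery: the entry already belongs to another owner.
             * Transfer ownership and bump the delivery counter instead of
             * creating a duplicate entry. */
            snprintf(pe->owner, sizeof(pe->owner), "%s", consumer);
            pe->delivery_count++;
        }
    }

    int main(void) {
        deliver(1234, "alice");  /* first delivery via XREADGROUP ">" */
        deliver(1234, "bob");    /* re-delivered after XGROUP SETID ... 0 */
        printf("owner=%s deliveries=%d\n", pel[0].owner, pel[0].delivery_count);
        return 0;
    }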