AOF fsync=always, and certain Redis Cluster bus operations, require data
to be fsynced to disk before replying with an acknowledgement.
In such cases, in order to implement Group Commits, we want to be sure
that queries read in a given cycle of the event loop are never
served to clients in the same event loop iteration. This way, by using
the event loop "before sleep" callback, we can fsync the information
just one time before returning into the event loop for the next cycle.
This is much more efficient compared to calling fsync() multiple times.
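As a rough illustration of the group commit idea described above, the following C sketch counts how many fsync operations are needed when several writes are acknowledged together. The names (sk_group_commit, sk_append, sk_before_sleep) are hypothetical and the fsync is simulated with a counter, so this is only a model of the idea and not the Redis implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one event loop; not Redis source code. */
typedef struct {
    size_t pending;      /* writes processed in this iteration, not yet durable */
    size_t acked;        /* writes whose replies may now be sent */
    size_t fsync_calls;  /* simulated fsync() invocations */
} sk_group_commit;

/* Called for every write command read during the current iteration. */
void sk_append(sk_group_commit *gc) {
    assert(gc != NULL);
    gc->pending++;
}

/* Called once per iteration, before the loop goes back to sleep:
 * a single (simulated) fsync covers every pending write. */
void sk_before_sleep(sk_group_commit *gc) {
    assert(gc != NULL);
    if (gc->pending == 0) return;   /* nothing to persist */
    gc->fsync_calls++;              /* stands in for one fsync(fd) call */
    gc->acked += gc->pending;       /* replies can be released only now */
    gc->pending = 0;
}
```

With three calls to sk_append() followed by one call to sk_before_sleep(), the counter fsync_calls is 1 while acked is 3, which is the saving that group commit is after.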
Unfortunately because of a bug, this was not always guaranteed: the
actual way the events are installed was the sole thing that could
control this. Normally this problem is hard to trigger when AOF is enabled
with fsync=always, because we try to flush the output buffers to the
socket directly in the beforeSleep() function of Redis. However if the
output buffers are full, we actually install a write event, and in such
a case, this bug could happen.
This change to ae.c modifies the event loop implementation to make this
concept explicit. Write events that are registered with:
AE_WRITABLE|AE_BARRIER
are guaranteed to never fire after the readable event was fired for the
same file descriptor in the same event loop iteration. In this way we are
sure that data is persisted to disk before the client performing the
operation receives an acknowledgement.
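One way to honor such a guarantee is to invert the usual firing order when the barrier flag is present. The C sketch below models only that ordering decision for a single file descriptor. The SK_* constants and the function sk_fire_order are hypothetical names invented for this example, and the sketch should not be read as the actual ae.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Values chosen for this sketch; treat them as assumptions. */
enum { SK_READABLE = 1, SK_WRITABLE = 2, SK_BARRIER = 4 };

/* Writes into out the order in which handlers would run for one file
 * descriptor in one iteration: 'R' for readable, 'W' for writable.
 * registered: mask the descriptor was registered with.
 * ready: events reported as ready by the polling call.
 * out must have room for 3 bytes. */
void sk_fire_order(int registered, int ready, char out[3]) {
    assert(out != NULL);
    int n = 0;
    int barrier = registered & SK_BARRIER;
    int r = registered & ready & SK_READABLE;
    int w = registered & ready & SK_WRITABLE;

    if (!barrier && r) out[n++] = 'R';  /* usual order: read first */
    if (w) out[n++] = 'W';
    if (barrier && r) out[n++] = 'R';   /* barrier: read only after write */
    out[n] = '\0';
}
```

In this model, replies written by the 'W' handler belong to queries read in a previous iteration, so they were already covered by the fsync done before sleeping.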
However note that this semantics does not provide all the guarantees
that one may believe are automatically provided. Take the example of the
blocking list operations in Redis.
With AOF and fsync=always we could have:
Client A doing: BLPOP myqueue 0
Client B doing: RPUSH myqueue a b c
In this scenario, Client A will get the "a" element immediately after
Client B's RPUSH is executed, even before the operation is persisted.
However when Client B gets the acknowledgement, it can be sure that
"b,c" are already safe on disk inside the list.
What to note here is that it cannot be assumed that Client A receiving
the element is a guarantee that the operation succeeded from the point
of view of Client B.
This is due to the fact that the barrier exists within the same socket,
and not between different sockets. However in the case above, the
element "a" was not going to be persisted regardless, so it is a pretty
synthetic argument.
Normally in modern Redis you can't create zero-len lists, however it's
possible to load them from old RDB files generated, for instance, using
Redis 2.8 (see issue #4409). The "Right Thing" would be not loading such
lists at all, but this requires hooking into random places in rdb.c in a
not great way, for a problem that is at this point, at best, minor.
Here in this commit instead I just fix the fact that zero length lists,
materialized as quicklists with the first node set to NULL, were
iterated in the wrong way while they are saved, leading to a crash.
The other parts of the list implementation are apparently able to deal
with empty lists correctly, even if they are no longer a thing.
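The failure mode described above can be sketched with a simplified linked list whose head may be NULL. The types sk_node and sk_list and the function sk_total_entries are hypothetical stand-ins for this example, not the quicklist API. The point is only that the walk must test the node before touching it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a list made of nodes; not the quicklist types. */
typedef struct sk_node {
    struct sk_node *next;
    size_t count;           /* entries stored in this node */
} sk_node;

typedef struct {
    sk_node *head;          /* NULL for a zero-length list */
} sk_list;

/* Sums the entries of every node. A do { ... } while (node) loop would
 * dereference the NULL head of an empty list; testing first avoids it. */
size_t sk_total_entries(const sk_list *l) {
    assert(l != NULL);
    size_t total = 0;
    for (const sk_node *n = l->head; n != NULL; n = n->next)
        total += n->count;
    return total;
}
```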
There was not enough sanity checking in the code loading the slots of
Redis Cluster from the nodes.conf file. This resulted in the
attacker's ability to write data at random addresses in the process
memory, by manipulating the index of the array. The bug seems
exploitable using the following technique: the config file may be altered so
that one of the nodes gets, as node ID (which is the first field inside the
structure) some data that is actually executable: then by writing this
address in selected places, this node ID part can be executed after a
jump. So it is mostly just a matter of effort in order to exploit the
bug. In practice however the issue is not very critical because the
bug requires an unprivileged user to be able to modify the Redis cluster
nodes configuration, and at the same time this should result in some
gain. However Redis normally is unprivileged as well. Still, it is much
better to have this fixed.
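The kind of check that was missing can be sketched as follows: parse a slot or a slot range and refuse anything that would index outside the slots array. This is a minimal sketch assuming a simplified "start-stop" or single-number syntax. The function name sk_parse_slot_range is hypothetical, and the constant mirrors the 16384 hash slots of Redis Cluster.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_CLUSTER_SLOTS 16384  /* number of hash slots in Redis Cluster */

/* Parses "N" or "N-M" into *start and *stop.
 * Returns 0 on success, -1 if the text is malformed or any value falls
 * outside 0..SK_CLUSTER_SLOTS-1, so it can never be used as a bad index. */
int sk_parse_slot_range(const char *s, int *start, int *stop) {
    assert(s != NULL && start != NULL && stop != NULL);
    char *end;
    errno = 0;
    long a = strtol(s, &end, 10);
    if (end == s || errno != 0) return -1;      /* no digits or overflow */
    long b = a;
    if (*end == '-') {
        const char *p = end + 1;
        b = strtol(p, &end, 10);
        if (end == p || errno != 0) return -1;
    }
    if (*end != '\0') return -1;                /* trailing garbage */
    if (a < 0 || b < a || b >= SK_CLUSTER_SLOTS) return -1;
    *start = (int)a;
    *stop = (int)b;
    return 0;
}
```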
Fix #4278.
When the SHUTDOWN command is received it is possible that some of the
recent commands were not yet flushed from the AOF buffer, and the server
experiences data loss at shutdown.
This function failed when an internal-only flag was set as the only flag
in a node: the string was trimmed expecting a final comma before
exiting the function, causing a crash. See issue #4142.
Moreover the generation of the flags representation, only needed at DEBUG
log level, was always performed: a waste of CPU time. This is fixed as
well by this commit.
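The trimming problem can be shown with a small C sketch that builds a comma separated flags string. The flag constants and the function sk_flags_to_string are hypothetical, and emitting "noflags" for an empty result is an assumption made for this example. The key point is that the trailing comma is removed only when something was actually appended.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum {
    SK_FLAG_MASTER   = 1,
    SK_FLAG_SLAVE    = 2,
    SK_FLAG_INTERNAL = 4   /* internal only: has no printable name */
};

/* Writes a comma separated representation of flags into buf.
 * len must be at least 16 bytes in this sketch. */
void sk_flags_to_string(int flags, char *buf, size_t len) {
    assert(buf != NULL && len >= 16);  /* room for every name plus NUL */
    buf[0] = '\0';
    if (flags & SK_FLAG_MASTER) strcat(buf, "master,");
    if (flags & SK_FLAG_SLAVE)  strcat(buf, "slave,");

    size_t n = strlen(buf);
    if (n == 0) {
        /* Nothing printable was set: there is no comma to trim. */
        snprintf(buf, len, "noflags");
    } else {
        buf[n - 1] = '\0';             /* drop the final comma */
    }
}
```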
Issue #4084 shows how, due to a design error, GEORADIUS is a write
command because of the STORE option. Because of this it does not work
on readonly slaves, gets redirected to masters in Redis Cluster even
when the connection is in READONLY mode and so forth.
To break backward compatibility at this stage, with Redis 4.0 to be in
advanced RC state, is problematic for the user base. The API can be
fixed in the unstable branch soon if we decide to do so in order to
be more consistent, and release Redis 5.0 with this incompatibility in
the future. This is still unclear.
However, the ability to scale GEO queries in slaves easily is too
important so this commit adds two read-only variants to the GEORADIUS
and GEORADIUSBYMEMBER commands: GEORADIUS_RO and GEORADIUSBYMEMBER_RO.
The commands are exactly like the original commands, but they do not
accept the STORE and STOREDIST options.
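A read-only variant has to refuse the two storing options while leaving the rest of the syntax alone. The following C sketch shows one way such a check could look. The function sk_ro_options_allowed and its first_opt parameter are hypothetical and are not taken from the Redis source.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Returns 1 if none of the optional arguments asks to store the result,
 * 0 otherwise. first_opt is the index of the first optional argument,
 * so key names and members are never mistaken for options. */
int sk_ro_options_allowed(int argc, const char **argv, int first_opt) {
    assert(argv != NULL && first_opt >= 0);
    for (int i = first_opt; i < argc; i++) {
        if (strcasecmp(argv[i], "store") == 0 ||
            strcasecmp(argv[i], "storedist") == 0)
            return 0;
    }
    return 1;
}
```

For a GEORADIUS style argument vector (command, key, longitude, latitude, radius, unit) the first optional argument is at index 6, while for the GEORADIUSBYMEMBER style (command, key, member, radius, unit) it is at index 5.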
1. brpop last key index, thus checking all keys for slots.
2. Memory leak in clusterRedirectBlockedClientIfNeeded.
3. Remove while loop in clusterRedirectBlockedClientIfNeeded.
And many other related Github issues... all reporting the same problem.
There was probably just not enough backlog in certain unlucky runs.
I'll ask people that can reproduce if they now see this as fixed as
well.
This reverts commit 153f2f00ea5c74cbd63d92a261d31c42df8dce21.
Jemalloc 4.4.0 is apparently causing deadlocks in certain
systems. See for example https://github.com/antirez/redis/issues/3799.
As a cautionary step we are reverting the commit back and
releasing a new stable Redis version.
After investigating issue #3796, it was discovered that MIGRATE
could call migrateCloseSocket() after the original MIGRATE c->argv
was already rewritten as a DEL operation. As a result the host/port
passed to migrateCloseSocket() could be anything, often a NULL pointer
that gets dereferenced, crashing the server.
Now the socket is closed at an earlier time when there is a socket
error in a later stage where no retry will be performed, before we
rewrite the argument vector. Moreover a check was added so that later,
in the socket_err label, there is no further attempt at closing the
socket if the argument was rewritten.
This fix should resolve the bug reported in #3796.
Ziplists had a bug that was discovered while investigating a different
issue, resulting in a corrupted ziplist representation, and a likely
segmentation fault and/or data corruption of the last element of the
ziplist, once the ziplist is accessed again.
The bug happens when a specific set of insertions / deletions is
performed so that an entry is encoded to have a "prevlen" field (the
length of the previous entry) of 5 bytes but with a count that could be
encoded in a "prevlen" field of a single byte. This could happen when the
"cascading update" process called by ziplistInsert()/ziplistDelete() in
certain conditions forces the prevlen to be bigger than necessary in
order to avoid too much data moving around.
Once such an entry is generated, inserting a very small entry
immediately before it will result in a resizing of the ziplist for a
count smaller than the current ziplist length (which is a violation,
inserting code expects the ziplist to get bigger actually). So an FF
byte is inserted in a misplaced position. Moreover a realloc() is
performed with a count smaller than the ziplist current length so the
final bytes could be trashed as well.
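The arithmetic behind the negative resize can be sketched in a few lines of C. The prevlen field takes 1 byte for lengths below 254 and 5 bytes otherwise. The function names below are hypothetical, and the guard in sk_insert_growth_guarded is only one possible way to avoid shrinking, shown under the assumption that keeping the oversized prevlen is acceptable. It is not presented as the exact fix applied to ziplist.c.

```c
#include <assert.h>
#include <stddef.h>

#define SK_BIG_PREVLEN 254  /* lengths below this fit in a 1-byte prevlen */

/* Bytes needed to encode the length of the previous entry. */
size_t sk_prevlen_bytes(size_t prev_entry_len) {
    return prev_entry_len < SK_BIG_PREVLEN ? 1 : 5;
}

/* Net change in ziplist size when an entry of new_len bytes is inserted
 * before an entry whose prevlen field currently uses next_prevlen_bytes.
 * A negative result is the violation described above. */
long sk_insert_growth(size_t new_len, size_t next_prevlen_bytes) {
    assert(next_prevlen_bytes == 1 || next_prevlen_bytes == 5);
    long nextdiff = (long)sk_prevlen_bytes(new_len) - (long)next_prevlen_bytes;
    return (long)new_len + nextdiff;
}

/* Same computation, but the next entry keeps its oversized prevlen
 * whenever shrinking it would make the whole ziplist smaller. */
long sk_insert_growth_guarded(size_t new_len, size_t next_prevlen_bytes) {
    assert(next_prevlen_bytes == 1 || next_prevlen_bytes == 5);
    long nextdiff = (long)sk_prevlen_bytes(new_len) - (long)next_prevlen_bytes;
    if ((long)new_len + nextdiff < 0) nextdiff = 0;
    return (long)new_len + nextdiff;
}
```

For a 3-byte entry inserted before an entry with a 5-byte prevlen, the unguarded function yields 3 + (1 - 5) = -1, while the guarded one yields 3.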
SECURITY IMPLICATIONS:
Currently it looks like an attacker can only crash a Redis server by
providing specifically chosen commands. However an FF byte is written
and there are other memory operations that depend on a wrong count, so
even if it is not immediately apparent how to mount an attack in order
to execute code remotely, it is not impossible at all that this could be
done. Attacks always get better... and we did not spend enough time
thinking about how to exploit this issue, but security researchers
or malicious attackers could.
REPRODUCING:
The bug can be reproduced with the following commands.
redis-cli del list
redis-cli rpush list one
redis-cli rpush list two
redis-cli rpush list
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
redis-cli rpush list
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
redis-cli rpush list three
redis-cli rpush list a
redis-cli lrem list 1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
redis-cli linsert list after
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
10
redis-cli lrange list 0 -1
Instead of "rpush list a", use "rpush list 10" in order to trigger a
data corruption instead of a crash.