Replication: disconnect blocked clients when switching to slave role.

Bug as old as Redis and blocking operations. It's hard to trigger since
only happens on instance role switch, but the results are quite bad
since an inconsistency between master and slave is created.

How to trigger the bug is a good description of the bug itself.

1. Client does "BLPOP mylist 0" in master.
2. Master is turned into slave, that replicates from New-Master.
3. Client does "LPUSH mylist foo" in New-Master.
4. New-Master propagates write to slave.
5. Slave receives the LPUSH, the blocked client get served.

Now Master "mylist" key has "foo", Slave "mylist" key is empty.

Highlights:

* At step "2" above, the client remains attached, basically escaping any
  check performed during command dispatch: read only slave, in that case.
* At step "5" the slave (that was the master), serves the blocked client
  consuming a list element, which is not consumed on the master side.

This scenario is technically likely to happen during failovers, however
since Redis Sentinel already disconnects clients using the CLIENT
command when changing the role of the instance, the bug is avoided in
Sentinel deployments.

Closes #2473.
This commit is contained in:
antirez 2015-03-24 16:00:09 +01:00
parent 9b7f8b1c9b
commit c3ad70901f
3 changed files with 26 additions and 0 deletions

View File

@ -155,3 +155,27 @@ void replyToBlockedClientTimedOut(redisClient *c) {
}
}
/* Mass-unblock clients because something changed in the instance that makes
* blocking no longer safe. For example clients blocked in list operations
* in an instance which turns from master to slave is unsafe, so this function
* is called when a master turns into a slave.
*
* The semantics is to send an -UNBLOCKED error to the client, disconnecting
* it at the same time. */
void disconnectAllBlockedClients(void) {
listNode *ln;
listIter li;
listRewind(server.clients,&li);
while((ln = listNext(&li))) {
redisClient *c = listNodeValue(ln);
if (c->flags & REDIS_BLOCKED) {
addReplySds(c,sdsnew(
"-UNBLOCKED force unblock from blocking operation, "
"instance state changed (master -> slave?)\r\n"));
unblockClient(c);
c->flags |= REDIS_CLOSE_AFTER_REPLY;
}
}
}

View File

@ -1395,6 +1395,7 @@ void blockClient(redisClient *c, int btype);
void unblockClient(redisClient *c);
void replyToBlockedClientTimedOut(redisClient *c);
int getTimeoutFromObjectOrReply(redisClient *c, robj *object, mstime_t *timeout, int unit);
void disconnectAllBlockedClients(void);
/* Git SHA1 */
char *redisGitSHA1(void);

View File

@ -1444,6 +1444,7 @@ void replicationSetMaster(char *ip, int port) {
server.masterhost = sdsnew(ip);
server.masterport = port;
if (server.master) freeClient(server.master);
disconnectAllBlockedClients(); /* Clients blocked in master, now slave. */
disconnectSlaves(); /* Force our slaves to resync with us as well. */
replicationDiscardCachedMaster(); /* Don't try a PSYNC. */
freeReplicationBacklog(); /* Don't allow our chained slaves to PSYNC. */