Modules: native types doc, 70% done.

2025-05-07 08:22:13 +00:00 · 2016-06-04 12:54:18 +02:00 · 2016-06-04 12:54:18 +02:00 · c3f5b6ebf9
commit c3f5b6ebf9
parent 5830d8821b
1 changed files with 305 additions and 0 deletions
--- a/src/modules/TYPES.md
+++ b/src/modules/TYPES.md
@ -0,0 +1,305 @@
+Native types in Redis modules
+===
+
+Redis modules can access Redis built-in data structures both at high level,
+by calling Redis commands, and at low level, by manipulating the data structures
+directly.
+
+By using these capabilities in order to build new abstractions on top of existing
+Redis data structures, or by using strings DMA in order to encode modules
+data structures into Redis strings, it is possible to create modules that
+*feel like* they are exporting new data types. However, for more complex
+problems, this is not enough, and the implementation of new data structures
+inside the module is needed.
+
+We call the ability of Redis modules to implement new data structures that
+feel like native Redis ones **native types support**. This document describes
+the API exported by the Redis modules system in order to create new data
+structures and handle the serialization in RDB files, the rewriting process
+in AOF, the type reporting via the `TYPE` command, and so forth.
+
+Overview of native types
+---
+
+A module exporting a native type is composed of the following main parts:
+
+* The implementation of some kind of new data structure and of commands operating on the new data structure.
+* A set of callbacks that handle: RDB saving, RDB loading, AOF rewriting, releasing of a value associated with a key, calculation of a value digest (hash) to be used with the `DEBUG DIGEST` command.
+* A 9 characters name that is unique to each module native data type.
+* An encoding version, used to persist into RDB files a module-specific data version, so that a module will be able to load older representations from RDB files.
+
+While to handle RDB loading, saving and AOF rewriting may look complex as a first glance, the modules API provide very high level function for handling all this, without requiring the user to handle read/write errors, so in practical terms, writing a new data structure for Redis is a simple task.
+
+A **very easy** to understand but complete example of native type implementation
+is available inside the Redis distribution in the `/modules/hellotype.c` file.
+The reader is encouraged to read the documentation by looking at this example
+implementation to see how things are applied in the practice.
+
+Registering a new data type
+===
+
+In order to register a new native type into the Redis core, the module needs
+to declare a global variable that will hold a reference to the data type.
+The API to register the data type will return a data type reference that will
+be stored in the global variable.
+
+    static RedisModuleType *MyType;
+    #define MYTYPE_ENCODING_VERSION 0
+
+    int RedisModule_OnLoad(RedisModuleCtx *ctx) {
+        MyType = RedisModule_CreateDataType("MyType-AZ", MYTYPE_ENCODING_VERSION,
+            MyTypeRDBLoad, MyTypeRDBSave, MyTypeAOFRewrite, MyTypeDigest,
+            MyTypeFree);
+        if (MyType == NULL) return REDISMODULE_ERR;
+    }
+
+As you can see from the example above, a single API call is needed in order to
+register the new type. However a number of function pointers are passed as
+arguments. The prototype of `RedisModule_CreateDataType` is the following:
+
+    moduleType *RedisModule_CreateDataType(RedisModuleCtx *ctx,
+               const char *name, int encver,
+               moduleTypeLoadFunc rdb_load,
+               moduleTypeSaveFunc rdb_save,
+               moduleTypeRewriteFunc aof_rewrite,
+               moduleTypeDigestFunc digest,
+               moduleTypeFreeFunc free);
+
+The `ctx` argument is the context that we receive in the `OnLoad` function.
+The type `name` is a 9 character name in the character set that includes
+from `A-Z`, `a-z`, `0-9`, plus the underscore `_` and minus `-` characters.
+
+Note that **this name must be unique** for each data type in the Redis
+ecosystem, so be creative, use both lower-case and upper case if it makes
+sense, and try to use the convention of mixing the type name with the name
+of the author of the module, to create a 9 character unique name.
+
+For example if I'm building a *b-tree* data structure and my name is *antirez*
+I'll call my type **btree1-az**. The name, converted to a 64 bit integer,
+is stored inside the RDB file when saving the type, and will be used when the
+RDB data is loaded in order to resolve what module can load the data. If Redis
+finds no matching module, the integer is converted back to a name in order to
+provide some clue to the user about what module is missing in order to load
+the data.
+
+The type name is also used as a reply for the `TYPE` command when called
+with a key holding the registered type.
+
+The `encver` argument is the encoding version used by the module to store data
+inside the RDB file. For example I can start with an encoding version of 0,
+but later when I release version 2.0 of my module, I can switch encoding to
+something better. The new module will register with an encoding version of 1,
+so when it saves new RDB files, the new version will be stored on disk. However
+when loading RDB files, the module `rdb_load` method will be called even if
+there is data found for a different encoding version (and the encoding version
+is passed as argument to `rdb_load`), so that the module can still load old
+RDB files.
+
+The remaining arguments `rdb_load`, `rdb_save`, `aof_rewrite`, `digest` and
+`free` are all callbacks with the following prototypes and uses:
+
+    typedef void *(*RedisModuleTypeLoadFunc)(RedisModuleIO *rdb, int encver);
+    typedef void (*RedisModuleTypeSaveFunc)(RedisModuleIO *rdb, void *value);
+    typedef void (*RedisModuleTypeRewriteFunc)(RedisModuleIO *aof, RedisModuleString *key, void *value);
+    typedef void (*RedisModuleTypeDigestFunc)(RedisModuleDigest *digest, void *value);
+    typedef void (*RedisModuleTypeFreeFunc)(void *value);
+
+* `rdb_load` is called when loading data from the RDB file. It loads data in the same format as `rdb_save` produces.
+* `rdb_save` is called when saving data to the RDB file.
+* `aof_rewrite` is called when the AOF is being rewritten, and the module needs to tell Redis what is the sequence of commands to recreate the content of a given key.
+* `digest` is called when `DEBUG DIGEST` is executed and a key holding this module type is found. Currently this is not yet implemented so the function ca be left empty.
+* `free` is called when a key with the module native type is deleted via `DEL` or in any other mean, in order to let the module reclaim the memory associated with such a value.
+
+Setting and getting keys
+---
+
+After registering our new data type in the `RedisModule_OnLoad()` function,
+we also need to be able to set Redis keys having as value our native type.
+
+This normally happens in the context of commands that write data to a key.
+The native types API allow to set and get keys to module native data types,
+and to test if a given key is already associated to a value of a specific data
+type.
+
+The API uses the normal modules `RedisModule_OpenKey()` low level key access
+interface in order to deal with this. This is an eaxmple of setting a
+native type private data structure to a Redis key:
+
+    RedisModuleKey *key = RedisModule_OpenKey(ctx,keyname,REDISMODULE_WRITE);
+    struct some_private_struct *data = createMyDataStructure();
+    RedisModule_ModuleTypeSetValue(key,MyType,data);
+
+The function `RedisModule_ModuleTypeSetValue()` is used with a key handle open
+for writing, and gets three arguments: the key handle, the reference to the
+native type, as obtained during the type registration, and finally a `void*`
+pointer that contains the private data implementing the module native type.
+
+Note that Redis has no clues at all about what your data contains. It will
+just call the callbacks you provided during the method registration in order
+to perform operations on the type.
+
+Similarly we can retrieve the private data from a key using this function:
+
+    struct some_private_struct *data;
+    data = RedisModule_ModuleTypeGetValue(key);
+
+We can also test for a key to have our native type as value:
+
+    if (RedisModule_ModuleTypeGetType(key) == MyType) {
+        /* ... do something ... */
+    }
+
+However for the calls to do the right thing, we need to check if the key
+is empty, if it contains a value of the right kind, and so forth. So
+the idiomatic code to implement a command writing to our native type
+is along these lines:
+
+    RedisModuleKey *key = RedisModule_OpenKey(ctx,argv[1],
+        REDISMODULE_READ|REDISMODULE_WRITE);
+    int type = RedisModule_KeyType(key);
+    if (type != REDISMODULE_KEYTYPE_EMPTY &&
+        RedisModule_ModuleTypeGetType(key) != MyType)
+    {
+        return RedisModule_ReplyWithError(ctx,REDISMODULE_ERRORMSG_WRONGTYPE);
+    }
+
+Then if we successfully verified the key is not of the wrong type, and
+we are going to write to it, we usually want to create a new data structure if
+the key is empty, or retrieve the reference to the value associated to the
+key if there is already one:
+
+    /* Create an empty value object if the key is currently empty. */
+    struct some_private_struct *data;
+    if (type == REDISMODULE_KEYTYPE_EMPTY) {
+        data = createMyDataStructure();
+        RedisModule_ModuleTypeSetValue(key,MyTyke,data);
+    } else {
+        data = RedisModule_ModuleTypeGetValue(key);
+    }
+    /* Do something with 'data'... */
+
+Free method
+---
+
+As already mentioned, when Redis needs to free a key holding a native type
+value, it needs help from the module in order to release the memory. This
+is the reason why we pass a `free` callback during the type registration:
+
+    typedef void (*RedisModuleTypeFreeFunc)(void *value);
+
+A trivial implementation of the free method can be something like this,
+assuming our data structure is composed of a single allocation:
+
+    void MyTypeFreeCallback(void *value) {
+        RedisModule_Free(value);
+    }
+
+However a more real world one will call some function that performs a more
+complex memory reclaiming, by casting the void pointer to some structure
+and freeing all the resources composing the value.
+
+RDB load and save methods
+---
+
+The RDB saving and loading callbacks need to create (and load back) a
+representation of the data type on disk. Redis offers an high level API
+that can automatically store inside the RDB file the following types:
+
+* Unsigned 64 bit integers.
+* Signed 64 bit integers.
+* Doubles.
+* Strings.
+
+It is up to the module to find a viable representation using the above base
+types. However note that while the integer and double values are stored
+and loaded in an architecture and *endianess* agnostic way, if you use
+the raw string saving API to, for example, save a structure on disk, you
+have to care those details yourself.
+
+This is the list of functions performing RDB saving and loading:
+
+    void RedisModule_SaveUnsigned(RedisModuleIO *io, uint64_t value);
+    uint64_t RedisModule_LoadUnsigned(RedisModuleIO *io);
+    void RedisModule_SaveSigned(RedisModuleIO *io, int64_t value);
+    int64_t RedisModule_LoadSigned(RedisModuleIO *io);
+    void RedisModule_SaveString(RedisModuleIO *io, RedisModuleString *s);
+    void RedisModule_SaveStringBuffer(RedisModuleIO *io, const char *str, size_t len);
+    RedisModuleString *RedisModule_LoadString(RedisModuleIO *io);
+    char *RedisModule_LoadStringBuffer(RedisModuleIO *io, size_t *lenptr);
+    void RedisModule_SaveDouble(RedisModuleIO *io, double value);
+    double RedisModule_LoadDouble(RedisModuleIO *io);
+
+The functions don't require any error checking from the module, that can
+always assume calls succeed.
+
+As an example, imagine I've a native type that implements an array of
+double values, with the following structure:
+
+    struct double_array {
+        size_t count;
+        double *values;
+    };
+
+My `rdb_save` method may look like the following:
+
+    void DoubleArrayRDBSave(RedisModuleIO *io, void *ptr) {
+        struct dobule_array *da = ptr;
+        RedisModule_SaveUnsigned(io,da->count);
+        for (size_t j = 0; j < da->count; j++)
+            RedisModule_SaveDouble(io,da->values[j]);
+    }
+
+What we did was to store the number of elements followed by each double
+value. So when later we'll have to load the structure in the `rdb_load`
+method we'll do something like this:
+
+    void *DoubleArrayRDBLoad(RedisModuleIO *io, int encver) {
+        if (encver != DOUBLE_ARRAY_ENC_VER) {
+            /* We should actually log an error here, or try to implement
+               the ability to load older versions of our data structure. */
+            return NULL;
+        }
+
+        struct double_array *da;
+        da = RedisModule_Alloc(sizeof(*da));
+        da->count = RedisModule_LoadUnsigned(io);
+        da->values = RedisModule_Alloc(da->count * sizeof(double));
+        for (size_t j = 0; j < da->count; j++)
+            da->values = RedisModule_LoadDouble(io);
+        return da;
+    }
+
+The load callback just reconstruct back the data structure from the data
+we stored in the RDB file.
+
+Note that while there is no error handling on the API that writes and reads
+from disk, still the load callback can return NULL on errors in case what
+it reads does not look correct. Redis will just panic in that case.
+
+AOF rewriting
+---
+
+    void RedisModule_EmitAOF(RedisModuleIO *io, const char *cmdname, const char *fmt, ...);
+
+Handling multiple encodings
+---
+
+    WORK IN PROGRESS
+
+Allocating memory
+---
+
+Modules data types should try to use `RedisModule_Alloc()` functions family
+in order to allocate, reallocate and release heap memory used to implement the native data structures (see the other Redis Modules documentation for detailed information).
+
+This is not just useful in order for Redis to be able to account for the memory used by the module, but there are also more advantages:
+
+* Redis uses the `jemalloc` allcator, that often prevents fragmentation problems that could be caused by using the libc allocator.
+* When loading strings from the RDB file, the native types API is able to return strings allocated directly with `RedisModule_Alloc()`, so that the module can directly link this memory into the data structure representation, avoiding an useless copy of the data.
+
+Even if you are using external libraries implementing your data structures, the
+allocation functions provided by the module API is exactly compatible with
+`malloc()`, `realloc()`, `free()` and `strdup()`, so converting the libraries
+in order to use these functions should be trivial.
+
+