# Writing Shims to the Internet Archive and to the dweb-transports library

First draft: Mitra 18 Dec 2018

Second draft: Mitra 19 Dec 2018

Our intention with the dweb-transports and dweb-archive libraries is to be available for integrating with any decentralized platform (what we call a transport),
and this guide is intended to help the process.

In our experience the process of adding a transport for a platform is pretty easy **when** we collaborate with someone intimately familiar with the platform.
So feel free to ask questions in the [dweb-transports](https://github.com/internetarchive/dweb-transports/issues) repo,
and to reach out to [Mitra](mailto:mitra@archive.org) for assistance.

If you are working on integration, please add a comment to [dweb-transports issue#10](https://github.com/internetarchive/dweb-transports/issues/10).

All the repos are open source; `dweb-objects`, for example, refers to [internetarchive/dweb-objects](https://github.com/internetarchive/dweb-objects).

## Overview

Integrating a Dweb platform (aka Transport) into this library has two main stages.

1. Integration into the [dweb-transports](https://github.com/internetarchive/dweb-transports) repo,
which mostly involves writing a file with a name like TransportXYZ.js
and integrating in a couple of places. This can be done entirely by a third party,
though it will work more smoothly with collaboration.
2. Integrating a shim that enables the Internet Archive's content to be available
in the decentralized platform, either via the [dweb.archive.org](https://dweb.archive.org) UI or otherwise.
This is only necessary if you want to make IA content available,
and will require our assistance to integrate with code that runs on IA servers.

Step 2 requires our help as the code has to run on our servers, but
Step 1 can be done entirely by third parties, though in both cases collaboration
should make the process go more smoothly.

## Integration into the [dweb-transports](https://github.com/internetarchive/dweb-transports) repo

### Building TransportXYZ.js

The main code sits in a file named something like TransportXYZ.js.

See [API.md](./API.md) and the existing code examples for detailed function-by-function documentation.

#### Error handling

One common problem with decentralized platforms is reliability. We handle this by falling back from one platform to another,
e.g. if IPFS fails we can try WEBTORRENT or HTTP. But this only works if the Transports.js layer can detect when a failure has occurred.
This means it is really important to return an error (via a throw, promise rejection, or callback) when something fails, rather than failing silently.
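
For illustration, a minimal sketch of the kind of failure behaviour that lets the fallback work. The method name, arguments and `xyzClient` here are hypothetical; see [API.md](./API.md) and the existing transports for the real signatures.

```
// Hypothetical sketch: make failures visible to Transports.js so it can fall back.
class TransportXYZ {
  constructor(xyzClient) {
    this.name = "XYZ";
    this.xyzClient = xyzClient;    // hypothetical client for the underlying platform
  }

  async p_rawfetch(url) {
    // Any network error rejects the returned Promise rather than being swallowed.
    const data = await this.xyzClient.get(url);
    if (!data) {
      // Treat an empty result as an error so the caller can try WEBTORRENT, HTTP, etc.
      throw new Error(`TransportXYZ: could not retrieve ${url}`);
    }
    return data;
  }
}
```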

#### Promises or callbacks

We've tried to support both promises and callbacks, though this isn't complete yet.
In general it will work best if each outward-facing function supports a `cb(err, res)` parameter, and where this is absent, a Promise is returned that will `resolve` to `res` or `reject` with `err`.
The `p_foo()` naming convention was previously used to indicate which functions returned a Promise and is gradually being phased out.
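
As a sketch of that pattern (the inner `fetchFromPlatform` helper is hypothetical):

```
// Sketch of the cb(err, res) / Promise convention described above:
// use the callback if one is supplied, otherwise return a Promise.
function fetchViaXYZ(url, cb) {
  const p = fetchFromPlatform(url);               // hypothetical inner function returning a Promise
  if (cb) {
    p.then(res => cb(null, res), err => cb(err)); // callback style
    return;
  }
  return p;                                       // promise style: resolves to res or rejects with err
}

// Either style works for the caller:
//   fetchViaXYZ("xyz:/xyz/Q12345", (err, res) => { ... });
//   const res = await fetchViaXYZ("xyz:/xyz/Q12345");
```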

### Integration other than TransportXYZ.js

Searching dweb-transports for `SEE-OTHER-ADDTRANSPORT` should find any places in the code where a tweak is required to add a new transport.

The current list of places to integrate includes:

* [index.js](./index.js): needs to require the new TransportXYZ
* [package.json/dependencies](./package.json#L13): Should specify which version range of a transport to include
* [API.md](./API.md): Has overview documentation
* [Transports.js](./Transports.js#L78): Add a function like `http()`, `gun()` etc. to allow finding loaded transports (for example it can be used by one transport to find another); see the sketch after this list.
* [Transports.js/p_connect](./Transports.js#L625): Add to the list so it connects by default at startup
* [dweb-archive/Util.config](https://github.com/internetarchive/dweb-archive/blob/master/Util.js#L135)
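
Purely as an illustration of the index.js and Transports.js tweaks; the internal names below are placeholders, so copy the existing `http()`/`gun()` entries rather than this sketch.

```
// index.js (sketch): pull in the new transport alongside the existing ones.
require('./TransportXYZ');

// Transports.js (sketch): an accessor modelled on http()/gun(), added to the Transports class,
// so other code (including another transport) can find the loaded TransportXYZ instance.
// `_connectedTransports` is a placeholder for whatever Transports.js actually keeps internally.
static xyz() {
  return (Transports._connectedTransports || []).find(t => t.name === "XYZ");
}
```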

#### Partial implementation.

It's perfectly legitimate to only implement the parts of the API that the underlying platform implements,
though it will work better if the others are implemented as well,
for example:

* a list can be implemented on top of a KeyValue system by adding a new item with a key that is a timestamp (sketched below).

They aren't currently used by the dweb-archive / dweb.archive.org code.
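
For instance, a minimal sketch of that timestamp-key idea, assuming a hypothetical key-value client with `put()` and `entries()` methods (not the dweb-transports API itself):

```
// Sketch: emulate an append-only list on a key-value table by keying each item with a timestamp.
async function listAdd(kv, tableUrl, item) {
  const key = new Date().toISOString();           // ISO timestamps sort chronologically
  await kv.put(tableUrl, key, JSON.stringify(item));
}

async function listItems(kv, tableUrl) {
  const entries = await kv.entries(tableUrl);     // assumed to return [[key, value], ...]
  return entries
    .sort(([a], [b]) => a.localeCompare(b))       // restore insertion order
    .map(([, value]) => JSON.parse(value));
}
```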

Make sure that `TransportXYZ.js` `constructor()` correctly declares which functions are implemented in the `.supportFunctions` field.
This field is used by Transports to see which transports to try for which functionality.

For example, if "store" is listed in TransportXYZ.supportFunctions,
then a call to Transports.p_rawstore() will attempt to store using XYZ,
and add whatever URL `TransportXYZ.p_rawstore()` returns to the array of URLs where the content is stored.
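
A sketch of what that declaration might look like in the constructor; the exact strings to list should be copied from an existing transport, and "fetch" below is assumed.

```
// Sketch: declare which parts of the API this transport actually implements.
// Transports.js uses supportFunctions to decide which transports to try for which call;
// listing "store" is what makes Transports.p_rawstore() try this transport.
class TransportXYZ {
  constructor(options) {
    this.name = "XYZ";
    this.options = options;
    this.supportFunctions = ["fetch", "store"];   // "fetch" is assumed here; list only what is implemented
  }
}
```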

## Integration into the Archive's servers

Integration with the Archive's content will definitely require a more in-depth collaboration,
but below is an outline.

The key challenge is that the Archive has about 50 petabytes of data,
and none of the distributed platforms can practically handle that currently.
So we use 'lazy-seeding' techniques to push/pull data into a platform as it is requested by users.
Optionally, if the process of adding a (possibly large) item is slow (e.g. in IPFS, WEBTORRENT), we can also crawl some subset of Archive resources and pre-seed those files to the platform.

In all cases, we presume that we run a (potentially) modified peer at the Archive,
so that interaction between the Archive servers and the system is fast and bandwidth is essentially free.

In case it's useful, our servers have:

* A persistent volume available to each peer at e.g. /pv/gun
* An implementation of REDIS answering on 0.0.0.0:6379 which saves to the persistent volume
* An HTTPS or WSS proxy (we prefer this over giving the superpeer access to dweb.me's certificate)
* Log files (including rotation)
* cron (not currently used, but can be)

These are available to superpeers but will require some liaison so we know how they are being used.

The actual choices to be made will depend on some of the differences between transports.

2. Have the server push data into the platform and share the hash generated by the platform in the metadata (IPFS) and/or pass a URL to the platform which it can pull and return its hash.
3. Hybrid - precalculate content addresses during item creation, then hijack the request for the data (this is expensive for the Archive so is going to take a lot longer to set up). (WEBTORRENT)

Each of these requires a different technique; the documentation below currently only covers metadata access for material addressed by name.

#### 1. Hijacking

Obviously this could run quite complex functionality, but in many cases simple mapping to URLs on our gateway will work well.

See [dweb-transport/gun/gun_https_hijackable.js](https://github.com/internetarchive/dweb-transport/blob/master/gun/gun_https_hijackable.js) for the code modification
and [gun_https_archive.js](https://github.com/internetarchive/dweb-transport/blob/master/gun/gun_https_archive.js) for the configuration that maps `/arc/archive/metadata` to `https://dweb.me/arc/archive.org/metadata/` so that, for example,
`gun:/arc/archive/metadata/commute` retrieves metadata for the `commute` Internet Archive item at [https://dweb.me/arc/archive.org/metadata/commute](https://dweb.me/arc/archive.org/metadata/commute).
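
The heart of that configuration is a small name-to-URL mapping. A hypothetical sketch of the idea (not the actual contents of gun_https_archive.js):

```
// Hypothetical sketch of the hijacking map described above.
const MAPPINGS = {
  "/arc/archive/metadata": "https://dweb.me/arc/archive.org/metadata/",
};

function hijackedUrl(gunPath) {
  for (const [prefix, gatewayUrl] of Object.entries(MAPPINGS)) {
    if (gunPath.startsWith(prefix + "/")) {
      // e.g. "/arc/archive/metadata/commute" -> "https://dweb.me/arc/archive.org/metadata/commute"
      return gatewayUrl + gunPath.slice(prefix.length + 1);
    }
  }
  return null;   // not a hijacked path; fall through to normal GUN behaviour
}
```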

This will also work if the address of the table is a hash, for example `xyz:/xyz/Q1234567/commute`
where `Q1234567` would be `xyz`'s address for the metadata table.
The mapping to that table's address can be hard-coded, or included in the Domain.js resolution.

The dweb-archive code needs to know to try Gun for the metadata, and this is configured in [dweb-objects/Domain.js/p_setupOnce()](https://github.com/internetarchive/dweb-objects/blob/master/Domain.js#L404), which we have to run.
Note that this configuration mechanism is likely to change in the future, though the address (on GUN) checked should remain the same.

File retrieval can work similarly if the platform allows addressing by name.
For example `gun:/arc/archive/download` could be mapped to `https://dweb.me/arc/archive.org/download` so that `gun:/arc/archive/download/commute/commute.avi`
would resolve. Similarly the mapping could be to an opaque hash-based address, so that something like `xyz:/xyz/Q99999/commute/commute.avi` works.
In this case the Archive client would be configured to automatically add a transformed URL like this as one of the places to look for a file.
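
A sketch of the kind of client-side configuration implied here; the function and the exact URL forms are illustrative only.

```
// Hypothetical sketch: when assembling the places to look for a file, the client
// could add the name-based gun address alongside the plain HTTP gateway URL.
function candidateUrls(itemId, filename) {
  return [
    `https://dweb.me/arc/archive.org/download/${itemId}/${filename}`,
    `gun:/arc/archive/download/${itemId}/${filename}`,   // resolved via the hijacked mapping above
  ];
}
```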

#### 2. Push of URL mapping (preferred) or content.

This is more complex, and can only integrate file access, not metadata.
This is used for IPFS.

The general path is that a client requests metadata (via HTTP or GUN currently),
the dweb-gateway server then passes a URL to the platform (IPFS), which retrieves the URL,
calculates its hash (which is a hash of the internal data structure (IPLD)) and passes
that to the server. The server incorporates it into the metadata returned.

It is less preferable than Hijacking, in part because the first metadata query is
delayed while the platform is retrieving and processing a potentially large file in order to
generate its internal address for it.
This is likely to be necessary if the platform uses content addressing,
especially if it uses an internally generated address (for example IPFS uses a multihash of an internal 'IPLD' object).

We will need an HTTP API, and a snippet of code (currently only Python is supported) that we can integrate.

It should have a signature like:
```
def store(self, data=None, # If passed, this data will be pushed
    ...
```
and should return a string that is the URL to be used for access, e.g. `ipfs:/ipfs/Q12345`

We'll need to integrate it into `dweb-gateway` in [Archive.py.item2thumbnail()](https://github.com/internetarchive/dweb-objects/blob/master/Archive.py#L360)
and [NameResolver.py/cache_content()](https://github.com/internetarchive/dweb-objects/blob/master/NameResolver.py#L222).

#### 3. Hybrid - Precalculate + hijack.

### Installation for production integration

We'll then need some info to help us integrate in our Docker/Kubernetes production system.
Sorry, but this isn't currently in an open repo since it's tied into our CI system. The content will include:

* Any one-time instructions to run in `superv`.
Note these are run each time a dockerfile starts so they need to be safe to run multiple times.

```
COPY etc /etc/
RUN mkdir -p /var/log/dweb
```

Typically your code for integrating into Docker would then look something like the following nodejs example (Go and Python3 examples on request):
```
RUN apt-get -y install anyOtherPackagesYouNeed
RUN cd /usr/local && git clone https://github.com/<yourRepo || internetarchive/dweb-transport> \