
1st Storage Spec Discussion

Author

Jens

Participants

  • Jens
  • Mokhtar
  • Bedeho

Time and place

21st May 2019, 11:30am GMT+1

Topics

  1. Jens had submitted some partial work on the storage system spec for the Acropolis testnet release, found here.
  2. Mokhtar and Bedeho reviewed and discussed it prior to the meeting, and wrote a review, see below.
  3. A meeting was conducted where the review was discussed, and conclusions were reached on key questions, see the conclusion section.

Review

Questions

Progress

Would it be accurate to say that what still remains to be specified is:

  • the communication protocol between the uploader and the liaison
  • the communication protocol between the liaison and storage providers
  • the communication protocol between the downloader and the storage provider
  • the runtime protocol used by uploader to add new content
  • the runtime protocol for managing tranches, creating, updating and assigning them to object types
  • the runtime protocol for introducing a new storage provider
  • the runtime protocol for penalizing/evicting a storage provider
  • the runtime protocol for an exiting storage provider

Scope

Perhaps we can limit the scope for Acropolis, yet still achieve much of what we want for our OKRs, by making a good number of things simply configured in the genesis state - without supporting adding/removing/editing them - such as (see the sketch after this list):

  • data object types
  • tranches
  • other?
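To make this concrete, here is a minimal Rust sketch of what genesis-only configuration could look like. All struct and field names are assumptions for illustration, not the spec's actual definitions: data object types and tranches are fixed at genesis, and no extrinsics exist to add, remove, or edit them.

```rust
// Hypothetical genesis-only configuration; all names are illustrative.
#[derive(Clone, Debug)]
struct DataObjectType {
    id: u64,
    description: String,
}

#[derive(Clone, Debug)]
struct Tranche {
    id: u64,
    accepted_type_ids: Vec<u64>, // which data object types this tranche may store
    redundancy: u32,             // target replication factor
}

// Built once at genesis and never mutated by extrinsics in Acropolis.
struct GenesisStorageConfig {
    data_object_types: Vec<DataObjectType>,
    tranches: Vec<Tranche>,
}

fn main() {
    let config = GenesisStorageConfig {
        data_object_types: vec![DataObjectType { id: 1, description: "video media".into() }],
        tranches: vec![Tranche { id: 1, accepted_type_ids: vec![1], redundancy: 3 }],
    };
    println!("{} types, {} tranches", config.data_object_types.len(), config.tranches.len());
}
```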

Actors module

Should we even depend on the Actors module? I have never understood it, and Mokhtar has questioned it multiple times.

Mime key map

The Data Object Type Registry module has a map, DataObjectTypeConstraintsRegistry, keyed by media type. Why are media types being emphasized as a special property of constraints? Is this just a convenience for light clients that have to do client-side constraint matching on prospective uploads?
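For discussion, a rough sketch of how such a media-type-keyed registry could serve client-side pre-validation. The Constraints shape and all entries are assumptions; only the registry-keyed-by-media-type idea comes from the spec.

```rust
use std::collections::HashMap;

// Hypothetical constraint shape; the real registry may hold richer rules.
#[derive(Clone, Debug)]
struct Constraints {
    max_size_bytes: u64,
}

fn main() {
    // A registry keyed by media type, as DataObjectTypeConstraintsRegistry
    // appears to be; the entries here are purely illustrative.
    let mut registry: HashMap<String, Constraints> = HashMap::new();
    registry.insert("image/png".into(), Constraints { max_size_bytes: 5 * 1024 * 1024 });

    // A light client pre-checks a prospective upload before submitting it.
    let (media_type, size) = ("image/png", 3_000_000u64);
    let permitted = registry
        .get(media_type)
        .map_or(false, |c| size <= c.max_size_bytes);
    println!("upload permitted: {}", permitted);
}
```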

Minor points

Why both DataObjectTypeConstraints and DataObjectTypes

What does DataObjectTypeConstraints add over DataObjectTypes, or vice versa? Ultimately, you are trying to establish a mapping between an off-chain data filtering rule and an on-chain tranche. Previously this all lived in DataObjectTypes. Is the point that DataObjectTypeConstraints don't point to tranches, while only DataObjectTypes do?
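One way to read that distinction, sketched with assumed field names: the constraints struct carries only the off-chain filtering rule, while the data object type additionally references on-chain tranches.

```rust
// Assumed shapes, purely for discussion.
#[derive(Clone, Debug)]
struct DataObjectTypeConstraints {
    media_types: Vec<String>, // off-chain filtering rule: accepted media types
    max_size_bytes: u64,
}

#[derive(Clone, Debug)]
struct DataObjectType {
    id: u64,
    constraints: DataObjectTypeConstraints,
    tranche_ids: Vec<u64>, // only the type, not the constraints, points at tranches
}

fn main() {
    let dot = DataObjectType {
        id: 1,
        constraints: DataObjectTypeConstraints {
            media_types: vec!["image/png".into()],
            max_size_bytes: 1024 * 1024,
        },
        tranche_ids: vec![1, 2],
    };
    println!("{:?}", dot);
}
```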

Combine modules

The “submodules” 1-3 on the Structure list appear to mutually depend on each other, and none of them can sensibly be reused on its own. Therefore, organizing them as a single Substrate module seems more appropriate. Mind you, we don't have any good convention for how to apply the Substrate module abstraction, so we are just going on a case-by-case basis for now. But re-usability in other runtimes seems like one plausible guiding concern.

Separate Content Directory from Storage Module

The Content Directory (CD) really should be in its own proper Substrate module, which would depend on the storage module. The CD is going to substantially grow in scope as more complicated high level business logic/state is introduced there, and none of it is relied upon by anything else in the storage module proper.

Major points

Payload based model

The model chosen here makes the set of tranches available for a given type/constraint a function of the raw data payload. Moreover, it obliges the client to deduce this mapping.

This has a number of limitations:

A) The functional role of a data object, that is, how it is actually going to be used in user applications, may have substantial bearing on how it should be stored and distributed, and this cannot be deduced from the raw payload. The appropriate storage tranche may be sensitive to such functional dimensions, and the distribution tranches/system certainly will be as well. For example, a certain type of image may be used in the system in a way that requires it to be stored with high redundancy by highly staked actors. But an image with an equivalent payload, identical in every way, may be used in applications where it is not critical at all, and it could be stored in tranches with unstaked newcomers and low redundancy.

B) The quota system, for example on uploads, becomes blind to the functional roles of objects, and can only operate on raw size and similar metrics, which may not be as flexible as one wants. For example, perhaps you want to make sure that no one stores more than one channel cover photo explicitly.

As is, if new members are given a 1 GB upload quota, for example (which is quite low), they may choose to use it by uploading one million 1 KB images, which is both abusive and makes no sense.
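To illustrate the contrast, a small sketch (all names, limits, and figures assumed) of a role-aware quota check that a size-only quota cannot express: besides a byte cap, it caps the object count per functional role.

```rust
use std::collections::HashMap;

// Illustrative functional roles; not from the spec.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Role {
    ChannelCoverPhoto,
    VideoMedia,
}

struct MemberUsage {
    bytes_used: u64,
    count_per_role: HashMap<Role, u32>,
}

fn can_upload(usage: &MemberUsage, role: Role, size: u64) -> bool {
    const BYTE_QUOTA: u64 = 1_000_000_000; // the 1 GB figure from above
    const COVER_PHOTO_LIMIT: u32 = 1;      // "no more than one channel cover photo"

    let within_bytes = usage.bytes_used + size <= BYTE_QUOTA;
    let within_count = match role {
        // A count-based cap only the role-aware model can enforce.
        Role::ChannelCoverPhoto => {
            usage.count_per_role.get(&role).copied().unwrap_or(0) < COVER_PHOTO_LIMIT
        }
        _ => true,
    };
    within_bytes && within_count
}

fn main() {
    let usage = MemberUsage { bytes_used: 0, count_per_role: HashMap::new() };
    // A first cover photo fits; a second one would fail the count cap
    // even though the byte quota would still allow it.
    assert!(can_upload(&usage, Role::ChannelCoverPhoto, 200_000));
}
```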

C) The distribution system will be very sensitive to the role of a data object, as this is a major determinant of its future bandwidth requirement profile, hence having this explicitly represented will be very useful. <== speculative point, as we have not thought much about distribution.

D) It may be impractical for certain usage environments to even do this, e.g. a browser having to process the actual internal encoding information of a large payload.

The alternative model implied by these observations is to have a set of data object types explicitly defined by the functional role of the data, with only one type applicable to any given data object. Client applications will have to be hard-coded to use the correct type for a given purpose. Likewise, the content directory (schemata) would, for example, require that the data objects pointed to have a specific type.
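A minimal sketch of this alternative, with assumed purpose names: types are defined by functional role, clients hard-code the correct type for each purpose, and a content directory schema slot accepts exactly one type.

```rust
// Illustrative purposes; the real set would be defined by the spec.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DataObjectPurpose {
    ChannelCoverPhoto,
    MemberAvatar,
    VideoMedia,
}

struct DataObject {
    purpose: DataObjectPurpose,
    size_bytes: u64,
}

// A content directory schema slot that requires a specific type.
fn validate_cover_photo_slot(object: &DataObject) -> Result<(), &'static str> {
    if object.purpose != DataObjectPurpose::ChannelCoverPhoto {
        return Err("data object has the wrong type for a channel cover photo slot");
    }
    Ok(())
}

fn main() {
    let obj = DataObject { purpose: DataObjectPurpose::ChannelCoverPhoto, size_bytes: 200_000 };
    assert!(validate_cover_photo_slot(&obj).is_ok());
    println!("accepted {} bytes", obj.size_bytes);
}
```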

Conclusion

  • The progress is more or less captured.

  • The payload-based model for data objects is a result of

    1. not having a clear notion of the purposes of data object types right now, and
    2. both the storage node needing to enforce the constraints, and the UI being able to provide better UX by also trying to enforce them;
    3. therefore, starting from the file type presented by the uploader made for a good starting point.
  • The separation of data object types and constraints is to allow for more flexibility in specifying constraints.

  • The Actors module should be subsumed into the storage module because it is highly storage-specific.

  • Storage should be one module.

  • Content directory should be a separate module.

We did discuss the payload-based model for a while, and agreed on two changes:

  1. Instead of keeping the constraints as a separate map, create a versioned constraint payload field on the data object type (see the sketch after this list). We should not worry about the UI having to discover anything related to data objects, and should instead expect the UI to hardcode data object types based on purpose. A constraint payload field therefore allows us to start with a simple file types + size approach, and versioning will allow us to expand this into an appropriate DSL in the future.

  2. Starting from hard-coded data object types by purpose does not remove the need for the UI or the storage nodes to enforce constraints.
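A sketch of change 1, with assumed names: each data object type carries a versioned constraint payload, whose first version is just file types plus a size cap, leaving room for a richer constraint DSL in a later version.

```rust
// Assumed shape of a versioned constraint payload.
#[derive(Clone, Debug)]
enum ConstraintPayload {
    // Version 0: the simple file types + size approach.
    V0 {
        media_types: Vec<String>,
        max_size_bytes: u64,
    },
    // Future versions could carry a proper constraint DSL.
}

#[derive(Clone, Debug)]
struct DataObjectType {
    id: u64,
    constraints: ConstraintPayload, // versioned field, replacing the separate constraints map
}

fn main() {
    let dot = DataObjectType {
        id: 1,
        constraints: ConstraintPayload::V0 {
            media_types: vec!["image/png".into(), "image/jpeg".into()],
            max_size_bytes: 10 * 1024 * 1024,
        },
    };
    println!("{:?}", dot);
}
```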