PliantDb
asked a question on our Discord server:

The current version is not yet at 1.0, and messages are everywhere to not use it, yet. What features aren't yet implemented or trusted?
In terms of what's trusted: everything. I feel confident in this implementation because of the code coverage: https://pliantdb.dev/coverage -- It's not perfect, and I'm sure there are some bugs, but the biggest concern to me is storage formats. I may replace cbor with something else, for many reasons that I'll leave outside of chat here (Dax doesn't even know that thought process yet lol). This sort of fundamental storage change would make a simple update incompatible, and that's why I'm not ready for people to adopt PliantDb aggressively yet.
That being said, part of the unit tests do include testing backup/restore, and my intention is to ensure that export format will always be able to bring you from a previous version to a current version in those situations. The gotcha right now for that feature is that the key-value store isn't backed up currently. https://github.com/khonsulabs/pliantdb/issues/50 (Discovered I overlooked that feature while hooking up those unit tests).
Missing features that I'm aware of for local/embedded use: Collections don't have a List function. You can list by creating a view over the collection, but I need to add a separate List endpoint. I started this the other day, but I was hoping to do it by replacing get_multiple. I realized that approach was a bad idea from a permissions standpoint, so I reverted the changes to tackle it another day.
For server/client: There isn't any multi-user support (yet). We're on the cusp of it. The certificate handling on the server portion for the QUIC protocol currently only supports pinned certificates -- the goal is for our HTTP + QUIC layers to eventually share the same certificate. For websockets, no TLS currently, and the websockets are mounted at root. Eventually they will be moved to a route on an HTTP layer that you will be able to extend with your own HTTP routes.
PliantDb from memory exhaustion attacks. I knew bincode's method, but my initial searches on mitigation strategies for serde-cbor came up blank.

serde-cbor crate should be considered the mainline one, or if a newer one (Ciborium) should replace it. I should note, I haven't tested either crate against this attack, and it could be that one or both of them already mitigate it somehow. And, if either are susceptible, pull requests could address the issue. But, I wasn't sure where my efforts to further investigate should be spent.

PliantDb
that need to be serialized and deserialized: ones PliantDb itself manages, and ones that users of PliantDb will provide. This is where the power of serde comes in: PliantDb only needs the user types to implement Serialize and Deserialize, and they can be easily stored in PliantDb.

bincode has a note in its README discussing the limitations of using this type of format for storage.

PliantDb
to be easy to use in a reliable fashion, user datatypes should be encoded using a self-describing format. With CouchDB, a major inspiration for PliantDb, documents were stored as JSON. However, JSON isn't a particularly efficient format, and in my research, CBOR is an open-standard binary format with a reasonable amount of popularity in the Rust community.

PliantDb
structures, I am willing to subject myself to limitations on how to manage migrating between versions of data structures. Those structures I want to serialize as quickly as possible while still providing me some flexibility. bincode fits this bill perfectly. While a custom format technically could be faster, bincode is very fast and well-tested.

CBOR
and bincode. But, something rubbed me the wrong way about CBOR and most other self-describing formats. This friction of wanting to solve the only outstanding question for the storage of PliantDb's documents made me confront one of my only dislikes of the CBOR format: its verbosity.

    struct Logs {
        date: Date,
        entries: Vec<LogEntry>,
    }

    struct LogEntry {
        timestamp: DateTime<Utc>,
        level: String,
        message: String,
        // ...
    }
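
To make the verbosity concern concrete, here is a rough back-of-the-envelope sketch in Rust. It only counts the bytes spent on the three LogEntry field names, assuming each name costs one length byte plus its UTF-8 bytes (which is how CBOR encodes text strings shorter than 24 bytes). The function names and the one-byte reference cost in the second function are hypothetical, chosen for illustration rather than taken from any real format.

```rust
/// Field names of the `LogEntry` struct shown above.
const ENTRY_FIELDS: [&str; 3] = ["timestamp", "level", "message"];

/// Bytes needed to write every field name once:
/// one length byte plus the name's UTF-8 bytes (assumed cost model).
fn names_written_once() -> usize {
    ENTRY_FIELDS.iter().map(|name| 1 + name.len()).sum()
}

/// Bytes spent on field names when every entry repeats them in full,
/// as a self-describing format without any name reuse would.
fn repeated_identifier_bytes(entry_count: usize) -> usize {
    names_written_once() * entry_count
}

/// Bytes spent on field names if each name is written in full only the
/// first time and later uses cost a single reference byte.
/// The one-byte reference is a hypothetical cost, not a measured one.
fn interned_identifier_bytes(entry_count: usize) -> usize {
    if entry_count == 0 {
        return 0;
    }
    names_written_once() + ENTRY_FIELDS.len() * (entry_count - 1)
}
```

Under these assumptions, 10,000 entries spend 240,000 bytes on repeated field names, versus roughly 30,000 bytes if each name is written once and referenced afterward.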
entries, the identifiers timestamp, level, and message will be in the created file that many times.

"hello"
in your executable in 30 files, the compiler will encode the same address for each reference.

CBOR.

PliantDb Binary Object Representation, or PBOR. While I named it after CBOR, I genuinely came up with this format independently, and while it bears a resemblance, there are a few distinct features. First, let me state my goals explicitly upfront for this project:

serde's features. Essentially, design it to fit serde's design like a glove.

CBOR.

CBOR.

kind
and an optional argument. This turns out to be another way that CBOR and my format differ. For CBOR, the argument is always output as a second byte (or additional, depending on how big the integer value is). The way I tackled the problem requires slightly more work but appears to save storage space over time.

PBOR
an atom is an individual chunk of data. The first byte contains three pieces of information:

- Upper 4 bits (& 0b11110000): the Atom kind.
- Next bit (& 0b1000): additional bytes are part of the argument.
- Lower 3 bits (& 0b111): the first 3 bits of the argument.

u64, which makes the maximum atom header weigh in at 10 bytes with this extra encoding.

PBOR
there is an atom kind Symbol. When the serializer first encounters a new identifier, it will write an atom (Symbol, 0), followed by a string atom containing the identifier. The deserializer will expect a string when it receives a 0 in the atom header. Both the serializer and deserializer will assign it a new id, with the first one starting at 1 and counting upwards.

Library | Serialize (ms) | Deserialize (ms) | Length (bytes) | gzip length (bytes) |
---|---|---|---|---|
bincode | 0.5757 | 2.3022 | 741,295 | 305,030 |
pbor | 2.1235 | 4.7786 | 983,437 | 373,654 |
serde-cbor | 1.4557 | 4.7311 | 1,407,835 | 407,372 |
serde-json | 3.2774 | 6.0356 | 1,827,461 | 474,358 |
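
For readers who want to see the atom header and Symbol scheme described before the table in code, here is a minimal Rust sketch. It is not the actual PBOR implementation. The continuation-byte layout (7 bits per extra byte, with the high bit meaning more bytes follow) is an assumption: it is consistent with the 10-byte maximum for a u64 argument but is not spelled out here. The kind numbers, the use of the string atom's argument as a byte length, and writing (Symbol, id) for repeat occurrences are likewise assumptions, and all names are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical kind numbers; the real format's assignments are not given here.
const SYMBOL: u8 = 1;
const STRING: u8 = 2;

/// Writes an atom header: 4-bit kind, a "more bytes" flag, and the argument.
fn encode_atom_header(kind: u8, mut arg: u64, out: &mut Vec<u8>) {
    assert!(kind < 16, "kind must fit in 4 bits");
    // The low 3 bits of the argument live in the first byte.
    let mut first = (kind << 4) | (arg & 0b111) as u8;
    arg >>= 3;
    if arg > 0 {
        first |= 0b1000; // additional argument bytes follow
    }
    out.push(first);
    // Assumed continuation scheme: 7 bits per byte, high bit = more follow.
    while arg > 0 {
        let mut byte = (arg & 0x7f) as u8;
        arg >>= 7;
        if arg > 0 {
            byte |= 0x80;
        }
        out.push(byte);
    }
}

/// Reads an atom header, returning (kind, argument, bytes consumed).
fn decode_atom_header(bytes: &[u8]) -> Option<(u8, u64, usize)> {
    let first = *bytes.first()?;
    let kind = first >> 4;
    let mut arg = (first & 0b111) as u64;
    let mut used = 1;
    if first & 0b1000 != 0 {
        let mut shift = 3;
        loop {
            let byte = *bytes.get(used)?;
            used += 1;
            arg |= ((byte & 0x7f) as u64) << shift;
            shift += 7;
            if byte & 0x80 == 0 {
                break;
            }
            if shift >= 64 {
                return None; // malformed: longer than a u64 allows
            }
        }
    }
    Some((kind, arg, used))
}

/// Tracks identifiers already written so repeats become short references.
#[derive(Default)]
struct SymbolWriter {
    ids: HashMap<String, u64>,
}

impl SymbolWriter {
    fn write_symbol(&mut self, name: &str, out: &mut Vec<u8>) {
        if let Some(&id) = self.ids.get(name) {
            // Seen before: write only the id (assumed behavior).
            encode_atom_header(SYMBOL, id, out);
        } else {
            // First occurrence: (Symbol, 0), then a string atom with the name.
            let id = self.ids.len() as u64 + 1; // ids start at 1
            self.ids.insert(name.to_owned(), id);
            encode_atom_header(SYMBOL, 0, out);
            encode_atom_header(STRING, name.len() as u64, out);
            out.extend_from_slice(name.as_bytes());
        }
    }
}
```

With this layout, writing the identifier level twice costs 7 bytes the first time (two headers plus five name bytes) and a single byte the second time.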
PBOR is not a clear winner on any given metric, but it did achieve my primary goals.

CBOR?

PliantDb, we must consider how data flows through the database.

PBOR is an interesting option, but there are significant benefits to using an open standard like CBOR. I don't believe either choice will significantly affect the performance of PliantDb servers. Finishing up PBOR would require several more days to flesh out unit testing and benchmarks and a few rough edges.

CBOR

PBOR sounds worth pursuing further

PliantDb shouldn't have one enabled by default, and users should be able to pick via feature flags. Clients and servers should be able to support multiple formats at the same time.