Repl DB
ApoorvSingal (43)

repl db

A full-fledged, file-based database management system with a JSON-like data storage format. It's asynchronous, can handle multiple connections and actions at once, is super fast, and uses a dedicated repl for managing the database.

The contents of this page are divided into two parts: the api reference and the implementation guide (below api reference).

NOTE #1: A lot (really lot) of things are undocumented here. Since this post was getting too big, I decided to make a website for the docs (https://repldb.repl.co) but before I could complete it, my tablet (the only device I had) got damaged beyond repair, (I dropped it) and now I can't code (probably for this whole year) cuz no money (crying).

NOTE #2: @Lord_Poseidon also made a repldb server in golang (https://repl.it/@Lord_Poseidon/FantasticThirdHashmap) which can be used with the same repldb client api. It is actually faster than the nodejs server I made when handling a small number of actions, while the nodejs server is better at handling a large number of actions simultaneously. Here is the performance comparison between the two servers: https://repl.it/@ApoorvSingal/db-server-performance-comparison . You can use this comparison to choose the right server for you. The go server doesn't have error handling and querying yet. And yes, both of them are way faster than firestore and mongodb.

NOTE #3: Repldb can be made many times faster than now by exploiting the fact that replit gives more ram than file storage to repls, so we can always keep the whole db in ram without worrying about heap overflow, making the reading from db many many times faster. Also, this allows us to use resource intensive compression and encryption algorithms because we don't have to worry about the performance loss caused by them. But as mentioned in note #1, I can't work on all this right now.

API Reference

UPDATE #1: major change in doc.set() functionality, read docs below for info.
UPDATE #2: major change in doc.get() functionality, read docs below for info.
UPDATE #3: introducing database snapshots 🔥
UPDATE #4: added beautiful error messages for command execution failures.
UPDATE #5: introducing querying and multithreading (undocumented).

Contents

  • class DB

    • constructor(serverUrl)
    • db.init(key, [len])
    • db.list()
    • db.createCollection(name)
    • db.createSnap(name)
    • db.collection(name)
    • db.doc(name)
  • class Collection

    • collection.parent
    • collection.exists()
    • collection.list()
    • collection.createCollection(name)
    • collection.collection(name)
    • collection.doc(name)
    • collection.delete()
  • class Doc

    • doc.parent
    • doc.exists()
    • doc.set(content)
    • doc.set(content, [preserveRest])
    • doc.get()
    • doc.get([props])
    • doc.delete()
  • class Snap extends Collection

    • snap.save(target)

DB

  • constructor(serverUrl)

    • serverUrl <string> url of the database server (do not include the protocol in url)
    • Returns <DB>
  • db.init(key, [len])

    • key <string> a secret key used for authentication.
    • len <integer> (default = 5) length of command IDs (you can ignore this in most cases, more detailed info is given below).
    • Returns: <Promise> (resolves to undefined).

Performs authentication and a quick handshake with the database server. It is necessary to call db.init() before using the db.

const db = new DB("server url");
await db.init(process.env.DB_KEY);
// use db here
// await db.collection("users").doc("Kakashi").set({ age: 173 });

  • db.list()
    • Returns: <Promise> (resolves to object[]).

Lists all the collections and docs of the db. The objects in the returned array are of format { name: string, type: string<"doc" | "collection"> }.

const db = new DB(url);
await db.init(process.env.DB_KEY);

db.list().then(stuff => {
  stuff.forEach(async child => {
    if(child.type === "doc"){
      console.log("Doc:", child.name);
      
      const doc = db.doc(child.name);
      console.log(child.name+"'s data:", await doc.get());
    }
    else {
      console.log("Collection:", child.name);
    }
  });
});
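
If you only need the names, the array returned by db.list() can be split by type first. This is just a minimal sketch: groupChildren is a hypothetical helper (it is not part of db.js) and it only assumes the { name, type } format described above.

```javascript
// Hypothetical helper (not part of db.js): splits the result of
// db.list() / collection.list() into doc names and collection names.
function groupChildren(children) {
  const grouped = { docs: [], collections: [] };
  for (const child of children) {
    if (child.type === "doc") grouped.docs.push(child.name);
    else grouped.collections.push(child.name);
  }
  return grouped;
}

// Example input in the documented { name, type } format.
const sample = [
  { name: "config", type: "doc" },
  { name: "users", type: "collection" },
];
console.log(groupChildren(sample));
```

With a real db, you would pass it the resolved value of db.list() instead of the sample array.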

  • db.createCollection(name)
    • name <string> name of the collection.
    • Returns: <Promise> (resolves to the newly made collection).

Creates a new collection with name name.


  • db.createSnap(name)
    • name <string> name of the snapshot.
    • Returns: <Promise> (resolves to the newly made snap).

Creates a new empty snapshot of the database with name name.


  • db.collection(name)
    • name <string> name of the collection.
    • Returns <Collection>

Returns an already existing collection. It does not check whether the collection exists; you can check that later with collection.exists().

let collec = db.collection("hehe");

if(!(await collec.exists())){
  collec = await db.createCollection("hehe");
}
// do stuff with collection
console.log(await collec.doc("Kaka").get());

  • db.doc(name)
    • name <string> name of the doc.
    • Returns <Doc>

Returns a document. Just like db.collection(name), it does not check whether the document exists; you can check that later with doc.exists().

let doc = db.doc("hehe");

if(!(await doc.exists())){
  await doc.set({...defaultStuff});
}
// do stuff with doc
console.log(await doc.get());

Collection

  • collection.parent <DB> | <Collection>

Gives the parent of the collection.

  • collection.exists()
    • Returns: <Promise> (resolves to either true or false).

Checks whether the collection exists or not.


  • collection.list()
    • Returns: <Promise> (resolves to object[]).

Lists all the collections and docs inside the collection. The objects in the returned array are of format { name: string, type: string<"doc" | "collection"> }. Works the same way as db.list().


  • collection.createCollection(name)
    • Returns: <Promise> (resolves to newly made collection).

Creates a child collection inside collection.


  • collection.collection(name)
    • Returns: <Collection>

Same as db.collection(name) but returns child collection of collection.


  • collection.doc(name)
    • name <string> name of the doc.
    • Returns: <Doc>

Same as db.doc(name) but returns a child doc of collection.


  • collection.delete()
    • Returns: <Promise> (resolves to undefined)

Deletes collection.


Doc

  • doc.parent <DB> | <Collection>

Gives the parent of the doc.


  • doc.exists()
    • Returns: <Promise> (resolves to true or false)

Checks whether the document exists or not.


  • doc.set(content)

    • content <object> the content for the doc.
    • Returns: <Promise> (resolves to undefined)

Sets the document's data to content.


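  • doc.set(content, [preserveRest])

    • content <object> the content for the doc.
    • preserveRest <boolean> (default = false) whether to preserve the existing properties of the doc that are not present in content.
    • Returns: <Promise> (resolves to undefined)

Sets the document's data to content. If preserveRest is true, only the properties present in content are overwritten and the rest of the doc is left as it is.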

const doc = db.doc("doc");

await doc.set({ a: 1, b: 2 });
// prints { a: 1, b: 2 }
console.log(await doc.get());

await doc.set({ a: 1, c: 3 });
// prints { a: 1, c: 3 }
console.log(await doc.get());

await doc.set({ b: 2 }, true);
// prints { a: 1, b: 2, c: 3 }
console.log(await doc.get());

// Earlier, the preserveRest functionality could be achieved like this
const oldData = await doc.get();
await doc.set({ ...oldData, ...newData });

// But now, the new update doesn't just make it more readable and easier to use, it is also more than twice as fast.
await doc.set(newData, true);
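
To make the preserveRest behaviour concrete, here is a tiny local model of it. This is only a sketch and not the server code: applySet is a hypothetical function, and it assumes the merge is shallow, just like the old { ...oldData, ...newData } workaround.

```javascript
// Hypothetical model of doc.set(content, preserveRest); not the server code.
// Assumption: the merge is shallow, like { ...oldData, ...newData }.
function applySet(current, content, preserveRest = false) {
  // preserveRest = true keeps the existing properties and overwrites only
  // the ones present in `content`; otherwise the doc is replaced entirely.
  return preserveRest ? { ...current, ...content } : { ...content };
}

let data = {};
data = applySet(data, { a: 1, b: 2 });  // doc is now { a: 1, b: 2 }
data = applySet(data, { a: 1, c: 3 });  // replaced: { a: 1, c: 3 }
data = applySet(data, { b: 2 }, true);  // merged, a and c are preserved
console.log(data); // same properties as the example above (key order may differ)
```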

  • doc.get()

    • Returns: <Promise> (resolves to <object>)

Gives the content of the document.


  • doc.get([props])

    • props <string[]> (optional) name of document properties to get.
    • Returns: <Promise> (resolves to <object>)

Gives the values for props in the doc. If props is not specified, it returns the full content of the document.

const doc = collection.doc("doc");
await doc.set({ name: "Kakashi", age: 174 });

// prints { name: 'Kakashi', age: 174 }
console.log(await doc.get());

// prints { name: 'Kakashi' }
console.log(await doc.get(["name"]));

Why this update?

Suppose you had a really big doc like this one,

{ 
  name: "Kakashi",
  age: 174,
  accountCreatedAt: "20 April 2020",
  dateOfBirth: "10 April 1846",
  password: "a long big hash",
  ....and 100 more fields
}

If you ever needed the data of this doc, you could call doc.get() which would fetch the whole doc and return its content.
Suppose a part of your app only required the name and age properties. To get those, you would do something like this,

const { name, age } = await doc.get();

This may seem okay, but it actually fetches the whole content of your really long document, deserializes it, and returns the deserialized data. You then keep references only to name and age, and all the other unnecessary stuff is deleted the next time the gc runs.

So, this would waste cpu time and memory in bringing the contents through websockets, deserialising the data, storing the data, and cleaning up the unnecessary data.

But now, with the new update, the above code will change to something like this,

const { name, age } = await doc.get(["name", "age"]);

This time, out of your massive document, only name and age properties are fetched and all the performance issues mentioned above are perfectly solved.

You can fetch the whole document by not providing the props argument to the function, and therefore the new update also doesn't break old api implementations.

Along with the update in doc.set(), now the dbms is capable of handling really huge documents with no performance issues.

Although, it's still not recommended to have huge documents. If you think you can break the contents of a large document into two, please do it, because even if your main application has no performance issues, the database server still needs to serialise and deserialise complete documents.
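
The selection done by doc.get(props) can be pictured with a small local model. This is only a sketch, not the server code: pick is a hypothetical function, and skipping props that don't exist in the doc is my assumption here (the actual server behaviour for missing props isn't covered in this post).

```javascript
// Hypothetical model of doc.get(props); not the server code.
// Assumption: props that are missing from the doc are simply skipped.
function pick(content, props) {
  if (props === undefined) return content; // no props: the whole document
  const result = {};
  for (const prop of props) {
    if (Object.prototype.hasOwnProperty.call(content, prop)) {
      result[prop] = content[prop];
    }
  }
  return result;
}

const bigDoc = { name: "Kakashi", age: 174, password: "a long big hash" };
console.log(pick(bigDoc, ["name", "age"])); // only name and age are copied
console.log(pick(bigDoc));                  // the full document
```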


  • doc.delete()

    • Returns: <Promise> (resolves to undefined)

Deletes the document.

const doc = collec.doc("doc");
await doc.delete();

// prints false
console.log(await doc.exists());
// throws error
console.log(await doc.get());

// works (creates the doc again)
await doc.set({ name: "Kakashi", age: "174" });

// prints true
console.log(await doc.exists());
// prints { name: 'Kakashi', age: "174" }
console.log(await doc.get());

Snap

The detailed description of snapshots is given in implementation guide below.

  • snap.save(target)

    • target <Collection | Doc | DB> collection or document to save

Save the contents of target to the snapshot. If db is passed as argument, the whole database is saved in the snapshot.

Implementation Guide

How to implement?

  • Fork the database repl, change the KEY variable in .env to any long secret string and run the repl.
  • Fork the client repl, import db.js, and start messing with the db. The db.js exports 3 classes (DB, Doc and Collection) which are documented above.
    If you want to implement the client code in any other language, you don't need to do this step, read the below section and you can make a client api yourself.

How it works?

First of all, the dbms uses a separate repl for maintaining the db, because,

  • it reduces work load on the main application.
  • replit editor doesn't work very well with huge number of files.

A websocket server is hosted on the database repl and the client library communicates with the server for manipulating the database.

Database Structure

The database is a directory called _db inside the repl's home directory, each collection is a sub directory of _db, every sub collection is a sub directory of its parent collection and every document is a file with BSON encoded data.
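
The mapping from collection/doc names to paths can be sketched like this. dbPath is a hypothetical helper (it is not part of db.js), and joining the names with "/" is an assumption based on the directory layout described above.

```javascript
const path = require("path");

// Hypothetical helper (not part of db.js): builds the path, relative to the
// home directory, of a collection or doc from the names of its ancestors.
// Assumption: path segments are separated by "/".
function dbPath(...names) {
  return path.posix.join("_db", ...names);
}

console.log(dbPath());                   // _db
console.log(dbPath("users"));            // _db/users
console.log(dbPath("users", "Kakashi")); // _db/users/Kakashi
```

Note that the sketch does not validate the names, so something like ".." would need to be rejected separately.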

The database server treats the database exactly like a collection, therefore, all the collection commands are valid for the database as well, although, some of them are undocumented in above api reference because they are never needed and their usage is totally not recommended.

For example, calling db.delete() deletes the database i.e. the _db directory and the database can no longer be used, even if you reconnect and re-perform the handshake (db.init()). The db.init() does not create the database, it only performs the authentication and session management.
The database is already created in the server repl and there is no standard way to recreate the db if you ever delete it.

Although, there is a somewhat hacky way to recreate a deleted database,

const db = new DB("server url"); // deleted db

await db.init(key); // creates websocket connection

db.path = "";
await db.createCollection("_db"); // creates the "_db" directory

db.path = "_db";

// use db normally here

The db.path here is another undocumented property of the DB class which represents the path of database relative to the home directory, the default value of db.path is "_db" and there is never a need to change this.
Collections and documents also have the collection.path and doc.path properties which represent their path relative to the home directory. Altering their values is totally not good, never touch these properties.

The dbms uses the BSON format for storing data (which is also used by MongoDB).

Database Snapshots

A snapshot of a database represents the state of the database at the time when the snapshot was saved. A snapshot can save the whole database or a few specific collections or documents, depending on your purpose of creating the snapshot.
You can use snapshots for having a backup of your database in case your app accidentally deletes something important because of a bug or any other reason, or you can use snapshots for storing the history of the database etc.

Just like the database, the snapshots are also treated as ordinary collections by the server, therefore all collection commands are valid for snapshots as well.

The snapshots are stored in the _snaps sub directory of the home directory of container. The path to a snapshot is _snaps/[snapshot name] and the path to a collection/doc inside a snap is _snaps/[snapshot name]/[parent collection name]/.../[collection name | doc name]
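
Here is a rough sketch of how a path inside _db maps to its path inside a snapshot. snapPath is a hypothetical helper, not the server's implementation, and placing the contents of the whole db directly under _snaps/[snapshot name] is an assumption.

```javascript
// Hypothetical helper (not the server code): maps the path of a doc or
// collection inside the database to the path of its copy inside a snapshot.
function snapPath(snapName, targetPath, dbRoot = "_db") {
  // Assumption: saving the whole db puts its contents right under the snap.
  if (targetPath === dbRoot) return `_snaps/${snapName}`;
  if (!targetPath.startsWith(dbRoot + "/")) {
    throw new Error(`${targetPath} is not inside ${dbRoot}`);
  }
  // Drop the "_db/" prefix and keep the chain of parent collections.
  return `_snaps/${snapName}/${targetPath.slice(dbRoot.length + 1)}`;
}

console.log(snapPath("snap", "_db/collec/doc")); // _snaps/snap/collec/doc
```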

These collections and docs can be used exactly like the ordinary ones but it's not recommended to edit them directly since it defies the whole purpose of snapshots.

If you save a document, its parent collections are recursively created but only the given document is saved.

Here is an example to explain the workings of snapshots,

const db = new DB("server");
await db.init(key);

const collec = await db.createCollection("collec");
const doc = collec.doc("doc");

await doc.set({ a: 1, b: 2 });

const snap = await db.createSnap("snap");

await snap.save(doc); // recursively creates "collec" collection.

const snapDoc = snap.collection("collec").doc("doc");

// prints { a: 1, b: 2 }
console.log(await snapDoc.get());

await doc.set({ c: 3 }, true);

// prints { a: 1, b: 2, c: 3 }
console.log(await doc.get());
// prints { a: 1, b: 2 }
console.log(await snapDoc.get());

await snap.save(doc);
// prints { a: 1, b: 2, c: 3 }
console.log(await snapDoc.get());

Working with the server

Handshake

When the client library connects to the database server, it must send a secret key for authentication and the length of command IDs (we'll talk about command IDs in a sec, for now assume it to be just a number).

Here is the authentication code on server,

wss.on("connection", (ws) => {
  
  ws.once("message", (message) => {
    
    const data = BSON.deserialize(message);
  
    if(data.key == process.env.KEY){
      len = data.len;
      ws.on("message", msg => handleCommand(ws, msg.toString("ascii")));
      ws.send("0");
    }
    else {
      ws.send("BUH!");
      ws.terminate();
    }
  });
});

Isn't it beautiful? No hard coding at all and yet it's perfectly secure and fast.

As you saw in the above code, the client sends BSON encoded { key: [some string], len: [some number] } payload.
If the key matches KEY environment variable, authentication is successful, the len property is stored for future use and the client can now send commands to the server (we'll talk about commands later in this section).
If the key sent by client does not match KEY environment variable, the socket is terminated.

This is the small handshake process at the beginning of websocket connection and is handled by db.init(key, [len]) method which is documented above in api reference of DB class.


Database Commands

The client can send 8 different kinds of instructions to the database server, each kind of instruction along with necessary input is called a command.
A command is a simple buffer of format: [command id][command index][command input]


Command ID

A command id is a unique id for each command and is generated by the client library. It is used to keep the data of different commands separate when two or more commands are being executed at the same time, because all the data flows through the same websocket.
The length of a command id is always fixed and is sent to the database server by the client at the time of handshake. The length of command ids cannot be changed after the handshake.
Here is the super high quality uuid generator used by our js api,

genKey(len){
  return Math.random().toString(30).substr(2, len); // the default value for len is 5
}

Yes, the command id is not some super unique permanent kind of thing. Once a command is completed, the same id can be used to represent any other command.

Even if your app always has 100 database commands under execution, the probability of a newly generated id being the same as one of them is 0.000411522633744855% (that is 100 / 30^5, assuming id length to be 5) and ofc if you don't like this number, you can change the id length to anything in db.init() as documented above in api reference.
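
The number above can be reproduced with a few lines. This is a sketch under the assumption that ids are uniformly distributed over all 30^len possible values, which is only approximately true for Math.random().toString(30) (it can also occasionally return fewer than len characters). Both function names are made up for this example.

```javascript
// Chance that one freshly generated id equals any of the `pending` ids that
// are already in use, assuming ids are uniform over base^len values.
function collisionProbability(pending, len, base = 30) {
  return pending / Math.pow(base, len);
}

// Birthday-style estimate: chance that at least two ids among `n` ids
// generated at once are equal, under the same uniformity assumption.
function anyCollisionProbability(n, len, base = 30) {
  const space = Math.pow(base, len);
  let noCollision = 1;
  for (let i = 0; i < n; i++) noCollision *= (space - i) / space;
  return 1 - noCollision;
}

// Multiply by 100 to get percentages.
console.log(collisionProbability(100, 5) * 100);    // roughly 0.00041
console.log(anyCollisionProbability(100, 5) * 100); // roughly 0.02
console.log(collisionProbability(100, 8) * 100);    // much smaller with longer ids
```

The second number is bigger because it counts a clash between any pair of the 100 ids, not just between a new id and the existing ones.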


Command Index

As mentioned above, there are 8 different commands in total. Each command has an index from 0 to 7. Here is the list of all commands mapped to their index.

  • 0 => create collection (implemented by db.createCollection(), collection.createCollection() and db.createSnap())
  • 1 => list collections (implemented by db.list() and collection.list())
  • 2 => delete collection (implemented by collection.delete())
  • 3 => check whether collection/doc exists (implemented by collection.exists() and doc.exists())
  • 4 => set document content (implemented by doc.set())
  • 5 => get document content (implemented by doc.get())
  • 6 => delete document (implemented by doc.delete())
  • 7 => saves a collection/doc to the snapshot (implemented by snap.save())

Command Input

Some commands require input to work. Here is the index to input mapping of commands.

  • 0 => path to collection
  • 1 => path to collection
  • 2 => path to collection
  • 3 => path to collection/doc
  • 4 => BSON encoded { path: [path to document], preserve: [true | false], data: [BSON encoded object] } // read doc.set() api docs for more info
  • 5 => BSON encoded { path: [path to document], props: [array of properties to fetch | undefined] } // read doc.get() api docs for more info
  • 6 => path to doc
  • 7 => BSON encoded { snap: [path to snap], target: [path to doc/collection/db] }

The paths of documents and collections are explained above in "Database Structure" section.
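
For the commands whose input is just a path (indexes 0, 1, 2, 3 and 6), building and splitting a command can be sketched as below. encodeCommand and decodeCommand are hypothetical names, not functions from db.js or the server, and the sketch treats the whole command as a plain string. Commands 4, 5 and 7 carry a BSON payload instead and are not covered here.

```javascript
// Hypothetical helpers for path-based commands; not taken from db.js.
// Format: [command id][command index][command input]
function encodeCommand(id, len, index, input = "") {
  // The server splits on a fixed id length, so a shorter id breaks the frame.
  if (id.length !== len) throw new Error(`id must be exactly ${len} characters`);
  if (!Number.isInteger(index) || index < 0 || index > 7) {
    throw new RangeError("command index must be an integer from 0 to 7");
  }
  return id + index + input;
}

function decodeCommand(raw, len) {
  return {
    id: raw.slice(0, len),     // first `len` characters
    index: Number(raw[len]),   // a single digit, 0 to 7
    input: raw.slice(len + 1), // everything after the index
  };
}

const raw = encodeCommand("abcde", 5, 1, "_db/users"); // list "_db/users"
console.log(raw);                   // abcde1_db/users
console.log(decodeCommand(raw, 5)); // { id: 'abcde', index: 1, input: '_db/users' }
```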


Command Output

The output of a command starts with its id, the next byte tells whether the command was executed successfully or not. If the command is completed successfully, the first byte after command id is '0' which is followed by the output of the command (if any), otherwise the first byte after id is '1' followed by the error message given by the file system.

So, the format for output is, [command id][0 | 1][command output]

The command output is only returned by a few commands. Here is an index to output mapping of commands.

  • 0 => none
  • 1 => BSON encoded array
  • 2 => none
  • 3 => none
  • 4 => none
  • 5 => BSON encoded object
  • 6 => none
  • 7 => none

In case of index 3 (command to check whether doc/collection exists), if the collection/doc exists, the byte after id is 0, and if it doesn't, the byte after id is 1.
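
Splitting an output message into its parts can be sketched like this. decodeOutput and existsFromOutput are hypothetical names, and the sketch assumes the message arrives as a string, like in the client code of this post.

```javascript
// Hypothetical helpers (not from db.js) for the output format
// [command id][0 | 1][command output]
function decodeOutput(raw, len) {
  return {
    id: raw.slice(0, len),    // id of the command this output belongs to
    status: raw[len],         // "0" on success, "1" on failure
    body: raw.slice(len + 1), // command output, or the error message
  };
}

// For command 3 (exists) the status byte itself is the answer:
// "0" means the collection/doc exists, "1" means it doesn't.
function existsFromOutput(raw, len) {
  return decodeOutput(raw, len).status === "0";
}

console.log(decodeOutput("abcde1ENOENT: no such file", 5));
console.log(existsFromOutput("abcde0", 5)); // true
```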


This is how each command is implemented in the client library,

return new Promise((resolve, reject) => {
  const key = Collection.genKey(this.len);

  const listener = (message) => {
    // only react to the output that starts with this command's id
    if(message.startsWith(key)){
      this.ws.off("message", listener);

      if(message[this.len] == '0')
        resolve(undefined); // it is `BSON.deserialize(Buffer.from(message.substring(this.len+1)))` instead of `undefined` in case of command 1 (list collections/docs) and 5 (get doc content)
      else
        reject(message.substring(this.len+1));
    }
  }
  this.ws.on("message", listener);
  // key + commandIndex + path; is `key + commandIndex + BSON.serialize({ path, data, preserve })` for command 4 (set doc content) and `key + commandIndex + BSON.serialize({ path, props })` for command 5 (get doc content)
  this.ws.send(key+"1"+this.path);
});

This ends the implementation guide for the dbms. Thanks for reading the docs this far.
I am working on some really cool things for this project, I will update this guide and let you guys know when I finalize the changes.

I would love to see client libraries in different languages from you guys.

Special thanks to @JSer for his awesome markdown guide and the person who gave me the idea to make a dbms (I forgot your name, sorry).

Thanks for reading :).

And as you would have guessed, it took me more time to write this README than to make the whole dbms.

Peace!

Comments
Lord_Poseidon (156)

If anyone needs help with setting up the DBMS, hmu! I've translated the DBMS so I know how stuff works.

NoelB33 (291)

Just how much time did you spend writing this post? It’s super long.

NoelB33 (291)

Woah, that must have been so exciting to finally finish [email protected]

DaLiteralPanda (1)

Good Job though I understand nothing and many ppl dont read much so if you can make this reading small then your a pro dev (:

ApoorvSingal (43)

@DaLiteralPanda This post actually has a lot (really lot) of stuff undocumented. If I had documented everything, the post would have been atleast 3 times bigger than this ¯_(ツ)_/¯.

Jakman (329)

This is good. Fine work man.

anishanne (6)

wow. What took longer? Writing the post or the project?

ApoorvSingal (43)

the post lmao. But now with the new updates(especially the unreleased ones), the code has gotten far bigger than the post. @anishanne

wulv (47)

Cool! Seems like it needed a ton of work to make

ApoorvSingal (43)

@wulv Writing this post was like 65% of the work. The implementation was easy and small. :)

ApoorvSingal (43)

@Codemonkey51 did you put the KEY in .env in both client and database repls?

ApoorvSingal (43)

Oh, seems like I messed up a bit with markdown lol.