I’m interested to hear how y’all would design a tiger-style time-series database.
Background
Time-series databases are a type of database optimized for storing many series, where a single series can be represented like the following (in ZON):
.{
    .id = "f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b",
    .data = .{
        .{
            .time = 12345134213143,
            .value = 3,
        },
        .{
            .time = 12312513431212,
            .value = 1,
        },
        .{
            .time = 12312513431225,
            .value = 5,
        },
    },
}
The ID is just a hash value that identifies the series. Many databases maintain indexes for looking up series with queries like:
find all series where location=headquarters, sensor_type=hvac
This query might return 32 series, each one being the temperature over time measured from one of the 32 HVAC units in the headquarters building. The result might be displayed as a 2D graph with 32 lines on it, the x-axis being time and the y-axis temperature.
Examples of time-series databases:
- InfluxDB
- TimescaleDB
- TDengine
Problems
I’ve personally experienced some issues with existing time-series databases (mainly InfluxDB):
- Out-of-memory failures under high query load
- Performance issues: query load affects ingestion performance and vice versa
- High, unbounded disk usage: the disk fills and the DB just dies
- Bad interface design leads to unoptimizable ingestion (line protocol is a horrible data ingestion format)
Bad Interface Design
Let’s take a look at how InfluxDB ingests data (line protocol).
Our above dataset could be constructed by ingesting the following line-protocol:
hvac,location=headquarters compressor_temperature=3i 12345134213143
hvac,location=headquarters compressor_temperature=1i 12312513431212
hvac,location=headquarters compressor_temperature=5i 12312513431225
Glossing over things like “measurement”, “tags”, and “fields” (bad vernacular invented by InfluxDB), the first part is used to identify the series: “hvac,location=headquarters” is sorted and hashed, the hash looks up a series ID, and the data is inserted into that series on disk.
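A minimal sketch of that identification step, in Zig (the names and the choice of SHA-256 are my assumptions, not InfluxDB’s actual implementation): sort the tag strings, hash the normalized form, and use the digest as the on-disk series key.

const std = @import("std");

fn tagLessThan(_: void, a: []const u8, b: []const u8) bool {
    return std.mem.order(u8, a, b) == .lt;
}

// Illustrative only: sort the tag strings, hash the normalized form,
// and use the digest as the series key on disk.
fn seriesId(tags: [][]const u8) [32]u8 {
    std.mem.sort([]const u8, tags, {}, tagLessThan);
    var hasher = std.crypto.hash.sha2.Sha256.init(.{});
    for (tags) |tag| {
        hasher.update(tag);
        hasher.update(","); // separator so {"ab"} and {"a","b"} hash differently
    }
    var id: [32]u8 = undefined;
    hasher.final(&id);
    return id;
}

Note that every ingested line repeats this normalization work on what is usually the same handful of strings.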
This ingestion format has the following disadvantages:
- Users must repeat themselves ad nauseam: every line contains redundant strings, and roughly 70% of the payload above is just the repeated “hvac,location=headquarters compressor_temperature=”. InfluxDB can ingest gzip-compressed line protocol, but it still has to be decompressed, parsed, and sorted before insertion.
- It’s a text format, not binary, so every ingest also has to parse and convert this text.
- It’s comma-, space-, and newline-delimited, with annoying escaping rules.
- You get recommendations in the docs like “sort your tags before you send them to the database, according to the Go sort() function…”
It has the following advantages:
- It’s pretty easy to construct a compliant implementation; most are < 1000 LOC but are typically low performance. The Python implementation, for example, can be made 3x more performant with some LRU caching of string replacements. This just illustrates how inefficient the format is.
The database accepts HTTP requests of gzipped line protocol. While simple in theory, wouldn’t it be cool to ingest, I don’t know, maybe UDP datagrams? Maybe even leave the whole problem of authentication, security, etc. to a reverse proxy in front of the DB?
How can we provide a more reliable time-series database?
Out of memory failures
Follow a reserve-first principle. Allocate all the memory used by the database up front. Set concurrent query limits, maximum data returned per query, etc. Somehow set up an easy-to-define configuration for the database that enables bounded memory usage.
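A minimal sketch of what reserve-first could look like in Zig (all names and sizes here are made up): grab the whole memory budget once at startup and hand out bounded sub-allocations from it, so an over-budget query is rejected instead of OOM-killing the process.

const std = @import("std");

pub fn main() !void {
    const memory_budget = 512 * 1024 * 1024; // from static config, e.g. 512 MiB

    // Reserve the entire budget once; nothing else ever touches the OS allocator.
    const backing = try std.heap.page_allocator.alloc(u8, memory_budget);
    defer std.heap.page_allocator.free(backing);

    var fba = std.heap.FixedBufferAllocator.init(backing);
    const allocator = fba.allocator();

    // Bounded query resources: a 33rd concurrent query is rejected up front
    // rather than allocating unbounded memory mid-flight.
    const max_concurrent_queries = 32;
    const max_points_per_query = 100_000;
    const query_buffers = try allocator.alloc([max_points_per_query]u64, max_concurrent_queries);
    std.debug.print("reserved {d} query buffers\n", .{query_buffers.len});
}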
Performance issues
Provide a better interface for ingestion, one that can be serialized to disk more quickly (a sketch of what that might look like follows below).
Relevant TigerBeetle talk:
Provide a better interface for querying (TBD).
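To make “a better interface for ingestion” slightly more concrete, here is one possible shape, purely a sketch of my own (not anything an existing database does): fixed-size binary records keyed by a precomputed series hash, so a batch is just an array the server can bounds-check and append without text parsing or escaping.

const std = @import("std");

// Hypothetical wire/disk record: the client computes the series hash once
// and never re-sends tag strings; a batch is simply a []IngestRecord.
const IngestRecord = extern struct {
    series_id: [32]u8, // hash of the sorted tag set
    time_ns: i64,      // timestamp, nanoseconds since the Unix epoch
    value: f64,
};

comptime {
    // 48 bytes per point, fixed layout, no padding surprises.
    std.debug.assert(@sizeOf(IngestRecord) == 48);
}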
High disk usage, unlimited disk usage
This one is harder. If we follow “reserve first”, perhaps the database could reserve its disk space at startup:
start-my-db --disk-size 100GB --time-max-years 5 --series-max-count 100k
The database could calculate the maximum possible disk usage for a 5-year ring buffer of 100k individual series and refuse to start if the --disk-size parameter is insufficient. It could also create this fixed-size file on startup.
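A back-of-the-envelope version of that startup check, as a Zig sketch (the config fields, the 16-byte uncompressed sample, and the per-series rate limit are all assumptions on my part):

const std = @import("std");

const Config = struct {
    disk_size_bytes: u64, // --disk-size
    time_max_years: u64, // --time-max-years
    series_max_count: u64, // --series-max-count
    min_sample_interval_s: u64, // fastest allowed ingest rate per series
};

// Worst case: every series ingests at the maximum rate for the whole window,
// stored as uncompressed 16-byte samples (8-byte timestamp + 8-byte value).
fn worstCaseDiskBytes(cfg: Config) u64 {
    const bytes_per_sample: u64 = 16;
    const seconds = cfg.time_max_years * 365 * 24 * 60 * 60;
    const samples_per_series = seconds / cfg.min_sample_interval_s;
    return cfg.series_max_count * samples_per_series * bytes_per_sample;
}

test "refuse to start when the reservation is too small" {
    const cfg = Config{
        .disk_size_bytes = 100 * 1024 * 1024 * 1024, // --disk-size 100GB
        .time_max_years = 5,
        .series_max_count = 100_000,
        .min_sample_interval_s = 60, // one sample per minute per series
    };
    // 100k series x 5 years at 1/min x 16 B ≈ 4.2 TB, so 100 GB gets rejected.
    try std.testing.expect(worstCaseDiskBytes(cfg) > cfg.disk_size_bytes);
}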
This is non-trivial though. Disk usage can depend on the compression strategy used (for example differential encoding), and performance can depend on how well we compress, the speed of the disk, etc.
The point is we want the user to be assured that if their disk works and the space is reserved, their database can never fall over due to disk usage.
An initial design
Given the above, I propose a design with the following characteristics:
- single node, single disk (a fully in-memory “disk” is easily done as well if desired)
- static configuration provided at startup (see the sketch after these lists)
- extremely limited query api, no query engine, no crap query programming language
- maximum disk usage is known at startup, disk is reserved at startup
- maximum memory usage is known at startup, memory is reserved at startup
Which enables the following properties:
- DB is unlikely to fall over after startup.
- Monitoring the DB, setting up alerts for disk usage, etc. becomes less necessary.
- Users are forced to set expectations up front, and are less likely to be surprised by the consequences of their database filling up.
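To make the static configuration concrete, it could even just be a ZON file, mirroring the data example above (all field names here are hypothetical):

.{
    .disk_size_gb = 100,
    .time_max_years = 5,
    .series_max_count = 100_000,
    .memory_budget_mb = 512,
    .max_concurrent_queries = 32,
    .max_points_per_query = 100_000,
}

Everything the database reserves (disk, memory, query concurrency) would be derivable from this one file at startup.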
Open Questions
- Can “reserve-first” be extended to performance? What if the database could test its own performance at startup to provide similar guarantees? Perhaps it could spin-lock reserved CPUs (that seems wasteful)? Perhaps just performance-test itself at startup and at regular intervals?
- What data ingestion format would you want?
- What query API would you want?