Smartphones use SQLite (a file-based DB) heavily. Why store data in SQLite instead of some other format, say JSON?
Because you risk data loss if it crashes during an update. The file can end up half-written, truncated, or even missing.
There are techniques to fix this, and they lead to databases.
Durability and atomicity with fsync
Atomicity means that data is either updated or not, not in between.
Durability means that data is guaranteed to exist after a certain point.
They are not separate concerns, because we must achieved both.
The first thing to learn is the fsync syscall.
A file write doesn’t reach disk synchronously, there are multiple levels of buffering (OS page cache and on-device RAM).
fsync flushes pending data and waits until it’s done.
This makes writes durable, but what about atomicity?
Control latency and cost with indexes
Query result is not the only concern, latency and cost (memory, IO, computation) are also relevant, hence the distinction between analytical (OLAP) and transactional (OLTP).
OLAP can involve large amounts of data, with aggregation or join operations. Indexing can be limited or non-existent.
OLTP touches small amounts of data using indexes. Low latency & cost.
The word “transactional” is not about DB transactions, it’s just a funny jargon.
In-memory vs. on-disk data structures
extra challenges when putting an indexing data structure on disk
updating disk data in-place, because you have to deal with corrupted states after a crash. Disks are not just slower RAM.
for disk-based data because random access is much slower than sequential access
So data structures like binary trees are not viable while B-trees and LSM-trees are OK. see: [trees]
Relational DB on KV
SQL is just a user interface, it’s not fundamental to a DB. What’s important is the functionalities underneath.
KV is simpler than SQL because it’s one layer lower.
Relational DBs are built on top of KV-like interfaces called storage engines.