
Databases & Data Architecture

Databases are the foundation of any modern platform. The way data storage and processing are architected determines the project’s speed, scalability, and reliability. At Webdelo, we treat the design and selection of databases as a key architectural decision — not just a technical one. For us, it’s a strategic tool that directly impacts the client’s business goals.

General approach to working with databases

Choosing a database is an architectural decision that directly affects system efficiency. Different workloads call for different tools: transactions, analytics, caching, document storage, telemetry. We build a multi-layered ecosystem in which each database serves a dedicated role, connected by clear, transparent contracts.

At the design stage, we define non-functional requirements: response time, acceptable latency, data volume, and availability SLA. We model the load, forecast growth, and select a combination of databases for each scenario. For caching, we use dedicated layers that separate temporary data from the primary source of truth. This approach lets us combine SQL and NoSQL while planning for scalability from the start.

Key data storage roles in a typical system:
  • Transactions: source of truth (PostgreSQL, MySQL/MariaDB).
  • Analytics: fast aggregation and querying (ClickHouse).
  • Documents and dynamic structures: flexible schema (MongoDB).
  • Cache/Sessions: high-speed access with limited lifespan (in-memory solutions).
  • Telemetry/Events: streaming bus and long-term storage (Kafka + analytical database).
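
As a minimal illustration of this layered approach (the registry below is our own simplified sketch, not project code; the engine names simply mirror the list above), application services can look up a storage contract instead of hard-coding a connection to one database:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageContract:
    """Which engine owns a workload and what it promises."""
    engine: str
    role: str

# Hypothetical registry mirroring the roles listed above.
STORAGE_LAYERS = {
    "transactions": StorageContract("PostgreSQL / MySQL", "source of truth"),
    "analytics": StorageContract("ClickHouse", "fast aggregation and querying"),
    "documents": StorageContract("MongoDB", "flexible schema"),
    "cache": StorageContract("in-memory store", "high-speed access, limited lifespan"),
    "telemetry": StorageContract("Kafka + analytical database", "stream plus long-term storage"),
}

def store_for(workload: str) -> StorageContract:
    """Services resolve a contract instead of guessing where data lives."""
    return STORAGE_LAYERS[workload]

print(store_for("analytics").engine)  # ClickHouse
```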

PostgreSQL: The Reliability Standard

We use PostgreSQL where transaction support (operation reliability) and strict typing are critical. Our data models are designed with integrity constraints and record lifecycles in mind. Indexes are chosen deliberately: B-Tree for equality and range lookups, GIN for full-text search and JSONB, and BRIN for large time-ordered tables. Put simply, these are different ways to accelerate queries and keep performance predictable on large datasets. Partitioning speeds up operations by splitting large tables into smaller segments. Materialized views store the results of complex queries for reuse. Extensions such as PostGIS (for geospatial data) and TimescaleDB (for time-series data) further expand PostgreSQL's capabilities.
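
A hedged sketch of how these features might be declared; the orders, documents, and events tables are invented for the example, and the psycopg2 driver is used only because it is a common PostgreSQL client:

```python
import psycopg2

# Connection string and table names are placeholders invented for this example.
conn = psycopg2.connect("dbname=app user=app_rw")
cur = conn.cursor()

# B-Tree (the default) for equality and range lookups.
cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id);")

# GIN for full-text search over a tsvector column.
cur.execute("CREATE INDEX IF NOT EXISTS idx_docs_fts ON documents USING gin (search_vector);")

# BRIN for a large, time-ordered events table: a tiny index that speeds up range scans by time.
cur.execute("CREATE INDEX IF NOT EXISTS idx_events_brin ON events USING brin (created_at);")

# One monthly partition (assumes events was created with PARTITION BY RANGE (created_at)).
cur.execute("""
    CREATE TABLE IF NOT EXISTS events_2024_06 PARTITION OF events
        FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
""")

# A materialized view caches the result of an expensive aggregate for reuse.
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS daily_revenue AS
    SELECT date_trunc('day', created_at) AS day, sum(amount) AS revenue
    FROM orders
    GROUP BY 1;
""")

conn.commit()
cur.close()
conn.close()
```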

To make the system reliable, transaction management is essential. Isolation levels control how concurrent transactions see each other's changes: Read Committed suits most scenarios and already rules out dirty reads, Repeatable Read additionally prevents non-repeatable and phantom reads in PostgreSQL, and Serializable makes concurrent transactions behave as if they ran one after another. To prevent deadlocks (when transactions block each other and stall progress), we fix the order of operations, limit transaction duration, and apply retry logic. In high-concurrency systems, we use optimistic locking through row versioning to avoid unnecessary waiting.
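
A minimal sketch of this retry discipline, assuming psycopg2 and an invented accounts table; the transfer itself is a stand-in for any short business transaction:

```python
import time
import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

def transfer_funds(conn, src: int, dst: int, amount: int) -> None:
    """Move money between two hypothetical accounts inside one transaction."""
    with conn.cursor() as cur:
        # Touch rows in a fixed order (lower id first) to reduce deadlock risk.
        first, second = sorted((src, dst))
        cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE;", (first,))
        cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE;", (second,))
        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s;", (amount, src))
        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s;", (amount, dst))

def run_with_retries(conn, attempts: int = 3) -> None:
    """Retry on serialization failures or deadlocks instead of failing the request."""
    conn.set_session(isolation_level=ISOLATION_LEVEL_SERIALIZABLE)
    for attempt in range(attempts):
        try:
            transfer_funds(conn, src=1, dst=2, amount=100)
            conn.commit()
            return
        except (errors.SerializationFailure, errors.DeadlockDetected):
            conn.rollback()
            time.sleep(0.05 * (attempt + 1))  # simple backoff before retrying
    raise RuntimeError("transaction did not succeed after retries")
```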

MySQL and MariaDB: Flexibility and Speed

We use MySQL and MariaDB when high read speed, a mature tool ecosystem, and predictable performance are priorities. These databases are particularly effective for web platforms, CMS, and CRM systems, where fast data access and scalability are essential. The master-replica model involves a single primary server for writes and multiple replicas for reads, enabling load distribution and fault tolerance. As data volume grows, we apply sharding — splitting data by business keys (for example, by users or regions). To balance traffic, we implement a proxy layer that manages query routing and relieves the primary server. Schemas are modeled for OLTP scenarios — in simple terms, systems that process large numbers of short transactions, such as e-commerce or CRM platforms. To maintain performance, we avoid heavy joins on hot paths and optimize indexes and queries in advance.
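
A deliberately simplified, driver-agnostic sketch of the proxy idea: the first SQL keyword decides whether a query goes to the primary or to a replica. Hostnames are placeholders, and a production proxy layer is far more sophisticated than this routing rule:

```python
import itertools

# Placeholder hostnames; in production these come from configuration or service discovery.
PRIMARY = "mysql-primary.internal"
REPLICAS = itertools.cycle(["mysql-replica-1.internal", "mysql-replica-2.internal"])

def host_for_query(sql: str) -> str:
    """Route writes to the primary and spread reads across replicas (round-robin)."""
    keyword = sql.lstrip().split(None, 1)[0].upper()
    is_write = keyword not in ("SELECT", "SHOW", "EXPLAIN")
    return PRIMARY if is_write else next(REPLICAS)

# A SELECT lands on a replica, an UPDATE on the primary.
print(host_for_query("SELECT id FROM orders WHERE user_id = 42"))
print(host_for_query("UPDATE orders SET status = 'paid' WHERE id = 7"))
```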

MongoDB: Document-Oriented Storage

We choose MongoDB for dynamic data structures and nested documents. It is well suited for IoT, logging, and signal storage, where event formats change over time. We define shard keys (how data is distributed across servers) in advance to prevent hotspots and keep queries fast. The Aggregation Pipeline allows analytics to run directly inside the database, for example generating real-time event reports. Write concern is configured to match system requirements, from fast writes without strict guarantees to acknowledgment by a majority of replica set members. Replication is designed with the CAP theorem in mind, balancing speed, consistency, and fault tolerance.

Practical considerations:
  • TTL indexes automatically remove outdated records (see the sketch after this list).
  • Archived collections are stored separately from hot data.
  • Drivers and protocol versions are kept up to date to ensure updates don’t break production systems.
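
The snippet below sketches these points with the pymongo driver: a majority write concern, a small Aggregation Pipeline report, and a TTL index. The events collection and its fields are invented for the example:

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["telemetry"]

# Acknowledge writes only after a majority of replica set members have them.
events = db.get_collection("events", write_concern=WriteConcern(w="majority"))
events.insert_one({"device_id": "sensor-17", "kind": "temperature",
                   "value": 21.4, "created_at": datetime.now(timezone.utc)})

# Aggregation Pipeline: count events per device directly inside the database.
report = events.aggregate([
    {"$match": {"kind": "temperature"}},
    {"$group": {"_id": "$device_id", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
])
print(list(report))

# TTL index: documents expire 30 days after created_at.
events.create_index([("created_at", ASCENDING)], expireAfterSeconds=30 * 24 * 3600)
```

With w="majority", acknowledged writes survive a primary failover, which is usually the right trade-off for business-critical events.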

ClickHouse: Analytics and Real-Time Processing

We use ClickHouse in projects with massive data volumes: trades, exchange logs, telemetry. Its columnar structure provides fast queries and cost-efficient storage. We design the schema in advance, defining how data is distributed and which sorting keys are applied. Sorting keys set the physical order of data on disk so that range queries stay fast even as tables grow. In combination with Kafka, ClickHouse ingests event streams directly: data lands in Kafka engine tables, and materialized views move it into partitioned target tables. This approach enables real-time analytics.
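
As an illustration of the trades scenario (table, topic, and column names are ours, not a client's schema), a sketch using the clickhouse-driver package: a MergeTree table partitioned by day and ordered by instrument and time, fed from Kafka through a materialized view:

```python
from clickhouse_driver import Client

client = Client("localhost")  # placeholder host

# Target table: partitioned by day, sorted by instrument and time for fast range scans.
client.execute("""
    CREATE TABLE IF NOT EXISTS trades (
        ts DateTime,
        instrument LowCardinality(String),
        price Float64,
        qty Float64
    ) ENGINE = MergeTree
    PARTITION BY toDate(ts)
    ORDER BY (instrument, ts)
""")

# Kafka engine table acts as the intermediate landing point for the event stream.
client.execute("""
    CREATE TABLE IF NOT EXISTS trades_queue (
        ts DateTime,
        instrument String,
        price Float64,
        qty Float64
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'trades',
             kafka_group_name = 'clickhouse-trades',
             kafka_format = 'JSONEachRow'
""")

# Materialized view continuously moves rows from the queue into the partitioned table.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS trades_mv TO trades
    AS SELECT ts, instrument, price, qty FROM trades_queue
""")
```

The materialized view is what turns the Kafka queue into continuously updated partitions; consumers query the trades table, never the queue directly.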

Use cases:
  • Trades: partitioned by day and instrument, providing fast queries for risk analysis.
  • Exchange logs: building leaderboards, calculating latencies, and percentiles by user groups.
  • Telemetry: storing both raw data and aggregates for dashboards.
How Cloudflare Moved from Selective Log Storage to Full Retention and Fast Querying with ClickHouse
Previously, Cloudflare’s logging system used sampling to avoid overloading the infrastructure — but that limited debugging and incident analysis. After a proof of concept, the company migrated to ClickHouse, enabling full log retention for the entire storage period and fast aggregated querying. This transition provided more accurate analytics and allowed engineers to respond to issues much faster.
Source: Cloudflare Blog, “Log analytics using ClickHouse”

Data Storage Architecture Design

We separate hot and cold data layers to maintain both speed and long-term storage efficiency. The hot layer is optimized for writes and fast queries — it stores transactions, recent updates, and frequently accessed information. The cold layer is designed for low-cost, long-term retention, containing archives, historical records, and analytical datasets. In the transactional layer, we use PostgreSQL as the source of truth, MongoDB for flexible document handling, and ClickHouse for aggregations and complex reporting. This three-tier approach delivers both reliability and flexibility.

The Write Fast — Read Later pattern helps the system stay responsive during writes: events are quickly received and recorded, while detailed processing and analytics happen later in specialized storages. CQRS (Command Query Responsibility Segregation) enables fast, lightweight data queries without impacting the transactional layer. Event Sourcing (storing state changes as events) makes it possible not only to restore the current state but also to trace the entire history of changes.
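
A toy, in-memory sketch of the Event Sourcing idea with a separate CQRS-style read model; a real project would persist the event log in a durable store rather than a Python list:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Event:
    order_id: str
    kind: str          # e.g. "created", "paid", "shipped"
    payload: dict = field(default_factory=dict)

EVENT_LOG: List[Event] = []   # in-memory stand-in for a durable event store

def append(event: Event) -> None:
    """Write side: commands only append events, never mutate state in place."""
    EVENT_LOG.append(event)

def order_status(order_id: str) -> str:
    """Read side: current state is derived by replaying the order's events."""
    status = "unknown"
    for event in EVENT_LOG:
        if event.order_id == order_id:
            status = event.kind
    return status

def orders_by_status() -> Dict[str, int]:
    """A separate read model (CQRS): a cheap aggregate for dashboards."""
    latest: Dict[str, str] = {}
    for event in EVENT_LOG:
        latest[event.order_id] = event.kind
    counts: Dict[str, int] = {}
    for status in latest.values():
        counts[status] = counts.get(status, 0) + 1
    return counts

append(Event("o-1", "created"))
append(Event("o-1", "paid", {"amount": 120}))
print(order_status("o-1"), orders_by_status())   # paid {'paid': 1}
```

Because state is derived from the log, the same history can feed new read models later without touching the write path.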

Together, these practices make the architecture transparent: the business gets instant operational response while retaining the ability for deep analysis based on historical data.

“When designing data storage architecture, we always start from the business goals. In some cases, transactional integrity is critical; in others, data structure flexibility or aggregation speed matters most. We don’t rely on a single technology — instead, we combine SQL, NoSQL, and streaming solutions to deliver stable and scalable systems for our clients.”

Dmitri Cercel

Tech Lead, Software Architect

Consistency and Locking

Data consistency means that information in the database stays correct and synchronized even when multiple users or services modify it simultaneously. In simple terms, if you place an order, it must look the same across all parts of the system. Isolation levels control how parallel transactions see each other's changes. Row-level locking (locking individual rows during concurrent updates) and optimistic locking manage concurrent modifications. Deadlocks (when transactions block each other and the process stalls) are prevented through operation ordering and timeouts.
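
A hedged sketch of optimistic locking through a version column, again with psycopg2 and an invented products table: the update succeeds only if nobody has changed the row since it was read:

```python
import psycopg2

def update_price(conn, product_id: int, new_price: float) -> bool:
    """Optimistic locking: bump the version only if it still matches what we read."""
    with conn.cursor() as cur:
        cur.execute("SELECT price, version FROM products WHERE id = %s;", (product_id,))
        row = cur.fetchone()
        if row is None:
            return False
        _, version = row

        cur.execute(
            """
            UPDATE products
            SET price = %s, version = version + 1
            WHERE id = %s AND version = %s;
            """,
            (new_price, product_id, version),
        )
        conn.commit()
        # rowcount == 0 means another transaction won the race; the caller re-reads and retries.
        return cur.rowcount == 1
```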

To put it simply, eventual consistency means that data synchronizes not immediately but through event propagation. For example, an order in an online store may appear in the delivery service a few seconds after being placed. The final state will match across all services, just not instantly. This approach uses retries and idempotent operations — safe to execute multiple times — and is often applied in distributed and microservice systems where speed and scalability matter more than immediate synchronization.
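
A minimal sketch of the idempotency side of this approach: each event carries a unique id, and the consumer remembers which ids it has already applied, so redelivery and retries are safe. The in-memory set stands in for a persistent, unique-keyed table:

```python
from typing import Set

PROCESSED_EVENT_IDS: Set[str] = set()   # in real systems: a table with a unique key

def handle_order_placed(event_id: str, order_id: str) -> None:
    """Safe to call any number of times: duplicates are detected and skipped."""
    if event_id in PROCESSED_EVENT_IDS:
        return                          # already applied; a retry changes nothing
    # ... create the delivery request for order_id here ...
    PROCESSED_EVENT_IDS.add(event_id)

# The message bus may redeliver the same event; the second call is a no-op.
handle_order_placed("evt-42", "order-7")
handle_order_placed("evt-42", "order-7")
```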

Scalability and Fault Tolerance

We design scaling strategies in advance so the system can grow with the business without hitting bottlenecks. Replication reduces read load and increases availability: primary data is written to one server, while replicas handle read queries. This lowers the load on the main node and provides fault tolerance. Sharding allows data to be split into segments and distributed across different servers — for example, some servers handle customers in Europe, others in Asia. This approach ensures linear performance growth as the user base expands.
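
A toy illustration of this kind of sharding (region names and connection strings are placeholders): an explicit regional shard is preferred, and customers without one are spread across default shards by a stable hash:

```python
import hashlib
from typing import Optional

# Placeholder connection targets for the example; real DSNs live in configuration.
REGION_SHARDS = {
    "eu": "db-eu.internal/app",
    "asia": "db-asia.internal/app",
}
DEFAULT_SHARDS = ["db-0.internal/app", "db-1.internal/app"]

def shard_for_customer(customer_id: int, region: Optional[str] = None) -> str:
    """Prefer an explicit regional shard; otherwise hash the id onto a default shard."""
    if region in REGION_SHARDS:
        return REGION_SHARDS[region]
    digest = hashlib.sha1(str(customer_id).encode()).hexdigest()
    return DEFAULT_SHARDS[int(digest, 16) % len(DEFAULT_SHARDS)]

# The same customer always lands on the same shard, so queries stay local to one node.
print(shard_for_customer(1001, region="asia"), shard_for_customer(2002))
```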

In a primary-replica (master-slave) topology, one server handles writes while replicas serve reads, which works well for read-intensive systems. In a multi-primary (master-master) configuration, several servers accept both reads and writes. This improves fault tolerance and minimizes downtime risk but requires synchronization mechanisms to prevent write conflicts.

Backups are built with point-in-time recovery (PITR) support, meaning the system can be restored to any state prior to a failure. We regularly test backups and simulate failure scenarios to ensure recovery takes minimal time in real conditions. Monitoring covers queries, replication, latency, and load metrics. We identify and resolve bottlenecks before they become problems. This comprehensive approach ensures the system can withstand both traffic spikes and unexpected incidents.
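
One hedged example of such monitoring, assuming PostgreSQL streaming replication: the check reads pg_stat_replication on the primary and flags replicas whose replay lag exceeds a threshold, so an alert fires before users notice stale reads:

```python
import psycopg2

LAG_THRESHOLD_SECONDS = 5  # alert threshold chosen for the example

def check_replication_lag(dsn: str) -> list:
    """Return replicas whose replay lag on the primary exceeds the threshold."""
    laggards = []
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) AS lag_seconds
            FROM pg_stat_replication;
        """)
        for name, lag_seconds in cur.fetchall():
            if lag_seconds > LAG_THRESHOLD_SECONDS:
                laggards.append((name, lag_seconds))
    return laggards

# Example usage (the DSN is a placeholder): feed the result into the alerting system.
# print(check_replication_lag("dbname=postgres host=db-primary.internal"))
```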

PostgreSQL sharding enabled Instagram to handle millions of queries per second.
In its early stages, Instagram faced a challenge — a single PostgreSQL server could no longer handle the growing number of users and operations. The solution was horizontal sharding, where data was distributed across multiple nodes. This allowed the system to scale with its audience and handle peak loads smoothly. The approach became a cornerstone of Instagram’s fault tolerance and high-performance architecture.
Source: Instagram Engineering Blog

Complete Structure Overview

To clearly illustrate the role of each technology, we’ve compiled the key findings into a single map. It helps visualize which problems each database and architectural approach solves:

  • PostgreSQL / MySQL. These relational databases form the core of transactional systems. They ensure strict data integrity, handle thousands of operations per second, support indexing, and control locking. PostgreSQL works best for complex relationships and SQL-level analytics, while MySQL and MariaDB are ideal for systems where read speed and simplicity of scaling are critical. Their main value lies in the reliable processing of business operations such as orders, payments, and accounting.
  • MongoDB. Suitable for environments where data structures change frequently — user profiles, IoT events, or logs. It stores nested documents without a rigid schema and scales horizontally. MongoDB provides flexibility for business: new attributes can be added without complex migrations, accelerating feature delivery and product adaptation.
  • ClickHouse. Optimized for analytics and large-scale datasets. Its column-based structure enables fast aggregation of millions of rows and near real-time reporting. Used for trading logs, telemetry, and user behavior analytics. The key advantage of ClickHouse is the ability to process massive data volumes without compromising speed or storage efficiency.
  • Data storage architecture. Proper separation of hot and cold data, along with patterns like CQRS (separating write and read logic) and Event Sourcing (storing change history as events), makes the system predictable and maintainable. Hot data remains readily available for fast transactions, while cold data supports analytics and long-term retention. This balance ensures both high performance and analytical depth.
  • Scalability and fault tolerance. Replication, sharding, backups, and monitoring create a resilient infrastructure foundation. They ensure that the system continues operating even during failures or traffic spikes. For the business, this means predictability — the service remains stable under load and recovers quickly in emergencies.

This map demonstrates that database architecture is not a collection of isolated solutions, but a well-structured system where each component fulfills its purpose and reinforces the others.

Conclusion

A well-designed database architecture ensures scalability and fault tolerance. We build systems where data becomes a driver of business growth. The combination of SQL and NoSQL technologies operates as a unified, efficient mechanism.

Have a project in mind?

We will be happy to discuss it with you!

Start a project