Billing UI, Event Sourcing, and Epoch Refinements

2025-08-07
3 minute read

Today, my main focus was on the Catacloud platform, specifically developing the billing UI and implementing the billing aggregate.

Billing System Development

I've implemented the billing aggregate, which tracks operational metrics such as the total storage an organization uses and the gigabyte-hours it consumes. A saga now processes these metrics, and I've built the UI to display daily aggregations, giving good visibility into gigabyte-hours and execution times.
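
To make the shape of this concrete, here's a rough sketch of the aggregate (the names and fields are illustrative, not the actual Catacloud types); it simply folds usage events into per-day totals that the daily view can read straight off:

```rust
use std::collections::BTreeMap;

/// Usage events emitted for an organization (illustrative names only).
enum BillingEvent {
    /// Storage held over some period, measured in gigabyte-hours.
    StorageConsumed { day: String, gigabyte_hours: f64 },
    /// Execution time recorded for a run, in seconds.
    ExecutionRecorded { day: String, seconds: f64 },
}

/// Billing aggregate: per-day totals for one organization.
#[derive(Default)]
struct BillingAggregate {
    gigabyte_hours_per_day: BTreeMap<String, f64>,
    execution_seconds_per_day: BTreeMap<String, f64>,
}

impl BillingAggregate {
    /// Fold a single event into the running daily totals.
    fn apply(&mut self, event: &BillingEvent) {
        match event {
            BillingEvent::StorageConsumed { day, gigabyte_hours } => {
                *self.gigabyte_hours_per_day.entry(day.clone()).or_default() += gigabyte_hours;
            }
            BillingEvent::ExecutionRecorded { day, seconds } => {
                *self.execution_seconds_per_day.entry(day.clone()).or_default() += seconds;
            }
        }
    }
}
```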

Looking ahead, I plan to extend the system to support multiple services, each with its own per-customer pricing, which allows for volume-based accounts and discounts. The design also accommodates future pricing changes without necessarily affecting existing customers, leaving room to migrate to new pricing models later.
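
One possible shape for that pricing model, purely as a sketch of where this could go rather than a settled design: versioned price plans per service with volume tiers, so existing customers stay pinned to the plan version they signed up on.

```rust
/// A volume tier: usage at or above `from_units` is billed at this rate.
struct PriceTier {
    from_units: f64,
    cents_per_unit: u64,
}

/// Pricing for one service. Plans are versioned so new pricing can be
/// introduced without touching customers pinned to an earlier version.
struct PricePlan {
    service: String,
    version: u32,
    tiers: Vec<PriceTier>,
}

/// A customer's pricing is the set of plan versions they are on, which is
/// also where per-customer discounts would hang.
struct CustomerPricing {
    customer_id: String,
    plans: Vec<PricePlan>,
}
```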

File Upload and Event Versioning Issues

I've started investigating why file uploads are failing, particularly for larger files, when they go through the event system. It appears to be a concurrency issue: when multiple file parts are uploaded simultaneously, events are sometimes created with the same version in the database, leading to "duplicate key" errors.
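
One way to surface that conflict explicitly, sketched against an in-memory stand-in rather than Epoch's actual store API, is an optimistic append that checks the expected version before writing, so a concurrent writer gets a retriable conflict instead of a raw database error:

```rust
/// Optimistic append against an in-memory stand-in for the events table:
/// the caller states the version it last saw, and the append is rejected
/// if another writer has already moved past it.
fn append_events(
    store: &mut Vec<(u64, String)>, // (version, payload) pairs
    expected_version: u64,          // version the uploader read before producing its events
    new_events: Vec<String>,
) -> Result<(), String> {
    let current = store.last().map(|(v, _)| *v).unwrap_or(0);
    if current != expected_version {
        // Another file part was appended concurrently; the caller should
        // reload the aggregate and retry instead of hitting a duplicate key.
        return Err(format!("conflict: expected version {expected_version}, found {current}"));
    }
    for (offset, payload) in new_events.into_iter().enumerate() {
        store.push((expected_version + 1 + offset as u64, payload));
    }
    Ok(())
}
```

On a conflict, the upload handler would reload the aggregate and retry the append rather than failing the whole upload.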

The root cause seems to be the order of persistence: currently, the aggregate state is persisted before the events. This is problematic because the two writes can diverge; if one succeeds and the other fails, events may end up in the database that the persisted state does not reflect, so the state reconstructed later is out of date. Ideally, events should be persisted first, followed by the aggregate state, preferably within a single transaction, so the stored state is always consistent with the latest events.
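
Sketched against a hypothetical store interface (not Epoch's current traits), that ordering looks roughly like this:

```rust
struct Event;
struct AggregateState;

/// Hypothetical store interface; Epoch's real traits look different.
trait Store {
    type Tx;
    type Error;

    fn begin(&self) -> Result<Self::Tx, Self::Error>;
    fn append_events(&self, tx: &mut Self::Tx, events: &[Event]) -> Result<(), Self::Error>;
    fn save_state(&self, tx: &mut Self::Tx, state: &AggregateState) -> Result<(), Self::Error>;
    fn commit(&self, tx: Self::Tx) -> Result<(), Self::Error>;
}

/// Persist events first, then the derived state, committing both together.
fn persist<S: Store>(store: &S, events: &[Event], state: &AggregateState) -> Result<(), S::Error> {
    let mut tx = store.begin()?;
    // Events are the source of truth, so they go in first...
    store.append_events(&mut tx, events)?;
    // ...and the derived state only lands if the events made it.
    store.save_state(&mut tx, state)?;
    // Either both writes become visible or neither does.
    store.commit(tx)
}
```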

To keep this performant, we might need a snapshotting mechanism for the aggregate state so we don't have to re-aggregate every event on each load; caching frequently retrieved states in memory could also help significantly.
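
A minimal sketch of that snapshot-plus-catch-up load, with all the names assumed: start from the latest persisted snapshot and apply only the events recorded after it.

```rust
/// The aggregate state as of a given event version.
struct Snapshot<S> {
    version: u64,
    state: S,
}

/// Load an aggregate from the latest snapshot, then apply only the events
/// recorded after it instead of replaying the stream from the start.
fn load_aggregate<S, E>(
    snapshot: Option<Snapshot<S>>,
    events_since: impl Fn(u64) -> Vec<(u64, E)>, // (version, event) pairs newer than the given version
    apply: impl Fn(&mut S, &E),
    empty: impl Fn() -> S,
) -> Snapshot<S> {
    let mut current = snapshot.unwrap_or_else(|| Snapshot { version: 0, state: empty() });
    for (version, event) in events_since(current.version) {
        apply(&mut current.state, &event);
        current.version = version;
    }
    current
}
```

An in-memory cache keyed by aggregate ID could sit in front of this, so hot aggregates skip the store entirely.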

Epoch Improvements for Event Hydration

I've begun making changes to Epoch to improve how events are read from the event store. Instead of reading the entire event stream, I want to read only new events since the last known version. This led me down a rabbit hole of modifying Epoch's traits to support streaming events directly into the rehydration function.

Working with event streams (e.g., pulling events from Postgres as a stream) would be significantly more efficient, especially when rehydrating new projections from the beginning of time, as it avoids loading the entire event stream into memory. However, it complicates error propagation: the rehydration path now has to handle failures that occur while events are being retrieved from the stream.
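
Roughly, the fold could look like this; a synchronous Iterator of Results stands in here for the async Stream the real change would use, but it shows the crux: every read can fail, so rehydration has to propagate errors instead of assuming a fully materialized Vec of events.

```rust
/// Rehydrate an aggregate from a stream of fallible reads.
fn rehydrate<S, E, Err>(
    mut state: S,
    events: impl Iterator<Item = Result<E, Err>>, // e.g. rows streamed from Postgres
    apply: impl Fn(&mut S, E),
) -> Result<S, Err> {
    for event in events {
        // A failed read aborts rehydration instead of silently producing
        // a partially hydrated aggregate.
        apply(&mut state, event?);
    }
    Ok(state)
}
```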

The ultimate goal is to make all I/O asynchronous, allowing for better concurrency and multi-threading, which would be a significant improvement. The question remains whether to continue with these fundamental changes to Epoch now, given that the immediate problem is the file upload failure. I need to re-evaluate whether that issue really is a race condition these changes would help resolve.

Atomic Operations and Transactions

The core problem boils down to making event and state persistence atomic. If state persistence fails, the events should not be committed, or the system needs a robust way to recover. A database transaction would be the ideal solution, but I need to explore how to achieve this atomicity within Epoch without relying on explicit database transactions when they aren't readily available or suitable. This requires careful thought about how to keep events and state consistent.
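
One fallback I could reach for if a proper transaction isn't available, sketched purely as an assumption rather than a decision: treat the event append as the commit point and tolerate a failed state write, since the state is derived and can be rebuilt from the events on the next load.

```rust
/// Fallback when no real transaction is available: the event append is the
/// commit point, and a failed state write only leaves a stale (rebuildable)
/// snapshot behind rather than an inconsistent one.
fn persist_without_transaction<E, S, Err>(
    append_events: impl FnOnce(&[E]) -> Result<(), Err>,
    save_state: impl FnOnce(&S) -> Result<(), Err>,
    events: &[E],
    state: &S,
) -> Result<(), Err> {
    // The append is the only write that must succeed.
    append_events(events)?;
    // If the snapshot write fails, the stored state is merely out of date;
    // rehydrating from events will repair it, so we don't fail the request.
    let _ = save_state(state);
    Ok(())
}
```

The trade-off is a stale snapshot rather than an inconsistent one, which the rehydration path has to tolerate anyway.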