Epoch Challenges

2025-07-31
5 minute read

Today, I focused significantly on Epoch, but also dedicated time to the GFF3 implementation. I updated the documentation for the GFF3 validation rules, incorporating new code examples and a diagram, and removed outdated dependencies. I believe the documentation is now in a very good state.

I also identified and resolved a small bug in the rule state manager and we closed the Pull Request that added validation for duplicated sequences. I need to spend more time considering the validation engine, and the new encoding and decoding processes for sequences.

I received confirmation that there's no masking on the ENA sequences. This is excellent news because it means I should be able to utilize three bits per base for encoding, which, based on some tests I've done, will lead to a substantial reduction in storage needs, this could result in approximately a 75% reduction in size after compression. This is quite exciting, and I'm surprised no one has implemented this before, though perhaps I'm overlooking something.

Aggregate Implementation Bug

I encountered multiple limitations in Epoch today. The first issue was that my aggregate implementation was saving events before they were persisted in the aggregate store. When an event is saved, it propagates, meaning projections were being called before the aggregate's state was persisted. This led to situations where a projection might require the aggregate's state, but it didn't yet exist, resulting in errors (not null pointers, as I'm using options, but similar state-not-found errors).

This is clearly not ideal. It feels as though events should be emitted after both the state and the events are persisted. Currently, it's all tangled together in the event store, which is problematic. As a quick fix, I'm now saving events after the state. However, this isn't a long-term solution. I don't even think transactions would fully resolve it, as the same issue could arise if projections are not within the same system, leading to latency-dependent bugs – the worst kind to find.

I have an intuition for a proper resolution: separating emission from persistence. The persistence would be done in a transaction, and the emission would happen afterwards. This seems to be the best way forward. The state for both events and aggregate state would be persisted in an atomic operation. We could then have an index of emitted events, allowing for re-emission later if, for example, the event bus service is down. From an eventual consistency perspective, this would be acceptable because the source of truth (events and aggregate state) would be persisted.

Time Series Creation Limitation

I found another limitation in Epoch while trying to create a time series from a couple of events using a projection. Specifically, I needed to create a time series from file part uploaded and file deleted events to track user storage usage over time. This required knowing the last event in the time series to calculate the new event based on it.

My initial approach was constrained by Epoch's current implementation. To resolve this, I decided to add a new event called UserFileStorageChanged. This event will be emitted in both file part uploaded and file deleted scenarios, and will include the total storage used plus the delta between the old state and the new state.

I also introduced two new projections:

  1. One to store this UserFileStorageChanged event directly into a time series.
  2. Another, UserMetrics, to calculate the current user storage usage. I can then query this projection from the aggregate for files and use it to calculate the total storage used for the new event.

This new approach allows me to calculate both the current user storage usage and show historical changes, which is quite useful. Interestingly, this limitation actually led me to a better solution than my initial intent. However, I still believe it highlights a limitation in Epoch that I would like to address in the future.

I'm starting to believe that the create, update, and delete constraints I have are too restrictive. The enum subset logic I have implemented to filter events is good, but limiting it to just these crud event categories is short-sighted. There's much more to it. Also, on the store side, when retrieving state, it should be possible to get events and then decide how to retrieve the state based on those events. This would allow for potential joins or picking state from multiple tables, as currently it's limited to a single table, which is also not ideal. The persistence side is similar. I need to generalize it more to remove many of these limitations, but for now, I'll continue with the current approach.

Metrics Projection Creation Limitation

Adding to the complexities, the new UserMetrics projection currently has no creation event. Its creation would logically align with user creation, but I don't currently have an aggregate for the user. This means that for metrics, there are only update events, and no creation event.

Epoch currently limits updates to always have an initialized state; if there's no state in the database, it will fail. This is a significant shortcoming. I'm unsure how to work around this. My options are to either add a projection for the user (which I'm not keen on) or modify Epoch. I don't see another immediate solution right now. It's late, so I'll drop this for today. It's been a very long day.