Virtually every backend engineer I've ever met has some sort of opinion about microservices. If I had to summarize the entire discussion:
- Not having a giant monolith that requires tactics for establishing code boundaries and preventing code paths from getting all tangled up is at least of interest.
- The mere mention of microservices surfaces complex subjects about messaging between services, and data ownership and sharing.
- In my opinion, the messaging problems are largely addressed with some sort of pub/sub service, Kafka, or something that will require messages to be acknowledged and retried in the event of failure (http is usually not the best choice for this).
- The data issues are the most vexing. Namely, once you've determined where that source of truth comes from, what do you do when some piece of data needs to be shared with another service (as it often does)? Do you request it as needed via your messaging service? Create a dedicated service to serve up this data, spawning more and more services in the process? Invest in solutions that provide real-time data replication through Kafka? Rationalize sharing the same database?
What are some simple options that should not offend the purists among us?
This one is often a non-starter for many, and I'm not necessarily advocating for this approach, but I disagree with some of the reasons against this option.
In my opinion, infrastructure and application code does not have to be joined at the hip. I don't see a problem with, for example, having a single database cluster running multiple databases and having this cluster shared between services. Yes, you don't get your resource isolation so that the immense load from one service doesn't affect the other, but let's face it, databases are designed to withstand a lot of load. We don't necessarily have to coddle them. Unless your service is just getting bananas traffic and is only getting a healthy amount of traffic, sharing a cluster is not necessarily the end of the world.
If we can rationalize sharing a cluster, why not a database, especially if the same caveat of not being overly concerned with load applies? We can prevent access to tables that are not the business of the service in question with appropriate table/column grants. Yes, this requires diligently keeping up with updating and applying these grants, but this is possible, and may even be a good option if, for example:
- The service will only ever need read-only access.
- The schema of the service is fairly static and not likely to change much.
If these simple sort of conditions apply to you, maybe this is an experimental service and you don't want to invest too much into it, or the context otherwise justifies this, perhaps this better than going crazy with getting into database replication?
There isn't anything necessarily wrong with creating copies of data, so long as:
- The source of truth is never in question.
- You aren't spreading or leaking sensitive information/PII beyond where it needs to reside, from a data security perspective.
At Redactics we built what we think is a really simple approach to all of this, and it is free for developer usage (by the time you read this the relevant technologies here might be open source). I won't jump into sales talk, you can decide for yourself whether this approach is to your liking, but we clone specific tables using only SQL and provide options to redact sensitive information including PII.
The first time a table is cloned it is copied in its entirety, and the next time it is cloned only changes are copied (i.e. delta updated) using the techniques described in this blog post. The result, particularly with our support for a lot of concurrency, is updates in near real-time, but without all of the cost and overhead of technologies like Kafka/Confluent, logical replication, etc.
If your service really needs up to the second data, this approach is probably not for you, but if it can afford to be a few minutes behind, we hope this is worth your entertaining. We get around the source of truth challenges by assuring that each table that is replicated includes a column called
source_primary_key that relates to the original master. This way it is clear that this is a copy. You probably don't want to update this replicated data, but if you have to this column will help you reconcile possible differences.
Do These Options Tip the Scales?
That entirely depends on the situation and context, but certainly this helps provide the usual benefits given from having multiple services (which I don't think I need to spell out here), but making important compromises in keeping that complexity level down. If you're a startup, keeping the complexity down can be a great thing, and even if you aren't, not every situation warrants the most complex solutions, just like not every website needs to be run on Kubernetes.
Please let us know what you think! We are a new company, we really benefit from having conversations with engineers like you, no matter which way you are inclined to lean with these debates and balances.