Microservices Without the Complexity or Architectural Concessions

Published November 15, 2022

On multiple occasions in my career I've been tasked with creating production data samples and setting up environments that use this data. Seeing a pattern, and having spent a lot of time perfecting the process, I built a business around it (https://www.redactics.com). I later found co-founders to help generalize the original concept and extend it into a full-fledged data privacy product, initially focused on bringing safe production data into test, development (local and cloud-based), and demo environments.

Virtually every backend engineer I've ever met has some sort of opinion about microservices. If I had to summarize the entire discussion:

A giant monolith requires tactics for establishing code boundaries and keeping code paths from getting tangled up, so not having one is at least of interest.
The mere mention of microservices surfaces complex subjects: messaging between services, and data ownership and sharing.
In my opinion, the messaging problems are largely addressed by a pub/sub service, Kafka, or anything else that requires messages to be acknowledged and retries them in the event of failure (HTTP is usually not the best choice for this).
The data issues are the most vexing. Namely, once you've determined where the source of truth lives, what do you do when some piece of data needs to be shared with another service (as it often does)? Do you request it as needed via your messaging service? Create a dedicated service to serve up this data, spawning more and more services in the process? Invest in solutions that provide real-time data replication through Kafka? Rationalize sharing the same database?
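
To make the "acknowledged and retried" point concrete, here is a minimal in-memory sketch of that delivery loop. It is not how Kafka or any particular pub/sub service is implemented; the function name deliver_all, the max_attempts parameter, and the flaky consumer are all hypothetical names made up for illustration.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def deliver_all(
    messages: list[str],
    handler: Callable[[str], bool],
    max_attempts: int = 3,
) -> tuple[list[str], list[str]]:
    """Deliver each message until the handler acknowledges it or attempts run out.

    Returns (acknowledged, dead_lettered).
    """
    queue = deque((msg, 1) for msg in messages)  # (payload, attempt number)
    acknowledged: list[str] = []
    dead_lettered: list[str] = []
    while queue:
        msg, attempt = queue.popleft()
        try:
            acked = handler(msg)
        except Exception:
            acked = False  # a crashing consumer counts as "not acknowledged"
        if acked:
            acknowledged.append(msg)
        elif attempt < max_attempts:
            queue.append((msg, attempt + 1))  # put it back for redelivery
        else:
            dead_lettered.append(msg)  # give up and set it aside for inspection
    return acknowledged, dead_lettered


# Hypothetical consumer that fails the first time it sees "order-2".
seen: dict[str, int] = {}


def flaky_handler(msg: str) -> bool:
    seen[msg] = seen.get(msg, 0) + 1
    return not (msg == "order-2" and seen[msg] == 1)


acked, dead = deliver_all(["order-1", "order-2"], flaky_handler)
```

A plain HTTP call gives you none of this by default: if the receiving service is down at the moment of the request, the caller has to build its own retry and bookkeeping.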

What are some simple options that should not offend the purists among us?

Database Sharing

Sharing a database is often a non-starter for many, and I'm not necessarily advocating for this approach, but I disagree with some of the reasons given against it.

In my opinion, infrastructure and application code do not have to be joined at the hip. I don't see a problem with, for example, having a single database cluster running multiple databases and having this cluster shared between services. Yes, you lose the resource isolation that keeps immense load from one service from affecting another, but let's face it, databases are designed to withstand a lot of load. We don't necessarily have to coddle them. Unless your service is getting bananas traffic rather than just a healthy amount of it, sharing a cluster is not necessarily the end of the world.

If we can rationalize sharing a cluster, why not a database, especially if the same caveat of not being overly concerned with load applies? We can prevent access to tables that are not the business of the service in question with appropriate table/column grants. Yes, this requires diligently keeping these grants up to date and applying them, but this is possible, and may even be a good option if, for example:

The service will only ever need read-only access.
The schema of the service is fairly static and not likely to change much.

If these sorts of simple conditions apply to you (maybe this is an experimental service and you don't want to invest too much in it, or the context otherwise justifies it), perhaps this is better than going crazy with database replication?
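
As a rough sketch of the table/column grants mentioned above, the helper below builds PostgreSQL GRANT statements for a read-only service role. The function name read_only_grants, the role name, and the table and column names are hypothetical. It also assumes the role starts with no privileges on the database, so any table left out of the mapping stays inaccessible, and that the identifiers come from trusted configuration rather than user input.

```python
from __future__ import annotations


def read_only_grants(role: str, allowed: dict[str, list[str]]) -> list[str]:
    """Build PostgreSQL GRANT statements giving `role` SELECT on the listed columns only.

    An empty column list means every column of that table is readable.
    """
    statements = []
    for table, columns in sorted(allowed.items()):
        if columns:
            cols = ", ".join(f'"{c}"' for c in columns)
            # Column-level grant: only the named columns can be selected.
            statements.append(f'GRANT SELECT ({cols}) ON "{table}" TO "{role}";')
        else:
            # Table-level grant: the whole table is readable.
            statements.append(f'GRANT SELECT ON "{table}" TO "{role}";')
    return statements


grants = read_only_grants(
    "reporting_service",
    {"users": ["id", "created_at"], "orders": []},
)
for statement in grants:
    print(statement)
```

Keeping the mapping in version control next to the schema migrations is one way to make the "diligently keeping up" part less error-prone, since a schema change and its grant change can be reviewed together.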

Data Sharing

There isn't anything necessarily wrong with creating copies of data, so long as:

The source of truth is never in question.
You aren't spreading or leaking sensitive information/PII beyond where it needs to reside, from a data security perspective.

At Redactics we built what we think is a really simple approach to all of this, and it is free for developer usage (by the time you read this the relevant technologies might be open source). I won't jump into sales talk; you can decide for yourself whether this approach is to your liking. In short, we clone specific tables using only SQL and provide options to redact sensitive information, including PII.

The first time a table is cloned it is copied in its entirety, and the next time it is cloned only changes are copied (i.e. delta updated) using the techniques described in this blog post. The result, particularly with our support for a lot of concurrency, is updates in near real-time, but without all of the cost and overhead of technologies like Kafka/Confluent, logical replication, etc.

If your service really needs up-to-the-second data, this approach is probably not for you, but if it can afford to be a few minutes behind, we hope this is worth entertaining. We get around the source of truth challenges by ensuring that each replicated table includes a column called source_primary_key that relates back to the original master. This way it is clear that this is a copy. You probably don't want to update this replicated data, but if you have to, this column will help you reconcile possible differences.
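
To illustrate the general idea of a full copy followed by delta updates, here is a small self-contained sketch using SQLite. It is not the Redactics implementation: it assumes the source table has an updated_at column to use as a watermark, it keeps the source and the copy in one in-memory database for brevity, and the table names, the sync_users function, and the redaction rule are made up. Only the source_primary_key column name comes from the description above.

```python
from __future__ import annotations

import sqlite3


def sync_users(conn: sqlite3.Connection, last_synced: int) -> tuple[int, int]:
    """Copy rows changed since `last_synced` into the copy; return (rows copied, new watermark)."""
    rows = conn.execute(
        "SELECT id, email, updated_at FROM users WHERE updated_at > ? ORDER BY updated_at",
        (last_synced,),
    ).fetchall()
    for source_id, _email, updated_at in rows:
        redacted = f"user{source_id}@example.com"  # swap the real email for a placeholder
        conn.execute(
            "INSERT OR REPLACE INTO users_copy (source_primary_key, email, updated_at) "
            "VALUES (?, ?, ?)",
            (source_id, redacted, updated_at),
        )
    conn.commit()
    new_watermark = rows[-1][2] if rows else last_synced
    return len(rows), new_watermark


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER);
    CREATE TABLE users_copy (source_primary_key INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER);
    INSERT INTO users VALUES (1, 'ann@corp.test', 100), (2, 'bob@corp.test', 110);
    """
)

# First run: nothing has been synced yet, so every row is copied.
first_copied, watermark = sync_users(conn, last_synced=0)

# A row changes in the source; the next run only picks up that row.
conn.execute("UPDATE users SET email = 'bob2@corp.test', updated_at = 120 WHERE id = 2")
second_copied, watermark = sync_users(conn, last_synced=watermark)
```

A watermark like this does not notice deleted rows, and it can miss rows that share the exact timestamp of the last synced row, so a real setup needs an answer for both.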

Do These Options Tip the Scales?

That entirely depends on the situation and context, but these options certainly provide the usual benefits of having multiple services (which I don't think I need to spell out here) while making important compromises that keep the complexity level down. If you're a startup, keeping the complexity down can be a great thing, and even if you aren't, not every situation warrants the most complex solution, just like not every website needs to run on Kubernetes.

Please let us know what you think! We are a new company, and we really benefit from having conversations with engineers like you, no matter which way you are inclined to lean in these debates and balances.

#microservices #databases #architecture #replication

Comments (1)

José Pablo Ramírez Vargas, 3y ago

Hello. Having a couple of microservice projects under my belt now, I would say that yes, it is not the end of the world to be in one database. However, it is my strong belief that this should only be temporary and not the norm. It is a valid migration path when transforming a monolith into microservices, for sure. If one is using SQL Server, where schemas exist inside a single database, one can separate the microservices' data into schemas.

But as one can learn by reading about the topic, microservices need to be independent. Sharing a database goes against this principle, even more so if you make the source of its data dependent on the database server technology. This kills the microservice's buzz by constraining it to one type of database or one subset of all possibilities. If the microservice wants or can benefit from another DB engine, now you are isolated, with no means to obtain data, because only the "buddies" that use the same DB can get into the data flow.

So if such a change is required to achieve some important milestone, like performance or saving data in a way the current DB cannot, it will force you into messaging for sharing the source of truth's data. Now you are back to square one. If needing messaging was always on the table, it is best to plan for it from day one.

So in retrospect you can conclude it was best to adhere to the microservice specification from the start. All the rationale applied to bypass it was a mistake, etc.

If someone comes asking me whether they should replicate data under the hood like this, I'd definitely tell them it would be a mistake.

Joe Auty, 3y ago

Thanks for your reply!

I'd like to learn from this by clarifying a few points you've made, if you don't mind?

I definitely agree that sharing a database is not an ideal end result in many/most circumstances, and your point about being tethered to the same DB engine is a good one.

Would you say that there are circumstances where the complexity of messaging to share the source of truth's data outweighs the cons of sharing the database? In a purist's world, a microservice would never need data from another service (that cannot be passed down in the initial payload), but it seems pretty common for tech debt to accrue once this is no longer the case. This seems to be one of the most common pitfalls of the architecture, would you agree?

I'm curious what you feel about the second set of options for sharing copies of data as an alternative to sharing the entire database, or as an alternative to spinning up messaging to get at data that belongs to another service?

Thanks again for weighing in!

José Pablo Ramírez Vargas, 3y ago

Joe Auty, in my short experience with microservices, which so far is 2 projects, I would say I haven't encountered a scenario so complex that its data cannot be transmitted in some form. I am therefore inclined to respond that no, I would not support using the same database or a same-database mechanism to share data between microservices behind the scenes.

For example, what would be considered "complex"? My limited imagination can get as far as, say, a scenario where some Record-Of-Origin data is used to perform calculations in another microservice, and said calculations are then required by yet another microservice. This is as complex as I can imagine right now. If you have a better, more complex scenario, by all means, provide it.

In this most complex scenario, I'd say it is still pretty simple to chain the data in a near-realtime ecosystem powered by message queueing. By simply tagging the messages appropriately, the farthest microservice in the line can simply wait its turn (that is, wait for the appropriately tagged message).

So I guess that, in my short-lived microservices career, I'd be against data sharing behind the scenes at the database layer. Maybe time will prove me wrong, I don't know. This is what I know now. Time will tell.
