Open-source Media Streaming Service for Kubernetes

Joe Auty
Oct 10, 2022

We just built a media streaming service for Kubernetes. I know this sounds kind of weird, and you're probably wondering how Kubernetes is relevant, so let's delve into the details...

At Redactics we needed a way to provide writeable persistent storage to multiple Kubernetes pods. Cost-effective ReadWriteMany storage options are generally somewhat limited, in our experience. Using Amazon S3 or the like was also not a great option for us, because the Redactics SMART Agent uses Apache Airflow and the KubernetesPodOperator for a number of its workflow steps, many of which run in parallel. This means a lot of short-lived pods are created, and having to upload and download the same files needed by the pods multiple times (sometimes concurrently) is not only a bandwidth killer, it also breaks our security model of all work running locally without data being sent to the cloud (and this includes mounting the S3 bucket via FUSE-based file systems).

We also looked at NFS-based solutions, including NFS Kubernetes storage class solutions and Google Cloud Filestore. We had some luck with NFS until we had to figure out how to enlarge our NFS volumes, and we also didn't like the lack of traceability of file transfers. The minimum GCP Filestore volume size is 1TB, which was cost-prohibitive given that our files are much smaller. From a business perspective, we promote Redactics as an appliance you run in your own infrastructure, so figuring out how to somehow share the cost of S3 buckets in our account, or get our customers to set up their own buckets, was awkward at best.

Additionally, we were looking for the fastest possible performance with the lowest possible memory overhead, which means streaming files as much as possible rather than reading complete files into memory. Tools such as cURL, pg_dump/pg_restore, etc. can handle the streaming when coupled with Linux pipes (more on this later), and we wanted the option to run various workflow steps at the operating system level, so this was ideal for us.

So, we decided to build our own thing, and were satisfied enough with the results that we felt it would be useful to open-source it and share it with the world! We called the repo http-nas, the NAS part being short for network-attached storage, since this is essentially an HTTP-based NAS. We did build it with Kubernetes in mind, but if you can come up with another use case for this technology it can certainly be used outside of Kubernetes as well.

http-nas is a Node.js-based API that simply streams files to and from a directory. We chose Node because it supports file streams and is generally very lightweight (Golang would likely have been another good choice). On Kubernetes you can attach a persistent volume claim to this service and enlarge it as needed, and because it runs as a Kubernetes service it can be reached over HTTP using the DNS entries that are created automatically from service names. Concurrent access is supported, at least as far as we decided to test this. We were interested in HTTP as an interface because we wanted to access this API from Linux using cURL, which supports sending and receiving chunked file streams. This service is intended to be run on the same LAN as where the files originate, so we felt that we didn't need authentication in our first iteration. In the future we might embrace the concepts of zero-trust networking by adding some sort of authentication/validation layer (HTTP basic authentication would be a simple option).
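
To make the streaming idea concrete, here is a minimal sketch of how a Node.js service can write a request body to disk chunk by chunk instead of buffering it. This is not the http-nas source code: the createStreamingServer helper, the status codes, and the choice that POST replaces a file while PUT appends to it are assumptions made for illustration. Only the /file/<name> URL shape comes from the examples in this post.

```javascript
// Minimal sketch of a streaming file endpoint. This is NOT the http-nas source;
// createStreamingServer and its status codes are illustrative assumptions.
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");
const { pipeline } = require("node:stream/promises");

// Hypothetical helper: returns an HTTP server that stores uploads in storageDir.
function createStreamingServer(storageDir) {
  return http.createServer(async (req, res) => {
    try {
      const { pathname } = new URL(req.url, "http://localhost");
      const match = /^\/file\/([^/]+)$/.exec(pathname);
      // basename() drops any directory components hidden in the decoded name.
      const name = match ? path.basename(decodeURIComponent(match[1])) : "";
      if (!name || name === "." || name === "..") {
        res.writeHead(404).end();
        return;
      }
      if (req.method !== "POST" && req.method !== "PUT") {
        res.writeHead(405).end();
        return;
      }
      // Assumed semantics: POST replaces the file ("w"), PUT appends to it ("a").
      const flags = req.method === "PUT" ? "a" : "w";
      // pipeline() copies the body chunk by chunk and respects backpressure,
      // so the whole file is never held in memory at once.
      await pipeline(req, fs.createWriteStream(path.join(storageDir, name), { flags }));
      res.writeHead(200).end("ok\n");
    } catch (err) {
      // Best effort: the socket may already be gone if the client aborted.
      if (!res.headersSent) res.writeHead(500).end();
    }
  });
}
```

Under these assumptions, something like createStreamingServer("/data").listen(3000) would serve a directory backed by a persistent volume mounted at /data (a hypothetical mount path).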

Full documentation can be found in the repo, but to save you a trip, here is an example of two basic file operations using this API:

  • Stream a file to the service (HTTP POST): cat /path/to/cat.jpg | curl -X POST -H "Transfer-Encoding: chunked" -s -f -T - http://http-nas:3000/file/cat.jpg
  • Append to a file (HTTP PUT): printf "test append" | curl -X PUT -H "Transfer-Encoding: chunked" -s -f -T - http://http-nas:3000/file/mydata.csv
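
If a workflow step is written in Node.js rather than shell, the same two operations could be sketched roughly as follows. The streamToService helper is hypothetical and not part of http-nas; it only assumes the POST and PUT URLs shown above. Because no Content-Length header is set, Node sends the body with chunked transfer encoding, much like the cURL commands do.

```javascript
// Minimal client sketch; streamToService is a hypothetical helper, not part of http-nas.
const http = require("node:http");

// Streams any Readable to the given URL and resolves with the HTTP status code.
// Use method "POST" to stream a file to the service, or "PUT" to append to one.
function streamToService(source, url, method = "POST") {
  return new Promise((resolve, reject) => {
    const req = http.request(url, { method }, (res) => {
      res.resume(); // drain the response body so the socket can be released
      res.on("end", () => resolve(res.statusCode));
      res.on("error", reject);
    });
    req.on("error", reject);
    // Abort the upload if reading the source fails part-way through.
    source.on("error", (err) => req.destroy(err));
    // pipe() forwards chunks as they are read and ends the request when the source ends.
    source.pipe(req);
  });
}
```

For example, streamToService(fs.createReadStream("/path/to/cat.jpg"), "http://http-nas:3000/file/cat.jpg") would mirror the first cURL command, assuming fs has been required from node:fs.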

Please let us know what you think at our website, or write to me personally, whichever is easiest. Please also make use of this project and contribute to it if you feel inspired to do so. Thank you!