Hands-on Nomad

What is Nomad

A simple and flexible scheduler and orchestrator to deploy and manage containers and non-containerized applications across on-prem and clouds at scale.

The support for “non-containerized” applications is an especially interesting feature. You can run your JVM-based app without an additional layer of Docker, for instance.

Nomad runs as a single binary with a small resource footprint …

I really like the ease of deploying applications that run only from a single binary!

Goals

To get a feel for Nomad, I asked myself the following questions:

  1. How do I create a simple Nomad cluster to get a feel for the technology?
  2. How do I set up an authentication mechanism?
  3. How do I deploy different kinds of workloads?
  4. How do I route traffic inside the cluster?
  5. How do I isolate developers from certain actions or resources?
  6. Can I do this without Consul and Vault, both by HashiCorp [1]?

Not in scope yet are the following topics:

  1. Nomad’s enterprise features.
  2. Running Nomad on cloud infrastructure.
  3. Auto-scaling of nodes.
  4. Multi-region federation.
  5. Nomad and Vault integration.

In this blog post, I will address all of these questions and hopefully provide some answers. Let’s get started! 🚀

You can find the source code of everything from this blog post in my repo hands-on-nomad on GitHub.

Create a Nomad cluster

Remember, my goal is not to create a highly available, multi-regional, failure-redundant cluster but to create the simplest possible cluster to get a feel for Nomad and maybe use it for non-critical (personal) workloads. There are also plenty of repos that deal with the creation of a Nomad cluster by providing an Ansible playbook or Terraform files for a cloud provider. Without judging those repos, I decided to keep it basic (again) and just write an unsophisticated bash script.

By executing each command step by step and “documenting” them in a bash script, I will, at least in my opinion, learn Nomad more effectively. I believe that, before you can automate stuff or use some sort of “scripted” solution, you have to get your hands dirty and go the manual way.

As mentioned earlier, Nomad is just a single binary [2], so the installation is as straightforward as downloading the binary. However, I have chosen to install Nomad with the package manager of my Debian system.

I will also install Docker and OpenJDK so that I can run workloads of both types.
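On Debian, that part of my bootstrap script boils down to something like this (a sketch, assuming the official HashiCorp apt repository; the Docker and OpenJDK package names may differ on your distribution):

# add HashiCorp's signing key and apt repository
wget -O - https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

# install Nomad plus the dependencies for the two task drivers
sudo apt-get update
sudo apt-get install -y nomad docker.io openjdk-17-jre-headless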

Refer to the documentation for all available drivers.

The configuration file for the cluster (generated by the bootstrap script) is as follows:

log_level = "DEBUG"
acl {
  enabled = true
}
client {
  enabled = true
}
server {
  enabled = true
  bootstrap_expect = 1
}
datacenter = "dc1"
data_dir = "/opt/nomad"
name = "example.com"

This enables the “server” and “client” roles on a single-node cluster. Again, this is not a configuration you want in a production setup! Furthermore, it enables ACLs.

In addition, my little bootstrap script creates a systemd unit to conveniently start/stop/restart Nomad as a service.
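A minimal unit is enough for this purpose; roughly (a sketch; the binary path and config directory match the Debian package defaults):

[Unit]
Description=Nomad
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target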

The last step the script does is bootstrap the ACL capabilities of the cluster. This command saves the first so-called “management token” (aka admin permissions in Nomad) in a file named bootstrap.token. You can copy this token to your local machine and use it with the provided Makefile.
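On the node itself, this boils down to (a sketch; the command only succeeds once per cluster):

# returns the accessor and secret ID of the initial management token
nomad acl bootstrap > bootstrap.token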

You should now be able to reach Nomad via http://<YOUR_IP>:4646/.

By default, Nomad follows a “deny all” philosophy. By enabling ACLs, I locked myself out of the cluster. In the next section, I will take a look at Nomad’s ACL system.

Authentication and ACL

Nomad’s documentation on ACL is pretty good, so I won’t repeat it here. To proceed, I decided to use the generated management token from the ACL bootstrap as “my token” rather than creating a new one.

In case I did not mention it: the single Nomad binary not only acts as a server or client, but also as a (remote) cluster CLI. That’s comprehensive!
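For the CLI, two environment variables are all that is needed (a sketch; substitute your node’s address and the Secret ID from bootstrap.token):

export NOMAD_ADDR=http://<YOUR_IP>:4646
export NOMAD_TOKEN=<secret id from bootstrap.token>
# should now list the node instead of returning a permission error
nomad node status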

The management token also works as a login credential for the UI.

In case you want a little bit more convenience, you can find a policy that grants full access to anonymous users (make anonymous). Use with care!
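Such an anonymous policy looks roughly like this (a sketch of what make anonymous applies; not the exact file from the repo):

namespace "*" {
  policy = "write"
}
agent {
  policy = "write"
}
node {
  policy = "write"
}

Because Nomad evaluates requests without a token against the policy named “anonymous”, registering it via nomad acl policy apply anonymous <file> opens up the cluster.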

We will take a look at this topic in another section, when I look into isolating developers and teams.

Deploying workloads

Again, Nomad’s documentation on jobs is rather good, and I won’t repeat it here. In the repo, you can find three simple workloads: two of them are based on Docker, and one is a Java Spring Boot app.

  • hello.hcl is a simple containerized (Docker) Web server that returns its IP and port (make hello).
  • blueprint.hcl is a Java reference architecture running via the host’s JVM rather than a Docker container (make blueprint). See Limit developer access for more details.
  • traefik.hcl is an application proxy that will be covered in the next section (make traefik).

Please note the provider = "nomad" line in each service’s definition. This tells Nomad to use the built-in service discovery introduced in 1.3. The default method would be consul.
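To give an idea of what such a job definition looks like, here is a condensed sketch in the spirit of hello.hcl (not the exact file from the repo; the image and the Traefik tags are assumptions, the latter anticipating the next section):

job "hello" {
  datacenters = ["dc1"]

  group "hello" {
    count = 3

    network {
      # dynamic port, chosen by Nomad for each instance
      port "http" {}
    }

    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo"
        ports = ["http"]
        args  = ["-listen", ":${NOMAD_PORT_http}", "-text", "hello"]
      }

      service {
        name     = "hello"
        port     = "http"
        provider = "nomad" # built-in service discovery instead of Consul
        tags     = ["traefik.enable=true", "traefik.http.routers.hello.rule=PathPrefix(`/hello`)"]
      }
    }
  }
}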

Routing and load balancing

After deploying the hello.hcl job, there are three instances of the server running, each of them with a random port. To access those instances and also distribute the load on all of them, we need some kind of proxy.

There are some choices available, and I recommend (once again) the documentation. I have chosen Traefik for my hands-on.

Basically, you just need to apply traefik.hcl to the cluster, and you are good to go, with one exception: Traefik needs to access Nomad resources to do its work and generate routes to (dynamic) workloads. Remember: in Nomad, everything is denied if not explicitly permitted. One solution is to add the --providers.nomad.endpoint.token parameter to the job definition. But there is, in my opinion, a way more elegant solution: workload identity.

By providing a minimal ACL policy (traefik.policy.hcl), we can “attach” the policy to the workload, which is then granted the appropriate permissions.
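The policy only needs read access so that Traefik can discover jobs and their services; roughly (a sketch; the exact capabilities may differ from traefik.policy.hcl in the repo):

namespace "*" {
  policy = "read"
}
node {
  policy = "read"
}

Attaching it to the workload is then a single command (the -job flag scopes the policy to the workload identities of the traefik job):

nomad acl policy apply -namespace default -job traefik traefik-policy traefik.policy.hcl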

You can access Traefik’s dashboard via http://<YOUR_IP>/dashboard/ (the trailing slash is mandatory!) with the user/password admin:admin. The credentials are created with htpasswd -c auth admin, which generates a basic auth string for the user “admin” and saves it to a file called auth. This string is stored in a Nomad variable, a very basic Vault alternative, and referenced by the deployment:

{{- with nomadVar "nomad/jobs/traefik/traefik/server" }}
"{{ .BASIC_AUTH }}",
{{- end }}
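The variable itself can be created via the CLI; for example (a sketch; htpasswd -nb prints the user:hash pair to stdout instead of writing a file):

nomad var put nomad/jobs/traefik/traefik/server BASIC_AUTH="$(htpasswd -nb admin admin)"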

If you have deployed the hello and/or blueprint job, you should see entries for both in Traefik’s dashboard. Test the load balancing via http://<YOUR_IP>/hello/. You should see a different port number each time you request the web page.

Limit developer access

When working with different teams, it is often sensible to restrict each team or developer to certain resources only. In Nomad, you can use ACLs to accomplish this task.

The following policy provides a starting point to limit access to a namespace and some actions:

namespace "dev" {
  policy       = "read"
  capabilities = ["submit-job", "dispatch-job", "read-logs"]
}

With this policy in place, we can create a token for a developer via nomad acl token create -name="Max Mustermann" -policy="dev".
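Note that both the namespace and the policy have to exist first; roughly (a sketch, assuming the policy above is saved as dev.policy.hcl):

nomad namespace apply -description "development" dev
nomad acl policy apply -description "developer access" dev dev.policy.hcl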

Token management (rotation, distribution, etc.) is a topic not covered here.

A developer can use this token by exporting environment variables (like in the Makefile).
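In the developer’s shell, that looks roughly like this (a sketch; some-job.hcl is a placeholder for a job targeting the dev namespace):

export NOMAD_ADDR=http://<YOUR_IP>:4646
export NOMAD_TOKEN=<secret id of the developer token>
# permitted by the submit-job capability
nomad job run -namespace dev some-job.hcl
# denied: the token grants no access to the default namespace
nomad job run hello.hcl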

Can I do this without Consul and Vault?

Up to this point, everything was set up without Consul and Vault. Of course, this is not a production-ready setup, but I think you should start with a basic approach, understand the fundamental concepts, and then grow according to your needs.

Thoughts

This section is a dump of my thoughts from playing around with Nomad. On purpose, I don’t distinguish between “the good” and “the bad”, as I am missing real-world experience with Nomad. The following points may therefore be rather subjective, especially since I am not as experienced with Nomad as I am with Kubernetes.

  1. In Kubernetes, there is the concept of internal (Service resource) and external (Ingress resource) traffic, which I miss in Nomad and which looks like it is not so easy to implement [3]. In this hands-on, the blueprint and the hello deployments can be accessed via Traefik (path) and, if known, via the port directly.

  2. Writing jobs via HCL is not too different from Kubernetes YAML, and the concepts are quite familiar. However, I guess in the long term, a DSL is more robust.

  3. Running workloads without Docker is amazing and feels so right (in some cases). You can save a lot of complexity and maintenance work by omitting the Docker layer.

  4. Deploying and running Nomad feels much easier than Kubernetes, but of course, I have not covered a production-ready HA, multi-region, scalable setup.

  5. It feels like Kubernetes has more resources on the web and a bigger community. Googling for Nomad topics felt more cumbersome to me.

  6. I didn’t spend much time researching how a declarative GitOps approach, like what ArgoCD or Flux accomplish for Kubernetes, could be implemented with Nomad [4].

  7. There is no “managed Nomad” like EKS, AKS, or GKE available (August 2023), and you have to run your cluster yourself. Luckily, it should not be that hard in comparison to Kubernetes.


  [1] Don’t get me wrong, these are excellent tools, but I want to check how far I can get with just Nomad for a home lab environment or for non-critical deployments, keeping the complexity at an absolute minimum.

  [2] For some networking features, additional binaries (CNI plugins) are required.

  [3] Asked on Reddit and HashiCorp Discuss.

  [4] Discussion regarding this topic.