2 January 2024

Smash - Simple Monitoring and Alerting System for Home

Smash is a simple monitoring system for computers, devices and services, consisting of an API server with an extremely basic built-in dashboard, agent and clients. The intended use case is a hobby data centre (some call it a “homelab”).

Status and downloads

Smash is currently at a usable development version across all three components. There’s a lot more work to do but it’s on an as-needed basis.

Server 0.9.1 (source, Docker repository)
Agent 0.6.1 (source, installer download)
Clients 0.4 (source, PyPI page)

Screenshots

There is not much to see, really. But okay.

Smash dashboard showing a few nodes and their services

This is the application dashboard, the “Smashboard” (so clever!). It is completely non-interactive and is probably most useful after setting up the service. The extent of its sophistication at this stage is that it auto-refreshes periodically.

Smash Xbar menu 1 Smash Xbar menu 2

These are screenshots of the Mac OS X menu bar showing an overall “warning” status (a cracked screen) with nodes grouped under “server” and “other” designations, each group also showing an overall “warning” status.

The second screenshot shows the individual server “whitey” (it’s an all-white laptop, natch) selected, whose submenu shows the individual status of each test. The disk usage and Kubernetes tests each show “okay” status (intact screen) while the OS updates test shows a warning that the node needs to be updated. If the option key is held down while this menu is shown, the entries show when each status was reported and, for a non-okay status, allow the user to acknowledge it.

Component overview

Smash is a basic API server written in Go which uses Postgres for a database. I typically deploy it to Kubernetes with the Postgres server also in Kubernetes or a dedicated server.

The Agent is a collection of (mostly) shell scripts that run on each node and monitor that node’s operating environment or another node’s services. A server can run the Agent for different “nodes”, such as a node named for the server and another named “services”, or some such. The main Agent script runs tests and reports the results to the server. For portability, the main script and most of the tests are written in POSIX shell or Bash, though tests can be written in anything.
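To make the run-and-report loop concrete, here is a minimal sketch in Python (not the real Agent, which is shell). It runs one test command and turns the outcome into a report. The exit-code mapping, the field names and the `run_test` helper are all assumptions made for illustration; the actual Agent and API may use different conventions.

```python
import json
import subprocess
import sys

# Hypothetical mapping from a test's exit code to a Smash state.
# The real Agent's convention may differ.
EXIT_TO_STATE = {0: "okay", 1: "warning", 2: "error", 3: "unusable"}


def run_test(node, name, command):
    """Run one test command and build a status report dict from its outcome."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Any exit code outside the mapping is reported as "unknown".
    state = EXIT_TO_STATE.get(result.returncode, "unknown")
    return {
        "node": node,          # hypothetical field names throughout
        "test": name,
        "state": state,
        "message": result.stdout.strip(),  # optional human-readable message
    }


if __name__ == "__main__":
    # Stand-in for a real test script: prints a message and exits 0.
    demo = [sys.executable, "-c", "print('42% used')"]
    print(json.dumps(run_test("whitey", "disk-usage", demo)))
```

A real agent would then send the report to the server along with the node's API key; that step is left out here because the endpoint details are not covered on this page.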

The Clients are a collection of (so far two) client programs written in Python. One provides a command-line interface to the API; the other is a plugin for Xbar (previously known as BitBar). The CLI client (smash) allows the user to get nodes and/or their statuses, delete nodes and/or statuses, and tag nodes with attributes (either straight labels or key-value pairs).

The Xbar client (smash-xbar) retrieves node and status information and reports it in a way that can be interpreted by Xbar, providing a handy at-a-glance status icon in the Mac menu bar which, when clicked, shows a menu of nodes and their statuses. Some of this is shown in the screenshots above. This is the main way I “consume” Smash’s report.
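For a rough idea of what “a way that can be interpreted by Xbar” means: an Xbar plugin prints plain text, where the lines before a `---` separator become the menu bar title, the lines after it become menu items, and a `--` prefix nests an item in a submenu. The sketch below is not smash-xbar itself. The severity ordering, the `render_xbar` helper and the text-only titles (the real plugin shows icons) are assumptions for illustration.

```python
# Assumed ordering from least to most severe; where "unknown" belongs is a guess.
STATE_ORDER = ["okay", "unknown", "warning", "error", "unusable"]


def worst(states):
    """Return the most severe state in an iterable, or "unknown" if it is empty."""
    return max(states, key=STATE_ORDER.index, default="unknown")


def render_xbar(nodes):
    """Render {node: {test: state}} as Xbar plugin output text."""
    overall = worst(worst(tests.values()) for tests in nodes.values())
    lines = [f"Smash: {overall}", "---"]  # title line, then the menu separator
    for node, tests in sorted(nodes.items()):
        lines.append(f"{node}: {worst(tests.values())}")
        for test, state in sorted(tests.items()):
            lines.append(f"--{test}: {state}")  # "--" nests under the node
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_xbar({
        "whitey": {"disk": "okay", "kubernetes": "okay", "os-updates": "warning"},
        "pi": {"disk": "okay"},
    }))
```

With the sample data, the title line reads `Smash: warning` because one test on one node is in a warning state, which mirrors how the overall status in the screenshots reflects the worst status underneath it.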

Functionality overview

Smash works pretty well but is incomplete. It works well enough for what I need and when it doesn’t I’ll improve it or expand on it. Some of the major functionalities:

Node registration: nodes register themselves with an API key
Status reporting: nodes run test scripts and report results as a state of “Okay”, “Warning”, “Error”, “Unusable”, or “Unknown”, as well as, optionally, a message
Staleness: nodes can include an “expect my next” date/time value in a status report; once that time has passed without a newer report, the result is considered stale
Acknowledgement: Using the client, a user may acknowledge a status, such that the non-okay state is somewhat quiesced. It will typically be handled or displayed differently: on the Smashboard it appears blue instead of the normal colour for the state.
Tagging: Nodes can be tagged with a label or a key-value pair.
Grouping: Nodes can be grouped by tags. In the Xbar screenshots above, nodes are grouped by “type”, where one (“whitey”) has had its attribute “type” set to “server”. The others have no such attribute set.
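Two of the items above, staleness and grouping, come down to very little logic. The following is a sketch under assumptions, not the server or client code: the function names, the use of timezone-aware datetimes, and the choice to treat a report with no “expect my next” value as never stale are all mine. Filing untagged nodes under “other” matches the Xbar screenshots.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def is_stale(expect_next, now=None):
    """A result is stale once its "expect my next" time is in the past."""
    if expect_next is None:
        return False  # assumption: no expectation given means never stale
    now = now or datetime.now(timezone.utc)
    return expect_next < now


def group_by_tag(nodes, key, default="other"):
    """Group node names by the value of one tag; untagged nodes go to `default`."""
    groups = defaultdict(list)
    for name, tags in nodes.items():
        groups[tags.get(key, default)].append(name)
    return dict(groups)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_stale(now - timedelta(minutes=5), now))  # expected time has passed
    print(group_by_tag({"whitey": {"type": "server"}, "printer": {}}, "type"))
```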

Missing functionality

Hoo boy.

Node and status deletion is not implemented in the client, because:
User authentication, such as an API key for the clients (as opposed to the agent) has not been implemented. This would probably be a separate class of API key or authentication.
The Smashboard is inadequate and it would be nice to be able to carry out the same operations as the clients, and to group nodes if desired, like by “stacking” them.
The whole “alerting” part, which is what the “A” stands for, isn’t implemented.

Quickstart guide

Start a Postgres server
Deploy the server image to your Kubernetes cluster and configure it for the Postgres server
Install the agent on a node you want to monitor and register it
Set up cron to periodically run the tests you want to run and report them to the server

That’s it! Yeah, I know that’s not enough. It’s not there yet, but reach out if you’re interested.