
11 Monitoring ScyllaDB
This chapter covers
- Configuring the Scylla monitoring stack against your cluster
- Using Prometheus to collect metrics
- Viewing dashboards and visualizations of your cluster’s performance using Grafana
- Load-testing via cassandra-stress
- Diagnosing and remediating common incidents
To run a database in production, you need to know if it’s actually running. The rest of the book is about using Scylla in a way that minimizes the chances of an alert firing, but this chapter is about monitoring your cluster. Not monitoring a database is a great way to never get paged in the middle of the night, but it’s also highly frowned on by users, managers, and just about every best practice out there. Here, you’ll learn how to monitor Scylla, observe its performance, and generate alerts to clue you in on problems in your cluster.
Ideally, your cluster never has a problem, and you’re never paged. You’ll learn how to load-test Scylla to determine how much traffic your database can handle, compare that against your expected traffic volume, and size the cluster appropriately. Additionally, a load test is a great way to see the monitoring tools in action: it generates load on the cluster that you can watch in the dashboards.