@ethanjli (Member) commented Jan 11, 2024

This PR resolves #252 by adding Prometheus + Prometheus node-exporter + Grafana to collect, save, and visualize system information (CPU, RAM, disk, etc.), and replacing the Node-RED dashboard's system monitoring implementation with an iframe around a Grafana dashboard which provides the same gauges for the same metrics.

This PR also fixes #115 by removing the previous implementation related to that issue.

This PR also detects when the Pi's system time differs from the browser's time by more than 1 minute, because such a difference slightly disrupts the Grafana dashboard, and a difference of more than 4 or 5 minutes makes the Grafana dashboard unable to display metrics. When this happens, a message is shown above the dashboard with a button to set the Pi's system time to match the browser's time. In other words, we now have a semi-automated way to fix an incorrect clock on the Pi, e.g. if it has no internet connection, no GPS signal, and no RTC.

This PR does not help #218 very much: it only reduces the occurrence of that issue while the RPi stays continuously plugged into power. On the other hand, this PR provides a workaround for #270 until #95 is fixed, and the workaround will remain useful for PlanktoScopes without a hardware RTC module even after #95 is fixed.
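The clock comparison described above can be sketched as a small shell helper. This is only an illustration of the thresholds, not the PR's actual Node-RED implementation: the function name `classify_skew`, the use of Unix epoch seconds as inputs, and the 240-second cutoff (taken as the lower end of "4 or 5 minutes") are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: classify the offset between the Pi's clock and the
# browser's clock, both given as Unix epoch seconds.
WARN_SECONDS=60     # from the PR description: more than 1 minute
BREAK_SECONDS=240   # assumption: lower end of "4 or 5 minutes"

classify_skew() {
  pi_epoch=$1
  browser_epoch=$2
  skew=$((pi_epoch - browser_epoch))
  # Use the absolute value so a clock in the past or in the future is
  # treated the same way.
  if [ "$skew" -lt 0 ]; then
    skew=$((0 - skew))
  fi
  if [ "$skew" -gt "$BREAK_SECONDS" ]; then
    echo "broken"   # dashboard likely cannot display metrics
  elif [ "$skew" -gt "$WARN_SECONDS" ]; then
    echo "warn"     # show the "set system time" message
  else
    echo "ok"
  fi
}
```

On the Pi this could be called as `classify_skew "$(date +%s)" "$browser_epoch"`, where `browser_epoch` is a hypothetical variable holding the time reported by the browser. With GNU `date`, root could then set the clock with `date -s "@$browser_epoch"`; whether the PR sets the time this way is not stated here.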

To test this PR, run the following command on a fresh installation of Raspberry Pi OS bullseye (replacing pscopehat with adafruithat if installing on an Adafruit HAT-based PlanktoScope):

wget -O - https://install.planktoscope.community/distro.sh | \
  sh -s -- -y -v feature/prometheus -H pscopehat

Or, for a more reproducible but less convenient command which will work the same even after this PR is merged and even if/when the install script is updated in the future, run:

wget -O - https://raw.githubusercontent.com/PlanktoScope/install.planktoscope.community/v2023.9.0/distro.sh | \
  sh -s -- -y -t hash -v 0504773 -H pscopehat

@ethanjli marked this pull request as ready for review January 12, 2024 08:19
@ethanjli changed the title from "Add Prometheus+Grafana-based monitoring of host machine" to "Add Prometheus+Grafana-based monitoring of host machine, enable resetting system time to browser time" Jan 12, 2024
@ethanjli (Member, Author) commented

I've tested the logging subsystem and the handling of time drift. The only unresolved issue I've found is that when the system time has been set to the future, either Prometheus or Grafana stops collecting/showing metrics. However, this should not be a common failure mode for us except through user error: without a hardware RTC, the RPi's system time only gets stuck in the past, so it can only be in the future if someone sets it to the future. The workaround, which I've confirmed works, is to delete the Prometheus & Grafana containers, delete their persistent volumes, and run forklift plt apply again.
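As a rough sketch of that workaround, assuming the containers and volumes are managed by Docker and are reachable through the `docker` CLI: the container and volume names are not given in this PR, so they are passed in as arguments and would need to be looked up first (for example with `docker ps -a` and `docker volume ls`). The helper names `run` and `reset_monitoring` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch of the reset workaround. Set DRY_RUN=1 to print the commands
# instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_monitoring() {
  containers=$1   # space-separated container names (look these up first)
  volumes=$2      # space-separated volume names (look these up first)
  # Word splitting of the two lists below is intentional.
  run docker rm -f $containers
  run docker volume rm $volumes
  # Re-apply so the deleted containers and volumes are recreated.
  run forklift plt apply
}
```

Running it with `DRY_RUN=1` first shows exactly which commands would be executed before anything is deleted; note that removing the volumes discards the stored metrics and dashboard state.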

Based on testing, I think this PR is ready to merge.

@ethanjli added this pull request to the merge queue Jan 12, 2024
Merged via the queue into master with commit dc4940a Jan 12, 2024
@ethanjli deleted the feature/prometheus branch January 12, 2024 20:03