| [03:34:16] | <bgm[m]> | Semi-offtopic: any icinga2 users here? I've always wanted to update our configs, based on the sites hosted on our servers. In general we don't create tons of prod sites, so it's not an issue, but we have a few clients who create their own sites and we want to monitor them. |
| [03:35:29] | <jonpugh[m]> | @bgm:matrix.org: yup, but only use it for server level stats gathering |
| [03:35:56] | <jonpugh[m]> | Grafana might be more what you need |
| [03:36:32] | <jonpugh[m]> | It's open source and has paid services you can plug into. We use worldping to monitor sites from outside |
| [03:36:51] | <bgm[m]> | I guess it's mostly to monitor cron runs, especially CiviCRM's cron (sending mailings, which is time-sensitive). We also use Grafana, and push metrics from Icinga2 to Grafana. |
| [03:37:08] | <jonpugh[m]> | https://worldping.raintank.io/ |
| [03:37:15] | <jonpugh[m]> | Try worldping then |
| [03:37:20] | <jonpugh[m]> | And statsd |
| [03:38:25] | <jonpugh[m]> | Having trouble using Icinga for cron logging? |
| [03:38:41] | <bgm[m]> | ACTION uploaded an image: Capture d’écran de 2020-01-17 11-38-25.png (41KB) < https://matrix.org/_matrix/media/r0/download/matrix.org/ivblfkhXnFwuCOIF... > |
| [03:39:11] | <bgm[m]> | we have a variant of BOA's stats script, which generates a CSV usage file that then gets picked up by icinga2 (because I'm lazy). I love grafana :) |
| [03:39:34] | <bgm[m]> | (this is from one of the civicrm.org "spark" servers, which is a kind of low-cost entry-level service for people who want to try CiviCRM) |
| [03:41:45] | <bgm[m]> | I mean, for stat collection, Aegir already generates a daily file that we can collect. But for monitoring crons and health checks, I'd need a site-inventory file somewhere, and I'd need to update icinga2's config from time to time. I can do it with ansible, just curious what other folks do. |
| [04:57:04] | <colan[m]> | bgm: we've been planning to set up [Prometheus](https://en.wikipedia.org/wiki/Prometheus_(software)), but haven't got to it yet. |
| [04:57:43] | <colan[m]> | seems to be what the next generation of icinga kids are using these days. ;) |
| [04:59:34] | <colan[m]> | and then do some kinda ELK type thing, but there's another variation of that acronym that's better now, just can't remember. |
| [04:59:43] | <colan[m]> | ELG? |
| [04:59:45] | <colan[m]> | ACTION shrugs |
| [05:00:11] | <bgm[m]> | haha, yeah, Prometheus probably has a larger base, especially in the k8s community |
| [05:00:52] | <bgm[m]> | I killed my ELK server. It's a pain. I'm testing loki/promtail for basic logs, and other things for event monitoring. |
| [05:01:11] | <bgm[m]> | (well, incident monitoring, i.e. crash reports) |
| [05:02:18] | <bgm[m]> | I don't know anyone running it unless they're spending $100k/month on hosting and have a full infra staff ;) |
| [05:02:33] | <colan[m]> | good to know! |
| [05:04:20] | <jonpugh[m]> | I had the same experience. I tried getting prometheus running. Wasn't able to do it. |
| [05:04:49] | <jonpugh[m]> | bgm: We use this role to set up icinga automatically on client machines: https://github.com/tenequm/ansible-icinga-director-client |
| [05:11:42] | <jonpugh[m]> | Hah, I just noticed I do still launch a prometheus container in our stack, just don't use it because I couldn't get it working. |
| [05:12:08] | <jonpugh[m]> | bgm: Check this out: https://gist.github.com/jonpugh/a8bbe0425b78c8a93993c0f026beb554 |
| [05:12:17] | <jonpugh[m]> | That docker-compose stack has everything you need for a sweet monitoring setup. |
| [05:12:44] | <jonpugh[m]> | I would consider building your cron monitoring feature into the sites using statsd module |
| [05:12:47] | <bgm[m]> | to clarify: I was ranting at ELK, not Prometheus. I'm not too familiar with its use outside k8s. |
| [05:13:04] | <jonpugh[m]> | never got that one working either... |
| [05:13:09] | <bgm[m]> | :) |
| [05:13:16] | <jonpugh[m]> | Not to say they aren't awesome looking tools |
| [05:13:21] | <jonpugh[m]> | it just took more time |
| [05:15:04] | <bgm[m]> | jonpugh: ah, good point about icinga-director. I have to admit I'm not using it yet (I used satellites before director and now need to migrate, but didn't have a strong incentive until now) |
| [05:16:43] | <bgm[m]> | And good points. I should probably focus on having the site-metrics in Grafana directly, instead of having a per-site service with Icinga. (and Grafana can do the alerting) |
| [05:26:19] | <jonpugh[m]> | That way you don't need an inventory. Each site starts sending stats with the URL attached |
| [05:27:18] | <jonpugh[m]> | With the director role, the ansible variables for the icinga host and password, the statsd.module in the install profile, and variable_set for the statsd config, it's fully automatic |
| [05:27:56] | <jonpugh[m]> | create server > few minutes later data in grafana shows up. |
| [05:28:21] | <jonpugh[m]> | The icinga director role auto-adds the host to icinga |
| [05:56:43] | <bgm[m]> | yep, very neat |
| [05:56:55] | <bgm[m]> | thanks! |
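
Editor's sketch of the idea discussed above, where each site pushes its own cron stats "with the URL attached" so that no site inventory is needed. This is not the Drupal statsd module's code; it only illustrates the plain statsd line format (`bucket:value|type`) sent over UDP. The `sites` prefix, the `cron.run` and `cron.duration` metric names, and the host and port defaults are all assumptions made for the example.

```python
import re
import socket
from typing import List


def metric_key(site_url: str, name: str, prefix: str = "sites") -> str:
    """Build a statsd bucket name with the site's hostname embedded in it."""
    # Drop the scheme and any path so only the host part remains.
    host = re.sub(r"^[a-z]+://", "", site_url.strip().lower()).split("/")[0]
    # Dots separate namespaces in statsd/Graphite, so flatten them.
    safe_host = re.sub(r"[^a-z0-9-]", "_", host)
    return f"{prefix}.{safe_host}.{name}"


def cron_packets(site_url: str, duration_ms: int) -> List[str]:
    """Return the statsd lines for one cron run (hypothetical metric names)."""
    return [
        f"{metric_key(site_url, 'cron.run')}:1|c",  # counter: one run
        f"{metric_key(site_url, 'cron.duration')}:{duration_ms}|ms",  # timer
    ]


def send(packets: List[str], host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the host and port here are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in packets:
            sock.sendto(packet.encode("ascii"), (host, port))


if __name__ == "__main__":
    for line in cron_packets("https://example.org/", 1234):
        print(line)
```

Because the hostname is part of the bucket name, a new site shows up in Grafana the first time its cron runs. A Grafana alert on a missing `cron.run` series could then stand in for a per-site Icinga service, as suggested in the conversation.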