Add updates after 2026-05 Osprey LLNL training #88
|
Thanks, Wayne, for this large mixed bag of changes. At first glance: So I don't intend to merge this as is, but use it as a source to pick things into separate PRs. How was your setup to develop this? Host OS? Architecture? Vagrant? VBox version? Target flavor? |
|
Thanks Ralph. Agreed that this PR should not be merged as is. I used it as a way to communicate the changes we'd made in the preparation for and execution of the recent training. If there are specific parts that you want me to separate out into individual PRs, let me know. For the development of this, we used the following:
|
|
I will do the cherry-picking, no worries. A different request, though: Your channelfinder task in the PR is installing the archiver appliance - do you have the code for installing channelfinder, or was that your break-off point? If you have the code, I'd be interested.
I never got Vagrant working for VBox > 7.1.6 (see #69, #82). I should give it another try. But I don't have admin rights on my laptop - hard to test, as I have to apply to be granted "admin for a day" (needs justification and approval).
Arm still has a lot of issues, both on the host Vagrant/VBox side and on the guest system side - if you look at the issue tickets, almost half of them are about aarch64. Do you know anyone with Windows 11 on Arm? |
We did not get the channelfinder install working. We only got to the elasticsearch container part, and that created enough issues and consumed enough time that we stopped. The channelfinder task in the PR was a copy of the archiver appliance task, used as a template. Channelfinder is on our list of things to work on. Current thinking is to try to do it via a native install rather than using containers. But we haven't actively restarted that work yet.
That's good to know that we weren't alone in having issues with using Vagrant to create the VM.
I can help with the aarch64 testing. I have MacOS on an M4 chip. At the LLNL training, we had some people setting up the VM on MacOS. We are following up with them to get their modifications to the installation process. I had also tried using UTM to emulate an x86_64 architecture for a VM. That worked reasonably well except for issues with the Java compiler segfaulting when building Phoebus. In discussions with Michael, he suspected RAM issues. But we didn't test further, again due to time constraints with the upcoming training. I would prefer to just have it work with the native architecture.
No, sorry. |
@ralphlange Will you update on the status of this cherry picking as you see it? E.g., it does not appear to me that as of 2adb314 enough has been picked for AA to correctly persist configuration to the local mariadb instance. Specifically, 3cc9f86 is needed to correctly configure mariadb and load the schema. Without this, AA will appear to work, allowing PVs to be added. However, all configuration is lost on reboot.
    - name: ensure mariadb privileges are set
      become: true
      ansible.builtin.shell:
        cmd: "mariadb --execute \"GRANT ALL ON archappl.* TO 'archappl' identified by 'archappl';\""
@waynelewis and I are not clear on why the
To see this effect, run:
    mariadb --user=archappl --password=archappl --database=archappl
and see a permission error. Or look for the java exceptions in the (voluminous) tomcat log. FYI, inspect the database state with
Alternatively, I would suggest switching AA to sqlite to avoid the complexity and resource use of mariadb. |
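The manual check above could be scripted so that a provisioning run fails loudly instead of leaving AA without persistence. This is a minimal sketch, assuming the user, password, and database name quoted in this comment. The function name `check_archappl_db` is hypothetical, and `SHOW TABLES` is used only as a cheap query that requires access to the database.

```shell
#!/bin/sh
# Hypothetical check: can the archappl user reach its database?
# Credentials and database name are the ones quoted in this thread.
check_archappl_db() {
    if mariadb --user=archappl --password=archappl --database=archappl \
            --execute "SHOW TABLES;" >/dev/null 2>&1; then
        echo "archappl database reachable"
    else
        echo "archappl database NOT reachable (missing grant or schema?)"
        return 1
    fi
}

# Usage: abort provisioning early if the grant is missing
#   check_archappl_db || exit 1
```

A non-zero return makes the problem visible at install time rather than after the first reboot.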
I was doing that already as comments in the 'changes'. Do you want it done differently? |
|
When we say ChannelFinder, are we talking about the whole stack: ChannelFinder + recceiver (Twisted)? I have Ansible roles for these, but I think we discussed that deploying it on the training VM would put a lot of strain on it and that we should have another VM for services. |
|
I know that was our discussed approach, but this PR ignores these old discussions and tries to do everything on the VM. I actually like that (back then and now), that's why I was going forward with the AA role. (Wasting a lot of time because I don't know how the AA works.) |
|
Ok, |
I am mostly looking for a "done" vs. "in progress". Your TODO list is (rightly) not captured in commit messages. Your burst of commits two weeks ago says "done" to me, thus my poke about the AA installation still being incomplete. Also, this could be a trigger for @waynelewis to take the time to rebase. However, I don't want to nag if you are planning to come back to pick more changes. E.g., should I make a separate issue about issues with the AA install? Plans for CF? |
|
Your screen capture shows "Pending". That means you have to finish your review before the rest of us get to see your comment ;) |
|
I had no idea. |
ralphlange left a comment
I would say there are maybe a dozen different things that have been changed.
The changes show about 35 diffs.
Happening in 44 commits.
  fi

- cd /ansible
+ cd ansible
There was a problem hiding this comment.
I don't understand.
This script is intended to run the initial_setup role in a Vagrant-provided VM that has the ansible folder mounted at /ansible. In this setup, this change breaks the expected behavior.
There was a problem hiding this comment.
I was running this in a non-vagrant VM. If we're expecting /ansible to be a symlink(?) to the ansible folder, then maybe that needs to be part of either bootstrap.sh or the setup instructions.
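One way to let the script serve both setups would be to prefer the Vagrant mount and fall back to a folder in the checkout. This is a minimal sketch, assuming only the two locations mentioned in this thread. The function name `resolve_ansible_dir` and its arguments are hypothetical and not part of the script under review.

```shell
#!/bin/sh
# Hypothetical helper: print the directory the playbook should run from.
# $1: Vagrant mount point (e.g. /ansible)
# $2: fallback folder inside the checkout (e.g. ./ansible)
resolve_ansible_dir() {
    mount_dir=$1
    fallback_dir=$2
    if [ -d "$mount_dir" ]; then
        printf '%s\n' "$mount_dir"
    elif [ -d "$fallback_dir" ]; then
        printf '%s\n' "$fallback_dir"
    else
        echo "no ansible folder found" >&2
        return 1
    fi
}

# Usage in the script (paths are those discussed above):
#   cd "$(resolve_ansible_dir /ansible ./ansible)" || exit 1
```

With this shape the Vagrant behavior stays unchanged, and a non-Vagrant VM needs neither a symlink nor an edited script.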
  cd ansible
  ansible-galaxy install -r requirements.yml || true
- ansible-playbook $ansible_args playbook.yml -e @vars/local.yml -e initial_setup=true
+ ansible-playbook $ansible_args playbook.yml -e @vars/local.yml -e initial_setup=false --ask-become-pass
There was a problem hiding this comment.
I don't understand.
This script is intended to run the initial_setup role, fully automated, in a Vagrant-provided VM that has the ansible folder mounted at /ansible. In this setup, this change breaks the expected behavior.
Why do you want to require a password?
Which password? The vagrant user's well-known password? The epics-dev user has no password.
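If a password prompt is only needed on hand-built VMs, it could be made opt-in so that the Vagrant run stays unattended. This is a sketch, assuming a hypothetical `ASK_BECOME_PASS` environment variable, which is not something the script defines today. `--ask-become-pass` itself is the standard ansible-playbook option used in the diff.

```shell
#!/bin/sh
# Hypothetical helper: emit --ask-become-pass only when explicitly requested.
# ASK_BECOME_PASS is an assumed variable name, not an existing setting.
become_args() {
    if [ "${ASK_BECOME_PASS:-no}" = "yes" ]; then
        printf '%s\n' "--ask-become-pass"
    fi
}

# Usage (command line otherwise as in the original script):
#   ansible-playbook $ansible_args playbook.yml -e @vars/local.yml \
#       -e initial_setup=true $(become_args)
```

Left unset, the variable keeps the current fully automated behavior.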
  tomcat_group: tomcat

+ # TODO: Update AA version to avoid bug with incorrect timestamp
+ # values in queries.
There was a problem hiding this comment.
Do you have a reference for this bug?
There was a problem hiding this comment.
I believe it is this: archiver-appliance/epicsarchiverap#324
@mdavidsaver , does that look right?
  - "/home/{{ firefox_user }}/.mozilla/firefox/"
  - "/home/{{ firefox_user }}/snap/firefox/common/.mozilla/firefox/"
- patterns: "*.default,*.default-release" # Common patterns for default profiles
+ patterns: "*.default,*.default-release,*.default-default" # Common patterns for default profiles
  - name: Set MariaDB socket path
    ansible.builtin.set_fact:
      mariadb_socket: "{{ '/var/run/mysqld/mysqld.sock' if is_debian else '/var/lib/mysql/mysql.sock' }}"
There was a problem hiding this comment.
I don't like the idea of stuff under 'epics-training' depending on private repos that may change or disappear without notice.
Needs to be discussed.
There was a problem hiding this comment.
I don't understand.
This is for development only. For any "real" installation, the bootstrap script replaces the file with that soft link.
(see lines 119 to 136 in 2adb314)
There was a problem hiding this comment.
The fact that bootstrap.sh replaced the file with a link made the git repo see a changed file and refuse to pull any subsequent changes as it would overwrite the change.
  COLLECTION=${COLLECTION:-"training"}
  SLUGFILE=${SLUGFILE:-"/etc/epics-training"}
- COLLECTION_REPO=${COLLECTION_REPO:-"https://github.com/epics-training/training-collection"}
+ COLLECTION_REPO=${COLLECTION_REPO:-"https://github.com/osprey-dcs/training-collection"}
There was a problem hiding this comment.
No.
You can set it in the environment or through the seed functionality. No need to change the fallback.
There was a problem hiding this comment.
OK. I don't yet understand the alternative ways of setting the environment up.
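For reference, the `${VAR:-default}` form in the hunk above already lets a caller choose a different repo without editing the fallback. This is a minimal sketch of the environment route only. The seed functionality is not shown, since its details are not in this thread. The function wrapper is hypothetical, added only to make the behavior easy to check, and the `./bootstrap.sh` invocation path in the usage comment is assumed.

```shell
#!/bin/sh
# Hypothetical wrapper around the fallback line from the script:
# a COLLECTION_REPO set in the environment wins, otherwise the
# upstream default is used.
collection_repo() {
    printf '%s\n' "${COLLECTION_REPO:-https://github.com/epics-training/training-collection}"
}

# Usage: override for a single run without touching the script
#   COLLECTION_REPO=https://github.com/osprey-dcs/training-collection ./bootstrap.sh
```

Note that `:-` also falls back to the default when the variable is set but empty.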
  fi
  if ! command -v ansible >/dev/null; then
-     packages="${packages} ansible"
+     packages="${packages} ansible-core"
There was a problem hiding this comment.
That looks flavor-dependent?!
There was a problem hiding this comment.
From the Ansible documentation (https://docs.ansible.com/projects/ansible/latest/installation_guide/installation_distros.html) the ansible package command should work. I don't know why it didn't. I'll recheck this when I build up a new VM image.
There was a problem hiding this comment.
With Debian 13, both ansible and ansible-core packages exist. ansible (210 MB) is described as "This package contains the ansible collections." and depends on ansible-core (8 MB), which is described as "This package contains the ansible binaries."
Rocky 9 has only ansible-core (2 MB). RH9 EPEL has ansible (34 MB). EPEL also has several ansible-collection-community-* packages which are not pulled in by either base package.
@waynelewis I am now wondering if this is related to the mysterious lack of failure when the mariadb configuration was not created. How does ansible react when given a nonexistent role name?
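The package availability described above could be captured in a small lookup instead of a single hard-coded name. This is a sketch, assuming the distro is identified by the `ID` field of /etc/os-release. The function name is hypothetical, and the mapping covers only the cases reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: choose the Ansible package name for a distro ID
# (the ID= value from /etc/os-release). The mapping reflects only what was
# reported above: Rocky 9 base repos ship ansible-core only, while
# Debian 13 ships both ansible and ansible-core.
ansible_package_for() {
    case "$1" in
        rocky)  printf '%s\n' "ansible-core" ;;
        debian) printf '%s\n' "ansible" ;;
        *)      printf '%s\n' "ansible-core" ;;  # assumption: smallest common choice
    esac
}

# Usage:
#   . /etc/os-release
#   packages="${packages} $(ansible_package_for "$ID")"
```

Since ansible-core ships without the collections, any collection the roles rely on would then have to come from requirements.yml.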
There was a problem hiding this comment.
As Channelfinder is not the only service using Elasticsearch, I would suggest making it its own role.
Changes
WIP