Add updates after 2026-05 Osprey LLNL training #88

Open
waynelewis wants to merge 44 commits into epics-training:main from osprey-dcs:2026-osprey-training

Conversation

@waynelewis

Changes

  • update and add package names for Rocky
  • add mysql Ansible collection
  • add fix for Rocky mysql socket path
  • add fix for archiver initialization script
  • brute force permissions for archappl user in mariadb
  • add extra pattern for Firefox default profile
  • disable firewalld and selinux
  • allow for non-existent vagrant user (if VM created manually)
  • skip disk expansion task (I could never make this work)
  • add procServ, socat package (to support installing and running IOC as systemd service)
  • add role to install/build/setup service for IOCs
  • add Archiver URL to Phoebus preferences (and some other sensible defaults)
  • changed ansible/roles/vars/local.yml to a symlink to avoid git pull issues with this being changed
  • added comments and notes for future changes (see TODO comments)

WIP

  • channelfinder role - to be revisited later

@ralphlange

Contributor

Thanks, Wayne, for this large mixed bag of changes.

At first glance:
Some of them seem reasonable and fine, some of them I don't understand at all, some of them fix things that have been fixed on 'main', and some add things I had on my list for a while.

So I don't intend to merge this as is, but use it as a source to pick things into separate PRs.

How was your setup to develop this? Host OS? Architecture? Vagrant? VBox version? Target flavor?
Any tests other than the setup you used in the training?

@waynelewis

Author

Thanks Ralph.

Agreed that this PR should not be merged as is. I used it as a way to communicate the changes we'd made in the preparation for and execution of the recent training.

If there are specific parts that you want me to separate out into individual PRs, let me know.

For the development of this, we used the following:

  • Host OS: Windows, Linux, MacOS (both x86_64 and aarch64 using x86_64 emulation)
  • VM architecture: x86_64
  • Vagrant: no, did not have success with getting this to work
  • VirtualBox version: 7.2.8
  • Target OS distro: Rocky 9.7
  • Other testing: no

@ralphlange

Contributor

I will do the cherry-picking, no worries.

A different request, though: Your channelfinder task in the PR is installing the archiver appliance - do you have the code for installing channelfinder, or was that your break-off point? If you have the code, I'd be interested.

I never got Vagrant working for VBox > 7.1.6 (see #69, #82). I should give it another try. But I don't have admin rights on my laptop - hard to test, as I have to apply to be granted "admin for a day" (needs justification and approval).
I also don't have a Linux host system that I could run VBox on. All Linux boxes I have are VMs or dev containers on VMs ... not good for Vagrant testing.

Arm still has a lot of issues, both on the host Vagrant/VBox side and on the guest system side - if you look at the issue tickets, almost half of them are about aarch64.
My problem: I don't have an arm-based machine, and I don't know any local person who has one. My testing capabilities are literally zero.
If you could be one of the once-in-a-while aarch64 testers, that would be very helpful.

Do you know anyone with Windows11 on arm?

@waynelewis

Author

> A different request, though: Your channelfinder task in the PR is installing the archiver appliance - do you have the code for installing channelfinder, or was that your break-off point? If you have the code, I'd be interested.

We did not get the channelfinder install working. We only got to the elasticsearch container part, and that created enough issues and consumed enough time that we stopped. The channelfinder task in the PR was a copy of the archiver appliance task, used as a template.

Channelfinder is on our list of things to work on. Current thinking is to try to do it via a native install rather than using containers. But we haven't actively restarted that work yet.

> I never got Vagrant working for VBox > 7.1.6 (see #69, #82). I should give it another try. But I don't have admin rights on my laptop - hard to test, as I have to apply to be granted "admin for a day" (needs justification and approval). I also don't have a Linux host system that I could run VBox on. All Linux boxes I have are VMs or dev containers on VMs ... not good for Vagrant testing.

It's good to know that we weren't alone in having issues with using Vagrant to create the VM.

> Arm still has a lot of issues, both on the host Vagrant/VBox side and on the guest system side - if you look at the issue tickets, almost half of them are about aarch64. My problem: I don't have an arm-based machine, and I don't know any local person who has one. My testing capabilities are literally zero. If you could be one of the once-in-a-while aarch64 testers, that would be very helpful.

I can help with the aarch64 testing. I have MacOS on an M4 chip.

At the LLNL training, we had some people setting up the VM on MacOS. We are following up with them to get their modifications to the installation process. I had also tried using UTM to emulate an x86_64 architecture for a VM. That worked reasonably well except for issues with the Java compiler segfaulting when building Phoebus. In discussions with Michael, he suspected RAM issues. But we didn't test further, again due to time constraints with the upcoming training. I would prefer to just have it work with the native architecture.

> Do you know anyone with Windows11 on arm?

No, sorry.

@mdavidsaver

> I will do the cherry-picking, no worries.

@ralphlange Will you update on the status of this cherry picking as you see it?

E.g., it does not appear to me that, as of 2adb314, enough has been picked for AA to correctly persist its configuration to the local mariadb instance. Specifically, 3cc9f86 is needed to configure mariadb correctly and load the schema. Without it, AA appears to work and allows PVs to be added, but all configuration is lost on reboot.

    - name: ensure mariadb privileges are set
      become: true
      ansible.builtin.shell:
        cmd: "mariadb --execute \"GRANT ALL ON archappl.* TO 'archappl' identified by 'archappl';\""

@waynelewis and I are not clear on why the community.mysql.mysql_user task does not actually create an archappl user or grant it privileges, but the result is clear.

To see this effect, run:

    mariadb --user=archappl --password=archappl --database=archappl

and see a permission error. Or look for the Java exceptions in the (voluminous) Tomcat log.

FYI: inspect the database state with sudo mariadb-dump archappl. If you don't see any CREATE TABLE statements, then the schema has not been loaded.
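A minimal sketch of that schema check as a reusable shell function. The name schema_loaded is hypothetical; it only looks for a CREATE TABLE statement in whatever dump is piped in, so it says nothing about whether the tables are the right ones.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the SQL dump on stdin
# contains at least one CREATE TABLE statement.
schema_loaded() {
    grep -q 'CREATE TABLE'
}

# Usage on the VM (needs root, as in the command above):
#   sudo mariadb-dump archappl | schema_loaded \
#       && echo "schema loaded" || echo "schema missing"
```

The check reads from stdin so it can also be run against a saved dump file.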

Alternatively, I would suggest switching AA to use sqlite to avoid the complexity and resource use of mariadb.

This was referenced Jun 10, 2026
@ralphlange

Contributor

> > I will do the cherry-picking, no worries.
>
> @ralphlange Will you update on the status of this cherry picking as you see it?

I was doing that already as comments in the 'changes'. Do you want it done differently?

@shroffk

shroffk commented Jun 11, 2026

Contributor

When we say ChannelFinder, are we talking about the whole stack?

ChannelFinder + RecCeiver (Twisted)

I have Ansible roles for these, but I think we discussed that deploying them on the training VM would put a lot of strain on it and that we should have another VM for services.

@ralphlange ralphlange mentioned this pull request Jun 11, 2026
@ralphlange

Contributor

I know that was our discussed approach, but this PR ignores these old discussions and tries to do everything on the VM.

I actually like that (back then and now), that's why I was going forward with the AA role. (Wasting a lot of time because I don't know how the AA works.)

@shroffk

shroffk commented Jun 11, 2026

Contributor

OK, I would then recommend we use the new CF and the new RecCeiver. I can try to sync up my work on those roles.

@mdavidsaver

> I was doing that already as comments in the 'changes'. Do you want it done differently?

I am mostly looking for a "done" vs. "in progress". Your TODO list is (rightly) not captured in commit messages. Your burst of commits two weeks ago says "done" to me, thus my poke about the AA installation still being incomplete. Also, this could be a trigger for @waynelewis to take the time to rebase. However, I don't want to nag if you are planning to come back to pick more changes.

E.g., should I open a separate issue about the problems with the AA install? Plans for CF?

@ralphlange

Contributor
[screenshot]

I thought these should be clear enough. I can add "done" if you prefer.

@mdavidsaver

Your screen capture shows "Pending". That means you have to finish your review before the rest of us get to see your comment ;)

@ralphlange

Contributor

I had no idea.

@ralphlange ralphlange left a comment

Contributor

I would say there are maybe a dozen different things that have been changed.
The changes show about 35 diffs.
Happening in 44 commits.

Comment thread initial_setup.sh
 fi

-cd /ansible
+cd ansible

Contributor

I don't understand.

This script is intended to run the initial_setup role in a Vagrant-provided VM that has the ansible folder mounted at /ansible. In this setup, this change breaks the expected behavior.

Author

I was running this in a non-vagrant VM. If we're expecting /ansible to be a symlink(?) to the ansible folder, then maybe that needs to be part of either bootstrap.sh or the setup instructions.
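One possible way to support both layouts, sketched under the assumption that playbook.yml sits directly inside the ansible folder (as the ansible-playbook call in this script suggests). pick_ansible_dir is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first candidate directory
# that contains playbook.yml; fail if none does.
pick_ansible_dir() {
    local dir
    for dir in "$@"; do
        if [ -f "$dir/playbook.yml" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Intended use in initial_setup.sh (assumption: in a manually created VM
# the script is started from the repository checkout):
#   cd "$(pick_ansible_dir /ansible ./ansible)" || exit 1
```

Listing /ansible first keeps the Vagrant behavior unchanged, and the relative path is only used when the mount is absent.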

Comment thread initial_setup.sh Outdated
cd ansible
ansible-galaxy install -r requirements.yml || true
ansible-playbook $ansible_args playbook.yml -e @vars/local.yml -e initial_setup=true
ansible-playbook $ansible_args playbook.yml -e @vars/local.yml -e initial_setup=false --ask-become-pass

Contributor

I don't understand.

This script is intended to run the initial_setup role, fully automated, in a Vagrant-provided VM that has the ansible folder mounted at /ansible. In this setup, this change breaks the expected behavior.

Why do you want to require a password?
Which password? The vagrant user's well-known password? The epics-dev user has no password.
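One way to keep the Vagrant run unattended while still supporting manually created VMs might be to ask for a password only when sudo actually needs one. A sketch, assuming sudo is the become method; become_args is a hypothetical helper, not something in the script today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra ansible-playbook argument needed
# for privilege escalation, or nothing if sudo works without a password.
# `sudo -n true` fails instead of prompting when a password is required.
become_args() {
    if ! sudo -n true 2>/dev/null; then
        printf '%s\n' '--ask-become-pass'
    fi
}

# Intended use (remaining arguments as in the script):
#   ansible-playbook $ansible_args playbook.yml -e @vars/local.yml $(become_args)
```

With passwordless sudo the function prints nothing, so the command line stays as it is on 'main'.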

tomcat_group: tomcat

# TODO: Update AA version to avoid bug with incorrect timestamp
# values in queries.

Contributor

Do you have a reference for this bug?

Author

I believe it is this: archiver-appliance/epicsarchiverap#324

@mdavidsaver , does that look right?

 - "/home/{{ firefox_user }}/.mozilla/firefox/"
 - "/home/{{ firefox_user }}/snap/firefox/common/.mozilla/firefox/"
-patterns: "*.default,*.default-release" # Common patterns for default profiles
+patterns: "*.default,*.default-release,*.default-default" # Common patterns for default profiles

Contributor

Added to 'main' as of 8f59ca0

- name: Set MariaDB socket path
  ansible.builtin.set_fact:
    mariadb_socket: "{{ '/var/run/mysqld/mysqld.sock' if is_debian else '/var/lib/mysql/mysql.sock' }}"

Contributor

On 'main' as per 7cc6614

Contributor

I don't like the idea of stuff under 'epics-training' depending on private repos that may change or disappear without notice.
Needs to be discussed.

Comment thread ansible/vars/local.yml

Contributor

I don't understand.

This is for development only. For any "real" installation, the bootstrap script replaces the file with that soft link.
(see

training-vm/bootstrap.sh

Lines 119 to 136 in 2adb314

# Set up local.yml configuration
if [ -e "vm-setup/ansible/vars/local.yml" ]; then
    if [ -h "vm-setup/ansible/vars/local.yml" ]; then
        echo -n "Removing existing local configuration link"
        echo " vm-setup/ansible/vars/local.yml"
        rm -f vm-setup/ansible/vars/local.yml
    else
        echo -n "Moving existing local configuration file"
        echo " vm-setup/ansible/vars/local.yml -> local.yml.bak"
        mv -f vm-setup/ansible/vars/local.yml vm-setup/ansible/vars/local.yml.bak
    fi
fi
if [ ! -e "local.yml" ]; then
    echo "No local configuration found. Creating one from local.yml.sample"
    cp "vm-setup/ansible/vars/local.yml.sample" "local.yml"
else
    ln -s "../../../local.yml" "vm-setup/ansible/vars/local.yml"
fi
)

Author

Because bootstrap.sh replaced the file with a link, git saw a changed file and refused to pull any subsequent changes, as the pull would have overwritten the local change.

Comment thread bootstrap.sh
 COLLECTION=${COLLECTION:-"training"}
 SLUGFILE=${SLUGFILE:-"/etc/epics-training"}
-COLLECTION_REPO=${COLLECTION_REPO:-"https://github.com/epics-training/training-collection"}
+COLLECTION_REPO=${COLLECTION_REPO:-"https://github.com/osprey-dcs/training-collection"}

Contributor

No.
You can set it in the environment or through the seed functionality. No need to change the fallback.

Author

OK. I don't yet understand the alternative ways of setting the environment up.
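For reference, a small sketch of how the `${VAR:-default}` fallback in bootstrap.sh behaves: a value already present in the environment wins, and the default is used only when the variable is unset or empty. The function name resolve_collection_repo is hypothetical, and calling the script as ./bootstrap.sh is an assumption about how it is started.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the same expansion used in bootstrap.sh:
# print COLLECTION_REPO if it is set and non-empty, else the fallback URL.
resolve_collection_repo() {
    printf '%s\n' "${COLLECTION_REPO:-https://github.com/epics-training/training-collection}"
}

# Override for a single run without editing the script:
#   COLLECTION_REPO=https://github.com/osprey-dcs/training-collection ./bootstrap.sh
```

Setting the variable on the command line affects only that invocation, so the fallback in the repository can stay untouched.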

Comment thread bootstrap.sh
 fi
 if ! command -v ansible >/dev/null; then
-packages="${packages} ansible"
+packages="${packages} ansible-core"

Contributor

That looks flavor-dependent?!

Author

From the Ansible documentation (https://docs.ansible.com/projects/ansible/latest/installation_guide/installation_distros.html) the ansible package command should work. I don't know why it didn't. I'll recheck this when I build up a new VM image.

With Debian 13, both the ansible and ansible-core packages exist. ansible (210 MB) is described as "This package contains the ansible collections." and depends on ansible-core (8 MB), which is described as "This package contains the ansible binaries."

Rocky 9 has only ansible-core (2 MB). RH9 EPEL has ansible (34 MB). EPEL also has several ansible-collection-community-* packages which are not pulled in by either base package.
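A sketch of how bootstrap.sh could pick the package per flavor instead of hard-coding one name. The helper ansible_package_for is hypothetical, and the mapping covers only what is stated above: Rocky needs ansible-core unless EPEL is enabled, and everything else is assumed to provide an ansible package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the ID field of /etc/os-release to the package
# that provides the ansible command on that distribution.
ansible_package_for() {
    case "$1" in
        rocky) echo "ansible-core" ;;  # base repos have no "ansible" package
        *)     echo "ansible" ;;       # assumption: e.g. Debian, where ansible pulls in ansible-core
    esac
}

# Intended use:
#   . /etc/os-release
#   packages="${packages} $(ansible_package_for "$ID")"
```

Other RHEL-family distributions may need the same treatment as Rocky; that is untested here.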

@waynelewis I am now wondering if this is related to the mysterious lack of failure when the mariadb configuration was not created. How does ansible react when given a nonexistent role name?

Contributor

As Channelfinder is not the only service using Elasticsearch, I would suggest making it its own role.

Author

OK.


4 participants