Improve pihole-FTL process concurrency#2025
Conversation
FTL is trying hard to prevent you from starting another instance of itself. It does so by creating a PID file in `/run/pihole-FTL.pid` and checking for its existence. If the file is present, FTL assumes that another instance is already running and exits. It also checks for the existence of the shared memory objects `/dev/shm/FTL-*` and exits if they are present. These checks can be fooled by manually removing the PID file or shared memory objects (needs `root` privileges). This can lead to multiple instances of FTL running at the same time, which can cause various issues. The most commonly seen issue is that, when the first process needs more memory, its original shared memory objects are already gone. Hence, it actually tries to resize the shared memory objects of the second instance which, on the other hand, doesn't expect this and crashes. As the first process was not able to allocate more memory, this process will eventually crash as well. This commit resolves this by: 1. Improve support for concurrent `pihole-FTL` instances. Even when this continues to be somewhat discouraged, it is now possible to run multiple instances of `pihole-FTL` at the same time. This is achieved by ensuring that we keep the shared memory object file descriptors open as long as the process is running. This way, the shared memory objects are not removed until the process exits. This also means that the shared memory objects are not *hard* removed when they are deleted from the outside of the process (in Linux, a file continues to exist as long as there is at least one open file descriptor pointing to it). A hypothetical second instance of `pihole-FTL` will now be able to create new shared memory objects without interfering with the first instance. This is a more robust solution than the previous one, as it doesn't rely on the existence of the PID file or shared memory objects which might have been removed by external influences. 2. Check for a potential second instance earlier in the code. We move the check to before even trying to create shared memory objects. This way, we can exit early if we detect that another instance is already running. This is a more efficient solution than the previous one, as we don't need to create shared memory objects just to find out that we can't use them. 3. Add an exclusive lock on the shared memory objects. Even when this is not strictly necessary, it is a good practice to prevent other processes from interfering with the shared memory objects. This is especially important on systems which isolate processes from each other and is a safety net in rare cases such aas multiple Docker containers (isolating processes from one another by default) with (incorrectly!) host-mounted shared memory folders. Once they remove the mounting of `dev/shm`, they will again be able to run multiple `pihole-FTL` across multiple containers. Signed-off-by: DL6ER <dl6er@dl6er.de>
| @test "No CRIT messages in FTL.log (besides error due to testing to start FTL more than once)" { | ||
| run bash -c 'grep "CRIT: " /var/log/pihole/FTL.log' | ||
| printf "%s\n" "${lines[@]}" | ||
| run bash -c 'grep "CRIT: " /var/log/pihole/FTL.log | grep -c -v "Initialization of shared memory failed."' | ||
| printf "count: %s\n" "${lines[@]}" | ||
| [[ ${lines[0]} == "0" ]] | ||
| } | ||
|
|
| @test "No ERROR messages in FTL.log (besides known/intended error)" { | ||
| run bash -c 'grep "ERROR: " /var/log/pihole/FTL.log' | ||
| printf "%s\n" "${lines[@]}" | ||
| run bash -c 'grep "ERROR: " /var/log/pihole/FTL.log | grep -c -v -E "(index\.html)|(Failed to create shared memory object)|(FTLCONF_debug_api is invalid)|(Failed to set|adjust time during NTP sync: Insufficient permissions)"' | ||
| printf "count: %s\n" "${lines[@]}" | ||
| [[ ${lines[0]} == "0" ]] | ||
| } |
There was a problem hiding this comment.
This test was moved up to the other log file tests without changes
| // Verify the PID stored during shared memory initialization is the same as ours | ||
| // (while we initialized the shared memory objects) | ||
| static void verify_shmem_pid(void) | ||
| { | ||
| // Open shared memory settings object | ||
| const int settingsfd = shm_open(SHARED_SETTINGS_NAME, O_RDONLY, S_IRUSR | S_IWUSR); | ||
| if(settingsfd == -1) | ||
| { | ||
| log_crit("verify_shmem_pid(): Failed to open shared memory object \"%s\": %s", | ||
| SHARED_SETTINGS_NAME, strerror(errno)); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
|
|
||
| ShmSettings shms = { 0 }; | ||
| if(read(settingsfd, &shms, sizeof(shms)) != sizeof(shms)) | ||
| { | ||
| log_crit("verify_shmem_pid(): Failed to read %zu bytes from shared memory object \"%s\": %s", | ||
| sizeof(shms), SHARED_SETTINGS_NAME, strerror(errno)); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
|
|
||
| close(settingsfd); | ||
|
|
||
| // Compare the SHM's PID to the one we had when creating the SHM objects | ||
| if(shms.pid == shmem_pid) | ||
| return; | ||
|
|
||
| // If we reach here, we are in serious trouble. Terminating with error | ||
| // code is the most sensible thing we can do at this point | ||
| log_crit("Shared memory is owned by a different process (PID %d)", shms.pid); | ||
| check_running_FTL(); | ||
| log_crit("Exiting now!"); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
|
|
There was a problem hiding this comment.
This is not needed any longer. The helt-open file descriptors paired with the new exclusive lock effectively prevents other processes from being able to "steal" the shared memory objects. In fact, it doesn't even work any longer as multiple pihole-FTL instances running in parallel is not (theoretically) possible.
| { | ||
| log_crit("Initialization of shared memory failed."); | ||
| // Check if there is already a running FTL process | ||
| check_running_FTL(); |
There was a problem hiding this comment.
Check is moved up (and renamed to another_FTL()=
yubiuser
left a comment
There was a problem hiding this comment.
So, before the second instance crashed (not intended) but this prevented various bugs. Now it won't crash and two concurrent FTL instances will run - doesn't this allow for the bugs mentioned?
|
Starting two concurrent processes is prevented by several methods. Circumvention is hard as you need to
Even on this unlikely case, the second instance will not do much as the DNS port will already be taken by the other instance effectively preventing the process to complete startup. |
|
This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/pihole-6-0-beta-and-sqlite-issue/71586/11 |
The second instance will almost be like a zombie-FTL. |
|
Not really a zombie as it will still be reacting as excepted to SIGTERM and friends. |
yubiuser
left a comment
There was a problem hiding this comment.
Maybe the error message could be a different one when trying to start a second instance (e.g. "One FTL process already running")
[✓] Restarting pihole-FTL service...
[✓] Enabling pihole-FTL service to start on reboot...
nanopi@nanopi:~$ pihole-FTL
ERROR: Opening of FTL log (/var/log/pihole/FTL.log) failed!
|
@yubiuser what you observe is the real error in this case - FTL didn't even get so far to check for a second process. Inability to write to the defined log file (because of permissions in your case) is a hard-fail. |
Signed-off-by: DL6ER <dl6er@dl6er.de>
yubiuser
left a comment
There was a problem hiding this comment.
The addition is good, but I think it's not enough. I checked the syslog and nowhere we log that an other FTL process is already running.
nanopi@nanopi:~$ pihole-FTL
ERROR: Opening of FTL log (/var/log/pihole/FTL.log) failed: Permission denied
Using syslog instead!
2024-08-08T22:38:50.207946+02:00 nanopi : Opening of FTL's log file failed, using syslog instead!
2024-08-08T22:38:50.210143+02:00 nanopi : ########## FTL started on nanopi! ##########
2024-08-08T22:38:50.211358+02:00 nanopi : FTL branch: tweak/multi-FTL
2024-08-08T22:38:50.212122+02:00 nanopi : FTL version: vDev-5659393
2024-08-08T22:38:50.212751+02:00 nanopi : FTL commit: 56593936
2024-08-08T22:38:50.213482+02:00 nanopi : FTL date: 2024-08-07 22:12:32 +0200
2024-08-08T22:38:50.214267+02:00 nanopi : FTL user: nanopi
2024-08-08T22:38:50.215063+02:00 nanopi : Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
``
…t to NULL due to permissions issues during startup Signed-off-by: DL6ER <dl6er@dl6er.de>
|
Thanks. The issue here was that we noticed FTL.log being inaccessible during startup and start logging to edit The last commit improves this by instead of deciding once if we log to the log file always checking if we can write and - if not - go to |
Signed-off-by: DL6ER <dl6er@dl6er.de>
…wise when trying to join afterwards (https://discourse.pi-hole.net/t/pihole-6-0-beta-and-sqlite-issue/71586/13) Signed-off-by: DL6ER <dl6er@dl6er.de>
yubiuser
left a comment
There was a problem hiding this comment.
So I guess that last two lines from syslog indicate that an other FTL process is running.
2024-08-10T07:40:05.372301+02:00 nanopi : Opening of FTL's log file failed, using syslog instead!
2024-08-10T07:40:05.374213+02:00 nanopi : ########## FTL started on nanopi! ##########
2024-08-10T07:40:05.375454+02:00 nanopi : FTL branch: tweak/multi-FTL
2024-08-10T07:40:05.376521+02:00 nanopi : FTL version: vDev-72e2a0b
2024-08-10T07:40:05.378928+02:00 nanopi : FTL commit: 72e2a0b0
2024-08-10T07:40:05.381139+02:00 nanopi : FTL date: 2024-08-10 06:57:43 +0200
2024-08-10T07:40:05.384130+02:00 nanopi : FTL user: nanopi
2024-08-10T07:40:05.386708+02:00 nanopi : Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
2024-08-10T07:40:05.394169+02:00 nanopi : Config file /etc/pihole/pihole.toml.tmp not available: Permission denied
2024-08-10T07:40:05.395668+02:00 nanopi : Cannot write to FTL config file (Permission denied), content not updated
2024-08-10T07:40:05.396796+02:00 nanopi : Cannot open /etc/pihole/dnsmasq.conf.temp for writing, unable to update dnsmasq configuration: Permission denied
2024-08-10T07:40:05.401147+02:00 nanopi : Cannot open /etc/pihole/custom.list.tmp for writing, unable to update custom.list: Permission denied
2024-08-10T07:40:05.404697+02:00 nanopi : Parsed config file /etc/pihole/pihole.toml successfully
2024-08-10T07:40:05.430858+02:00 nanopi : pihole-FTL is already running!
2024-08-10T07:40:05.432098+02:00 nanopi : systemd (PID 1) ──> pihole-FTL (PID 154573, started 2024-08-10 07:36:44 CEST)
2024-08-10T07:40:05.432977+02:00 nanopi : └─> pihole-FTL (PID 154695, started 2024-08-10 07:37:49 CEST)
|
Sure, it explicitly said How could it be said better? Showing the process list at the end is there to help identify which |
What does this implement/fix?
FTL is trying hard to prevent you from starting another instance of itself. It does so by creating a PID file in
/run/pihole-FTL.pidand checking for its existence. If the file is present, FTL assumes that another instance is already running and exits. It also checks for the existence of the shared memory objects/dev/shm/FTL-*and exits if they are present.These checks can be fooled by manually removing the PID file or shared memory objects (needs
rootprivileges). This can lead to multiple instances of FTL running at the same time, which can cause various issues. The most commonly seen issue is that, when the first process needs more memory, its original shared memory objects are already gone. Hence, it actually tries to resize the shared memory objects of the second instance which, on the other hand, doesn't expect this and crashes. As the first process was not able to allocate more memory, this process will eventually crash as well.This commit resolves this by:
pihole-FTLinstancesEven when this continues to be somewhat discouraged, it is now possible to run multiple instances of
pihole-FTLat the same time. This is achieved by ensuring that we keep the shared memory object file descriptors open as long as the process is running. This way, the shared memory objects are not removed until the process exits. This also means that the shared memory objects are not hard removed when they are deleted from the outside of the process (in Linux, a file continues to exist as long as there is at least one open file descriptor pointing to it).A hypothetical second instance of
pihole-FTLwill now be able to create new shared memory objects without interfering with the first instance. This is a more robust solution than the previous one, as it doesn't rely on the existence of the PID file or shared memory objects which might have been removed by external influences.We move the check to before even trying to create shared memory objects. This way, we can exit early if we detect that another instance is already running. This is a more efficient solution than the previous one, as we don't need to create shared memory objects just to find out that we can't use them.
Even when this is not strictly necessary, it is a good practice to prevent other processes from interfering with the shared memory objects. This is especially important on systems which isolate processes from each other and is a safety net in rare cases such as multiple Docker containers (isolating processes from one another by default) with (incorrectly!) host-mounted shared memory folders.
Once they remove the mounting of
dev/shm, they will again be able to run multiplepihole-FTLacross multiple containers.Related issue or feature (if applicable): N/A
Pull request in docs with documentation (if applicable): N/A
By submitting this pull request, I confirm the following:
git rebase)Checklist:
developmentalbranch.