Skip to content

Improve pihole-FTL process concurrency#2025

Merged
DL6ER merged 5 commits into
development-v6from
tweak/multi-FTL
Aug 10, 2024
Merged

Improve pihole-FTL process concurrency#2025
DL6ER merged 5 commits into
development-v6from
tweak/multi-FTL

Conversation

@DL6ER

@DL6ER DL6ER commented Aug 6, 2024

Copy link
Copy Markdown
Member

What does this implement/fix?

FTL is trying hard to prevent you from starting another instance of itself. It does so by creating a PID file in /run/pihole-FTL.pid and checking for its existence. If the file is present, FTL assumes that another instance is already running and exits. It also checks for the existence of the shared memory objects /dev/shm/FTL-* and exits if they are present.

These checks can be fooled by manually removing the PID file or shared memory objects (needs root privileges). This can lead to multiple instances of FTL running at the same time, which can cause various issues. The most commonly seen issue is that, when the first process needs more memory, its original shared memory objects are already gone. Hence, it actually tries to resize the shared memory objects of the second instance which, on the other hand, doesn't expect this and crashes. As the first process was not able to allocate more memory, this process will eventually crash as well.

This commit resolves this by:

  1. Improve support for concurrent pihole-FTL instances
    Even when this continues to be somewhat discouraged, it is now possible to run multiple instances of pihole-FTL at the same time. This is achieved by ensuring that we keep the shared memory object file descriptors open as long as the process is running. This way, the shared memory objects are not removed until the process exits. This also means that the shared memory objects are not hard removed when they are deleted from the outside of the process (in Linux, a file continues to exist as long as there is at least one open file descriptor pointing to it).
    A hypothetical second instance of pihole-FTL will now be able to create new shared memory objects without interfering with the first instance. This is a more robust solution than the previous one, as it doesn't rely on the existence of the PID file or shared memory objects which might have been removed by external influences.
  2. Check for a potential second instance earlier in the code
    We move the check to before even trying to create shared memory objects. This way, we can exit early if we detect that another instance is already running. This is a more efficient solution than the previous one, as we don't need to create shared memory objects just to find out that we can't use them.
  3. Add an exclusive lock on the shared memory objects
    Even when this is not strictly necessary, it is a good practice to prevent other processes from interfering with the shared memory objects. This is especially important on systems which isolate processes from each other and is a safety net in rare cases such as multiple Docker containers (isolating processes from one another by default) with (incorrectly!) host-mounted shared memory folders.
    Once they remove the mounting of dev/shm, they will again be able to run multiple pihole-FTL across multiple containers.

Related issue or feature (if applicable): N/A

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)

Checklist:

  • The code change is tested and works locally.
  • I based my code and PRs against the repositories developmental branch.
  • I signed off all commits. Pi-hole enforces the DCO for all contributions
  • I signed all my commits. Pi-hole requires signatures to verify authorship
  • I have read the above and my PR is ready for review.

FTL is trying hard to prevent you from starting another instance of itself. It does so by creating a PID file in `/run/pihole-FTL.pid` and checking for its existence. If the file is present, FTL assumes that another instance is already running and exits. It also checks for the existence of the shared memory objects `/dev/shm/FTL-*` and exits if they are present. These checks can be fooled by manually removing the PID file or shared memory objects (needs `root` privileges). This can lead to multiple instances of FTL running at the same time, which can cause various issues. The most commonly seen issue is that, when the first process needs more memory, its original shared memory objects are already gone. Hence, it actually tries to resize the shared memory objects of the second instance which, on the other hand, doesn't expect this and crashes. As the first process was not able to allocate more memory, this process will eventually crash as well.

This commit resolves this by:
1. Improve support for concurrent `pihole-FTL` instances. Even when this continues to be somewhat discouraged, it is now possible to run multiple instances of `pihole-FTL` at the same time. This is achieved by ensuring that we keep the shared memory object file descriptors open as long as the process is running. This way, the shared memory objects are not removed until the process exits. This also means that the shared memory objects are not *hard* removed when they are deleted from the outside of the process (in Linux, a file continues to exist as long as there is at least one open file descriptor pointing to it). A hypothetical second instance of `pihole-FTL` will now be able to create new shared memory objects without interfering with the first instance. This is a more robust solution than the previous one, as it doesn't rely on the existence of the PID file or shared memory objects which might have been removed by external influences.
2. Check for a potential second instance earlier in the code. We move the check to before even trying to create shared memory objects. This way, we can exit early if we detect that another instance is already running. This is a more efficient solution than the previous one, as we don't need to create shared memory objects just to find out that we can't use them.
3. Add an exclusive lock on the shared memory objects. Even when this is not strictly necessary, it is a good practice to prevent other processes from interfering with the shared memory objects. This is especially important on systems which isolate processes from each other and is a safety net in rare cases such aas multiple Docker containers (isolating processes from one another by default) with (incorrectly!) host-mounted shared memory folders. Once they remove the mounting of `dev/shm`, they will again be able to run multiple `pihole-FTL` across multiple containers.

Signed-off-by: DL6ER <dl6er@dl6er.de>
Comment thread test/test_suite.bats
Comment on lines -1173 to -1180
@test "No CRIT messages in FTL.log (besides error due to testing to start FTL more than once)" {
run bash -c 'grep "CRIT: " /var/log/pihole/FTL.log'
printf "%s\n" "${lines[@]}"
run bash -c 'grep "CRIT: " /var/log/pihole/FTL.log | grep -c -v "Initialization of shared memory failed."'
printf "count: %s\n" "${lines[@]}"
[[ ${lines[0]} == "0" ]]
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was duplicated

Comment thread test/test_suite.bats
Comment on lines -1165 to -1171
@test "No ERROR messages in FTL.log (besides known/intended error)" {
run bash -c 'grep "ERROR: " /var/log/pihole/FTL.log'
printf "%s\n" "${lines[@]}"
run bash -c 'grep "ERROR: " /var/log/pihole/FTL.log | grep -c -v -E "(index\.html)|(Failed to create shared memory object)|(FTLCONF_debug_api is invalid)|(Failed to set|adjust time during NTP sync: Insufficient permissions)"'
printf "count: %s\n" "${lines[@]}"
[[ ${lines[0]} == "0" ]]
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was moved up to the other log file tests without changes

Comment thread src/shmem.c
Comment on lines -152 to -186
// Verify the PID stored during shared memory initialization is the same as ours
// (while we initialized the shared memory objects)
static void verify_shmem_pid(void)
{
// Open shared memory settings object
const int settingsfd = shm_open(SHARED_SETTINGS_NAME, O_RDONLY, S_IRUSR | S_IWUSR);
if(settingsfd == -1)
{
log_crit("verify_shmem_pid(): Failed to open shared memory object \"%s\": %s",
SHARED_SETTINGS_NAME, strerror(errno));
exit(EXIT_FAILURE);
}

ShmSettings shms = { 0 };
if(read(settingsfd, &shms, sizeof(shms)) != sizeof(shms))
{
log_crit("verify_shmem_pid(): Failed to read %zu bytes from shared memory object \"%s\": %s",
sizeof(shms), SHARED_SETTINGS_NAME, strerror(errno));
exit(EXIT_FAILURE);
}

close(settingsfd);

// Compare the SHM's PID to the one we had when creating the SHM objects
if(shms.pid == shmem_pid)
return;

// If we reach here, we are in serious trouble. Terminating with error
// code is the most sensible thing we can do at this point
log_crit("Shared memory is owned by a different process (PID %d)", shms.pid);
check_running_FTL();
log_crit("Exiting now!");
exit(EXIT_FAILURE);
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed any longer. The helt-open file descriptors paired with the new exclusive lock effectively prevents other processes from being able to "steal" the shared memory objects. In fact, it doesn't even work any longer as multiple pihole-FTL instances running in parallel is not (theoretically) possible.

Comment thread src/main.c
{
log_crit("Initialization of shared memory failed.");
// Check if there is already a running FTL process
check_running_FTL();

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check is moved up (and renamed to another_FTL()=

@yubiuser yubiuser left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, before the second instance crashed (not intended) but this prevented various bugs. Now it won't crash and two concurrent FTL instances will run - doesn't this allow for the bugs mentioned?

@DL6ER

DL6ER commented Aug 6, 2024

Copy link
Copy Markdown
Member Author

Starting two concurrent processes is prevented by several methods.

Circumvention is hard as you need to

  • delete all shared memory files,
  • delete the PID file, and
  • configure your system such that processes cannot see each other.

Even on this unlikely case, the second instance will not do much as the DNS port will already be taken by the other instance effectively preventing the process to complete startup.

@pralor-bot

Copy link
Copy Markdown

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/pihole-6-0-beta-and-sqlite-issue/71586/11

@rdwebdesign

Copy link
Copy Markdown
Member

effectively preventing the process to complete startup

The second instance will almost be like a zombie-FTL.

@DL6ER

DL6ER commented Aug 7, 2024

Copy link
Copy Markdown
Member Author

Not really a zombie as it will still be reacting as excepted to SIGTERM and friends.

@yubiuser yubiuser left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the error message could be a different one when trying to start a second instance (e.g. "One FTL process already running")

  [✓] Restarting pihole-FTL service...
  [✓] Enabling pihole-FTL service to start on reboot...
nanopi@nanopi:~$ pihole-FTL 
ERROR: Opening of FTL log (/var/log/pihole/FTL.log) failed!

@DL6ER

DL6ER commented Aug 7, 2024

Copy link
Copy Markdown
Member Author

@yubiuser what you observe is the real error in this case - FTL didn't even get so far to check for a second process. Inability to write to the defined log file (because of permissions in your case) is a hard-fail.

@yubiuser yubiuser left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition is good, but I think it's not enough. I checked the syslog and nowhere we log that an other FTL process is already running.

nanopi@nanopi:~$ pihole-FTL 
ERROR: Opening of FTL log (/var/log/pihole/FTL.log) failed: Permission denied
Using syslog instead!

2024-08-08T22:38:50.207946+02:00 nanopi : Opening of FTL's log file failed, using syslog instead!
2024-08-08T22:38:50.210143+02:00 nanopi : ########## FTL started on nanopi! ##########
2024-08-08T22:38:50.211358+02:00 nanopi : FTL branch: tweak/multi-FTL
2024-08-08T22:38:50.212122+02:00 nanopi : FTL version: vDev-5659393
2024-08-08T22:38:50.212751+02:00 nanopi : FTL commit: 56593936
2024-08-08T22:38:50.213482+02:00 nanopi : FTL date: 2024-08-07 22:12:32 +0200
2024-08-08T22:38:50.214267+02:00 nanopi : FTL user: nanopi
2024-08-08T22:38:50.215063+02:00 nanopi : Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309


``

…t to NULL due to permissions issues during startup

Signed-off-by: DL6ER <dl6er@dl6er.de>
@DL6ER

DL6ER commented Aug 9, 2024

Copy link
Copy Markdown
Member Author

Thanks. The issue here was that we noticed FTL.log being inaccessible during startup and start logging to syslog. Then, FTL reads pihole-FTL.toml, found a valid path and and started trying to log to the log file again (i.e. stopped logging to syslog because it thought it has a valid log file as target instead). As this file is not writable, the logging eventually went into the nowhere. FTL correctly detected that it was started multiple times and shut down but wasn't able to tell you about it. Fixed now.

edit The last commit improves this by instead of deciding once if we log to the log file always checking if we can write and - if not - go to syslog instead.

@yubiuser yubiuser left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess that last two lines from syslog indicate that an other FTL process is running.

2024-08-10T07:40:05.372301+02:00 nanopi : Opening of FTL's log file failed, using syslog instead!
2024-08-10T07:40:05.374213+02:00 nanopi : ########## FTL started on nanopi! ##########
2024-08-10T07:40:05.375454+02:00 nanopi : FTL branch: tweak/multi-FTL
2024-08-10T07:40:05.376521+02:00 nanopi : FTL version: vDev-72e2a0b
2024-08-10T07:40:05.378928+02:00 nanopi : FTL commit: 72e2a0b0
2024-08-10T07:40:05.381139+02:00 nanopi : FTL date: 2024-08-10 06:57:43 +0200
2024-08-10T07:40:05.384130+02:00 nanopi : FTL user: nanopi
2024-08-10T07:40:05.386708+02:00 nanopi : Compiled for linux/arm64/v8 (compiled on CI) using cc (Alpine 13.2.1_git20240309) 13.2.1 20240309
2024-08-10T07:40:05.394169+02:00 nanopi : Config file /etc/pihole/pihole.toml.tmp not available: Permission denied
2024-08-10T07:40:05.395668+02:00 nanopi : Cannot write to FTL config file (Permission denied), content not updated
2024-08-10T07:40:05.396796+02:00 nanopi : Cannot open /etc/pihole/dnsmasq.conf.temp for writing, unable to update dnsmasq configuration: Permission denied
2024-08-10T07:40:05.401147+02:00 nanopi : Cannot open /etc/pihole/custom.list.tmp for writing, unable to update custom.list: Permission denied
2024-08-10T07:40:05.404697+02:00 nanopi : Parsed config file /etc/pihole/pihole.toml successfully
2024-08-10T07:40:05.430858+02:00 nanopi : pihole-FTL is already running!
2024-08-10T07:40:05.432098+02:00 nanopi : systemd (PID 1) ──> pihole-FTL (PID 154573, started 2024-08-10 07:36:44 CEST)
2024-08-10T07:40:05.432977+02:00 nanopi :                          └─> pihole-FTL (PID 154695, started 2024-08-10 07:37:49 CEST)

@DL6ER

DL6ER commented Aug 10, 2024

Copy link
Copy Markdown
Member Author

Sure, it explicitly said

pihole-FTL is already running!

How could it be said better? Showing the process list at the end is there to help identify which pihole-FTL is already running.

@DL6ER DL6ER merged commit 467f5b5 into development-v6 Aug 10, 2024
@DL6ER DL6ER deleted the tweak/multi-FTL branch August 10, 2024 06:20
@DL6ER DL6ER mentioned this pull request Aug 22, 2024
5 tasks
@PromoFaux PromoFaux mentioned this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants