Skip to content

Conversation

@skshetry
Copy link
Contributor

@skshetry skshetry commented Dec 15, 2025

Closes #9460.

EDIT: This patch has been updated to use os.execvp on non-Windows systems. The discussion below is only applicable on Windows due to lack of POSIX exec*() functions.


This fix treats python -m lakefs ... as a convenient entry point, primarily intended for interactive use and limited workflows. So, this patch has two issues which I think is acceptable for that scenario:

  1. In an interactive terminal, the kernel’s tty driver delivers SIGINT to all foreground processes. As a result, pressing Ctrl + C already sends SIGINT to both the Python interpreter and the lakefs binary. However, subprocess.run() waits for a short period, 0.25 seconds by default, before forcefully terminating the child process with SIGKILL. So, this may not allow the lakefs binary to exit gracefully.
  2. If it is run in a non-interactive session, the lakefs binary never receives a signal, and it will get killed with a SIGKILL when the Python interpreter receives a SIGINT (after waiting for 0.25 second). The recommended way is to interrupt the binary rather than the lakefs module itself. So I think this is only a minor inconvenience.

Alternatively, we could SIG_IGN the interrupt, but wanted to propose something simpler.

Let me know if my understanding is incorrect.

try:
print(f'running {binary_path}...')
proc = subprocess.run(
[binary_path] + args, check=False, env=os.environ)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check=False is the default. Same with env.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proc = subprocess.run(
[binary_path] + args, check=False, env=os.environ)
return proc.returncode
except subprocess.CalledProcessError as e:
Copy link
Contributor Author

@skshetry skshetry Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not get raised when check=False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Comment on lines -158 to -159
if not binary_path:
raise RuntimeError("binary not found")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this if condition from here and moved to find_or_download_binary, because the typehint says it cannot return None:

def find_or_download_binary(binary_name: str) -> str:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that the typing is wrong! find_or_download_binary may easily return null: it calls _find_binary and fails, then download_binaries does something wrong, and then _find_binary fails again. For instance, if I delete the downloaded lakefs right after you download the binary, then there is no binary to be found.

So please keep this check; consider correcting the types instead. Or correct find_or_download_binary to do check the second return. Or "cast" it there with an appropriate comment that f_o_d_b always returns an actual str.

Copy link
Contributor Author

@skshetry skshetry Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or correct find_or_download_binary to do check the second return.

yes, that's what I have done. find_or_download_binary raises if they are falsy values.

def find_or_download_binary(binary_name: str) -> str:
'''
Find the binary in PATH or ~/.lakefs/bin,
or download it if not found
Returns the path to the binary
'''
binary_path = _find_binary(binary_name)
if not binary_path:
_download_binaries()
binary_path = _find_binary(binary_name)
if not binary_path:
raise RuntimeError("binary not found")
return binary_path

@skshetry skshetry force-pushed the fix/lakefs-bin-exec-9460 branch from b764484 to 69c9b01 Compare December 15, 2025 13:09
@skshetry skshetry requested a review from a team December 15, 2025 13:15
@arielshaqed arielshaqed requested review from arielshaqed and removed request for a team December 15, 2025 13:26
return proc.returncode
except subprocess.CalledProcessError as e:
raise RuntimeError(f"Error executing {binary_name}: {e}") from e
proc = subprocess.run([binary_path, *args], check=False)
Copy link
Contributor Author

@skshetry skshetry Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use os.execvp on posix systems.

I see that ruff does that, which also has a python wrapper executing rust binary.

https://github.com/astral-sh/ruff/blob/d08e41417971f1d05b9daa75f794536a1dd4bedf/python/ruff/__main__.py#L88

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much much better. It will avoid the whole 0.25s debacle - I was shocked by your link to the code showing that this was hard-coded.1

Footnotes

  1. The code you showed appears to violate these Zen of Python principles:

    • Beautiful is better than ugly.
    • Explicit is better than implicit.
    • Special cases aren't special enough to break the rules.
    • In the face of ambiguity, refuse the temptation to guess.
    • If the implementation is hard to explain, it's a bad idea.

    Breaking 5 out of 19 aphorisms is... impressive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lakefs terminates immediately, it does not wait all 0.25s. A bigger problem is what follows after the _wait(), a SIGKILL, preventing a graceful termination.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Really solid work. But the 0.25s thing is a deal-breaker. Much safer and simpler just to call sys.exec*. Please do that; sorry.

Comment on lines -158 to -159
if not binary_path:
raise RuntimeError("binary not found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that the typing is wrong! find_or_download_binary may easily return null: it calls _find_binary and fails, then download_binaries does something wrong, and then _find_binary fails again. For instance, if I delete the downloaded lakefs right after you download the binary, then there is no binary to be found.

So please keep this check; consider correcting the types instead. Or correct find_or_download_binary to do check the second return. Or "cast" it there with an appropriate comment that f_o_d_b always returns an actual str.

proc = subprocess.run(
[binary_path] + args, check=False, env=os.environ)
return proc.returncode
except subprocess.CalledProcessError as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

return proc.returncode
except subprocess.CalledProcessError as e:
raise RuntimeError(f"Error executing {binary_name}: {e}") from e
proc = subprocess.run([binary_path, *args], check=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much much better. It will avoid the whole 0.25s debacle - I was shocked by your link to the code showing that this was hard-coded.1

Footnotes

  1. The code you showed appears to violate these Zen of Python principles:

    • Beautiful is better than ugly.
    • Explicit is better than implicit.
    • Special cases aren't special enough to break the rules.
    • In the face of ambiguity, refuse the temptation to guess.
    • If the implementation is hard to explain, it's a bad idea.

    Breaking 5 out of 19 aphorisms is... impressive.

@skshetry skshetry requested a review from arielshaqed December 16, 2025 05:12
@skshetry skshetry changed the title lakefs: handle KeyboardInterrupt when executing the binary cli: hide traceback on interrupt, use os.execvp on supported systems Dec 16, 2025
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Better and improves typing.

Comment on lines +159 to +164
if sys.platform == 'win32':
try:
proc = subprocess.run([binary_path, *args], check=False)
except KeyboardInterrupt:
sys.exit(1)
sys.exit(proc.returncode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor Windows users. But since it's unclear how to improve matters, and since the existing experience is certainly more than good enough, I see no reason to add work here.



def cli_run() -> int:
def run(binary_name: Literal['lakefs', 'lakectl'], args: Optional[list[str]] = None) -> NoReturn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This function will work perfectly well with any binary downloadable from our repo. I see no reason to limit its typing beyond str.

Copy link
Contributor Author

@skshetry skshetry Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, as we only extract lakefs and lakectl.

tar.extract('lakefs', target_dir, filter='data')
tar.extract('lakectl', target_dir, filter='data')

But I get your point. Fixed in 01be1b3.

@skshetry skshetry force-pushed the fix/lakefs-bin-exec-9460 branch from 01f2d28 to 01be1b3 Compare December 16, 2025 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Quickstart: lakefs doesn't exit nicely when run from python env

2 participants