Skip to content

Conversation

@FroggyFlox
Copy link
Member

Fixes #2679
@phillxnet, @Hooverdan96: ready for review

This pull-request (PR) proposes to add a new Service (in System > Services) for Tailscale, allowing the users to connect their Rockstor machine to their existing Tailscale account entirely from Rockstor's UI, without command line interaction. See related issue for rationale.

Overall implementation

Introductory notes

This PR makes the assumption that the tailscale package is already installed on the system, providing the tailscaled daemon systemd service as well as the tailscale cli interface (on which this PR entirely relies). Ensuring the tailscale package is installed should thus be the subject of a dedicated issue in our rockstor-installer repository.

To better understand the implementation in this PR, it is important to note that there are two main components to Tailscale on the system:

  • the tailscaled daemon (systemd service): creates the tun device and "routes" traffic. This service being active is not sufficient for having an active Tailscale connection. This is done by the tailscale binary (see below).
  • the tailscale cli binary: represents the interface between the user and its Tailscale account and connection configuration. This binary requires the tailscaled daemon to up and running to do anything.

@phillxnet, Tailscale licensing is detailed at: https://github.com/tailscale/tailscale/blob/main/LICENSE
Their website (https://tailscale.com/opensource/) details:

The client, which runs on each of a user’s devices, is mostly open source. The core client code for the Tailscale daemon used across all platforms is open source, and the full client code is open source for platforms that are also open source.
We enthusiastically support open source operating systems such as Linux and Android, and we believe that our time and energy is best directed toward the communities for these platforms. As a result, where the operating system is open source, the daemon and GUI are open source; and where the operating system is closed, the daemon is open source and the GUI is closed source.
This means that you can build the Linux and Android clients yourself, and you can build the Windows and macOS clients without the GUI.

Rockstor's implementation

  1. The Tailscale service is configurable. The user can thus configure a various set of parameters using Rockstor's UI. By default, it is NOT configured, and the tailscaled service is NOT running.
  2. When configured, we assume the user intends on using the Tailscale service and thus start the tailscaled daemon. This creates the tun device but does not activate the Tailscale connection. This step is a requirement for the initiation of the login/authentication mechanism.
  3. In this state (configured, but Service OFF), we detect if the user needs to authenticate this machine to their Tailscale account. If it is the case, we display a "Login" button in lieu of the usual ON/OFF toggle.
  4. When clicking the "Login" button, we trigger and shortly thereafter timeout a tailscale up call; this generates an authURL that we then retrieve and use to open a new browser window that the user can thus use to complete the process of authentication to their Tailscale account. When completed, the Tailscale service is ON and the machine is fully connected to their Tailscale network.
  5. Toggling the Tailscale service OFF disonnects the machine from their Tailscale network. It does not turn OFF the tailscaled daemon as this PR assumes the user might still want to turn it back ON (shortly) thereafter. I'm thinking of using Ability to reset services to original status/settings #2332 here: when resetting the service to its default, we consider that the user does not want Tailscale anymore and we would thus stop the tailscaled daemon then.

Implementation details

Tailscale service configuration

As detailed in #2679 (comment), tailscale supports many options, and this PR proposes to provide a simple form to set what might be the most commonly used ones.
For all other options, a custom configuration box is offered so that the user can manually enter what they want. Note that if the user enters in this custom configuration box a parameter that is already supported in the form above it, this PR considers what was defined in the form as taking precedence over the custom configuration and thus ignores it.
The configuration form includes a variety of Javascript-based validation to prevent the user to enter incompatible or invalid settings. Note that this covers the incompatibilities that I have personally witnessed... there may be more that I have not seen yet.

Just like all other services, submitting the configuration form sends that data to the new TailscaledView that parses it and validates it as follows:

  • if hostname param is present: verify it is alphanumeric and does not include special characters. Underscores are also invalid so we change them to hyphens (same as what the Tailscale webUI does).
  • parses the custom configuration and remove parameter if it is a duplicate of what's already in the webUI form.
    After this validation, we save the config in the database.

This also enables and starts the tailscaled systemd service, required for the next steps.

See below for a screenshot of the configuration modal:
image

Tailscale service status

Unlike other Services, Tailscale needs to be authenticated to function. We thus consider the Tailscale service as ON if the machine is connected to the user's Tailscale network. This, however, requires authentication, explained in the next section.

For Tailscale, however, we need to have a more granular knowledge of Tailscale status than for other services (due to the reasons above). Fortunately, tailscale includes a subcommand (status) that can report a very detailed report in JSON. This includes one important key: BackendState. This key indeed is enough to give us the information we need:

  • NeedsLogin
  • Stopped
  • Running

We thus have 3 states. On top of that, we also have the state of "tailscaled systemd service not active" -> 4 states. In reality, however, we just need to know the following:

  • is the tailscaled daemon running?
  • if the tailscaled daemon is running:
    • does tailscale need to be authenticated (BackendState: "NeedsLogin")? If yes: show "Login" button (see below).
    • is tailscale running?
    • is tailscale stopped?

In services.js, we currently use return codes as returned by systemd/supervisorctl to represent services status to display the ON/OFF toggle accordingly:

  • for return code 0: running
  • for return code > 0: not running

Note that systemd already returns: 0 = unit is active, 3 = unit is not active, 4 = no such unit (@phillxnet's notes).
Here, we need a new state: needs login. I thus chose to attribute a return code of 5 to represent that. This is detailed in system.services.tailscale_service_status(). As a result, if service status is 5, we need to display the "Login" button.

Tailscale authentication/login

Running tailscale up at this point would hang, waiting for the user to visit an automatically generated URL to login/authenticate to their Tailscale account. As a result, this would make the front-end hang as well. I have thus ended up relying on the following "feature": when tailscale up is triggered, it checks if authentication is needed and generates an authURL, waiting for the user to copy this URL, paste it in the browser and then complete the authentication process. When completed, tailscale up detects that and finishes activating the tailscale connection. Now, the "detail" that this PR uses is that when that initial tailscale up call that generates the needed authURL finishes (successfully or not), the generated authURL can be retrieved from tailscale status, and is still valid for the user to complete the authentication process. Rockstor thus uses that as follows:

  • when authentication is required, we display a "Login" button in lieu of the ON/OFF toggle (design choice to be discussed if needed).
  • clicking this button calls a new endpoint with the login action (new definition in urls) that calls tailscale up with a timeout of 2 sec. That is enough to generate the authURL, and then completes. We can then call tailscale status to retrieve this authURL and save it in the config model entry. We then refresh the Backbone collection for the services to have that authURL available to our front-end and use that to open a new window with that URL. That process is contained in the tailscaleAuthAction function in services.js.
  • once the authentication process is completed by the user, the tailscale service is automatically turned on (done by tailscale itself, not Rockstor).
  • the latter is picked up by our automatic websocket-based service status update mechanism (polling interval: 15 sec) that will then show the Tailscale service as ON. This explains why Don't throw exception when getting supervisord service status #2681 #2682 was needed here.

Machine expiration

By default, each machine newly connected to a Tailscale network expires in 30 days.
I've tried to mimick this as follows:

  1. "remove" the machine from Tailscale's dashboard. Unfortunately, this does not stop the tailscale connection on the machine as it still shows a status of "Running".
  2. in Rockstor, manually toggle the Tailscale service OFF. tailscale status shows it as Stopped, but not needing login, unfortunately, so we have the ON/OFF toggle still in Rockstor's UI.
  3. toggling the service ON, though, triggers tailscale up, which then detects it needs login. In Rockstor UI, this is detected and we show the "Login" button. One can then complete the login process.

It is not as smooth as I would like, but I'm unsure how to deal with this "weird" state given the Needs login state is not detected by Tailscale before we need to know it. Maybe a "normal" machine expiration is reflected differently than manually removing it from the dashboard?

The tailscale up command does have a --force-reauth flag that will force a re-authentication. Maybe this should be included in the configuration modal as a button in case of need.

Testing

All development and testing was conducted on Leap 15.5; all tests are passing (note that we now have 17 more tests included in this PR):

buildvm155:/opt/rockstor/src/rockstor # poetry run django-admin test
Creating test database for alias 'default'...
Creating test database for alias 'smart_manager'...
System check identified no issues (0 silenced).
.......................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 279 tests in 15.877s

OK

Miscellaneous

Services table

I had to make a few re-arrangements in the table listing all services:

  • increase default display of services to 30.
  • remove the text-align: left css that was forcing a left alignment for the buttons.
  • reduce width of toggle/login buttons
    The resulting table does have a lot of empty space in the center, though... This is another design choice to be discussed as/if desired.

Change to run_command()

**This PR does include a change to system.osi.run_command()**. This change was required to easily load the JSON output from tailscale status --jsonas a dict. I've slightly editedrun_command()to allow returning the "raw" output fromp.communicate()`, while leaving the default behavior the same as before.

Help button link

The link included in the help button at the top of the configuration modal points to the docs' root. This should be changed to point to a dedicated description of the Tailscale service once it exists.

Change to .gitignore

We previously had static/ in .gitignore to ignore the static dir created by collectstatic. This, however, was not selective enough and was causing static/ subfolders to be ignored. That included configure_tailscaled.jst, for instance. The change in this PR should concern only the static/ folder located in the project's root: /static/.

How the new tailscale0 tun device shows in the Network page

tailscaled systemd service is inactive:
image

tailscaled systemd service is active, but tailscale is OFF:
image

Tailscale service ON:
image

Add support for Tailscale as a Service that can configured from
Rockstor's webUI. This commit assumes that the tailscale package is
already installed on the system.

- add "Tailscale: tailscaled" as a new service in the `prep_db.py`
  script with no default configuration (null).
- add new urls endpoints with the same commands as other services:
  config, start, stop.
- unlike other services, add 2 new endpoints ("children" of "config")
  for the login/logout actions.
- add a new TailscaledServiceView to handle these endpoints.
- add logic to support services needing login/authentication.
- add tailscale utils functions to interface with the tailscale cli
  binary.
- add flag to `system.osi.run_command()` to return raw output. This
  commit does NOT change the default. Add relevant unit testing.
- increase default number of services shown in the table to 30.
- change services table column width and center buttons.
- ignore only the static/ folder located in root dir
- add unit testing to `system.services`: only partial coverage, though.
@FroggyFlox FroggyFlox added the needs review Ideally by prior rockstor-core contributor label Oct 18, 2023
@FroggyFlox FroggyFlox linked an issue Oct 18, 2023 that may be closed by this pull request
should return
"hostname": "rock-dev-123"
"""
if "hostname" in config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quick question, can a hostname not have unicode characters in it? If that's the case, wouldn't some "inclusive" regex be better than excluding what we currently know as "special" characters"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that is one very good question here. It's true that this validation is rather ugly at the moment...
To answer your question on unicode characters: I've tested using Tailscale's UI, which allows you to rename a hostname. This is a good way to test what is valid or not. I've pasted the following into the box: München, and it autocorrected to a hyphen:
image

It thus seems we can't use anything besides alphanumeric and -. I'll change to using a regex filtering/correction as you recommended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FroggyFlox @Hooverdan96 Re:

I'll change to using a regex filtering/correction as you recommended.

Given this is a first draft, and a large one at that, I'm keen to get this out as-is. So this could be an enhancement by way of a follow-up. It likely won't be the last with such a significant addition.

Building now from what we have currently with a view to getting this out to the testing channel. We are getting along in this testing phase but not at the end yet so still time for some additions/refinements before we start heading for the next stable.

@Hooverdan96
Copy link
Member

@FroggyFlox - this is amazing!!! Looks simple enough :). I had a question on the hostname validation that I placed as a comment in the pull request.

We will definitely need the documentation for that in the Rockstor docs.

Copy link
Member

@phillxnet phillxnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FroggyFlox This is a superb effort/addition, I'm keen to get this out into testing for proper field tests as this is not a significant knowledge area for me.
@Hooverdan96's point was keen and we can follow that up as we progress in the current testing channel if need be.

Your additional test coverage here is most welcome. And thanks for the excellent tests on the new code presented in this PR.

Noting the following single test failure when Tailscale is not installed on the rpm build host:

======================================================================
FAIL: test_disable_ip_forwarding (rockstor.system.tests.test_system_network.SystemNetworkTests)
test proper call of os.remove() and final sysctl command
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/worker/Poetry-Build-on-Leap15-5/rpmbuild/rockstor-core-5.0.4-2712/src/rockstor/system/tests/test_system_network.py", line 795, in test_disable_ip_forwarding
    self.mock_os_remove.assert_called_once_with("/etc/sysctl.d/99-tailscale.conf")
  File "/usr/lib64/python3.9/unittest/mock.py", line 918, in assert_called_once_with
    raise AssertionError(msg)
AssertionError: Expected 'remove' to be called once. Called 0 times.
----------------------------------------------------------------------
Ran 279 tests in 26.910s
FAILED (failures=1)

We may want to assess if this is acceptable or even avoidable?

After installing the repo and package (Leap 15.5 only currently) via the instructions in our rockstor-installer issue comment: rockstor/rockstor-installer#142 (comment)

We have all tests pass as per PR text.

Thanks for thinking to extend our services page: it would have been a shame for our ongoing page refresh bug to have returned us to page one and lost this entry.

If one submits the default config (accessed via spanner or auto shown via enable switch):
Tailscale-config-rpm-build

There is a:

Unknown internal error doing a POST to /api/sm/services/tailscaled/config

/opt/rockstor/var/log/rockstor.log

system.exceptions.CommandException: Error running a command. cmd = /usr/bin/systemctl stop tailscaled. rc = 5. stdout = ['']. stderr = ['Failed to stop tailscaled.service: Unit tailscaled.service not loaded.', '']

Am I doing it wrong currently ?
I have installed the package via my indicated comment above. But that is all. Trying to simulate folks doing the same to see how we cope with that situation.

@phillxnet
Copy link
Member

So the full rockstor.log entry for what is likely an incomplete install of Tailscale on my part, is:

[19/Oct/2023 13:46:46] ERROR [system.osi:261] non-zero code(5) returned by command: ['/usr/bin/systemctl', 'stop', 'tailscaled']. output: [''] error: ['Failed to stop tailscaled.service: Unit tailscaled.service not loaded.', '']
[19/Oct/2023 13:46:46] ERROR [storageadmin.middleware:34] Exception occurred while processing a request. Path: /api/sm/services/tailscaled/config method: POST
[19/Oct/2023 13:46:46] ERROR [storageadmin.middleware:35] Error running a command. cmd = /usr/bin/systemctl stop tailscaled. rc = 5. stdout = ['']. stderr = ['Failed to stop tailscaled.service: Unit tailscaled.service not loaded.', '']
Traceback (most recent call last):
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/rockstor/.venv/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/rockstor/src/rockstor/smart_manager/views/tailscaled_service.py", line 98, in post
    systemctl(self.name, "stop")
  File "/opt/rockstor/src/rockstor/system/services.py", line 107, in systemctl
    return run_command(arg_list, log=True)
  File "/opt/rockstor/src/rockstor/system/osi.py", line 263, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/bin/systemctl stop tailscaled. rc = 5. stdout = ['']. stderr = ['Failed to stop tailscaled.service: Unit tailscaled.service not loaded.', '']

I'll review my referenced Tailscale install comment.

@phillxnet
Copy link
Member

OK, some progress: In my test setup I have the following:

rleap15-5:~ # systemctl status tailscaled.service
Unit tailscaled.service could not be found.

So a partial install where the service has not been registered but installed !!

@phillxnet
Copy link
Member

OK, likely my fault here, I had assumed the failed test, which passed later on, was due to the package actually having been installed: but alas I had only added the repo !!! Strange how the indicated test failed initially (two builds, but later passed). Something likely interesting there.
I'll continue as currently build system has:

rleap15-5:~ # zypper info tailscale
...
Information for package tailscale:
----------------------------------
Repository     : Tailscale stable
Name           : tailscale
Version        : 1.50.1-1
Arch           : x86_64
Vendor         : 
Installed Size : 43.9 MiB
Installed      : No
Status         : not installed
Source package : tailscale-1.50.1-1.src
Upstream URL   : https://www.tailscale.com
Summary        : The easiest, most secure, cross platform way to use WireGuard + oauth2 + 2FA/SSO
Description    : 
    The easiest, most secure, cross platform way to use WireGuard + oauth2 + 2FA/SSO

So thus far: user error on my part.

Copy link
Member

@phillxnet phillxnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so bar my "doing it wrong" and discovering an anomaly re one of the tests; once tailscale is actually installed, visiting the config and submitting (all defaults) I get the following "Login" appear:
Tailscale-service-login

rleap15-5:~ # systemctl status tailscaled.service
● tailscaled.service - Tailscale node agent
     Loaded: loaded (/lib/systemd/system/tailscaled.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2023-10-19 14:10:29 WEST; 6min ago
       Docs: https://tailscale.com/kb/
    Process: 16372 ExecStartPre=/usr/sbin/tailscaled --cleanup (code=exited, status=0/SUCCESS)
   Main PID: 16415 (tailscaled)
     Status: "Stopped; run 'tailscale up' to log in"
      Tasks: 10 (limit: 2334)
     CGroup: /system.slice/tailscaled.service
             └─ 16415 /usr/sbin/tailscaled --state=/var/lib/tailscale/tailscaled.state --socket=/run/tailscale/tailscaled.sock --port=41641

Oct 19 14:10:30 rleap15-5 tailscaled[16415]: Backend: logs: be:3c910da634826d267eded683e93b9dbe247ea758eb2c0207468b5298fd7a0244 fe:
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: Switching ipn state NoState -> NeedsLogin (WantRunning=false, nm=false)
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: blockEngineUpdates(true)
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: health("overall"): error: state=NeedsLogin, wantRunning=false
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: wgengine: Reconfig: configuring userspace WireGuard config (with 0/0 peers)
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: wgengine: Reconfig: configuring router
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: wgengine: Reconfig: configuring DNS
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: dns: Set: {DefaultResolvers:[] Routes:{} SearchDomains:[] Hosts:0}
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: dns: Resolvercfg: {Routes:{} Hosts:0 LocalDomains:[]}
Oct 19 14:10:30 rleap15-5 tailscaled[16415]: dns: OScfg: {}
rleap15-5:~ # echo $?
0

After using the "Login" which opened a new tab to initialte a Tailscale sign-up/login with machine registration attached. Once logged in we get:
Login-to-Tailscale-auth-rockstor-instance

I was informed/provided-with a link in the same session for admin to accept this new machine. This was done successfully. The Tailscale 'Login' in Rockstor services page had then changed to the on/off switch in On position.

Network is as per PR text:
network-connections-tailscale0-connected

And sharing this machine via an invite link with @FroggyFlox provided the capability to ping from their tailnet. This is with the default configuration.

As my working knowledge of Tailscale is somewhat limited I'd like to get this out into our testing channel so folks more knowledgable there an test what we have to-date.

And issue can be created for @Hooverdan96 key point re inclusive regex improvement on hostname validation and likely some enhancements to cope more elegantly with no Tailscale installed situation.

@phillxnet
Copy link
Member

In addition @FroggyFlox has now tested Rockstor Web-UI access to the build machine used in this end-to-end RPM build/install test. So that's all looking quite promising. I'll go ahead and merge this so we can get to the next testing release. Some follow-ups likely but we can issue/pr them in turn.
To date:

  • @Hooverdan96 note re hostname validation improvements.
  • @FroggyFlox re blocking our native network edit for the tailscale interface. We likely want to replace this with a link to the servcie 'spanner' to centralise these settings.

N.B. only the default configuration has been tested by myself.

@phillxnet
Copy link
Member

Post a reboot of the test system, and before Web-UI login, tailscale status indicated all was as expected, and the admin console at Tailscale indicated similarly that the newly added machine was "Connected", with endpoints accurately reflecting the public and private IP addresses associated with this test machine.
I think for the basic function we are good here. More testing from the testing channel rpm release hopefully; once that is rolled and released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Tailscale integration

3 participants