Skip to content

Separate content from file identity in lint()#3017

Open
atusy wants to merge 6 commits into
r-lib:mainfrom
atusy:feature/lint-text-with-identity
Open

Separate content from file identity in lint()#3017
atusy wants to merge 6 commits into
r-lib:mainfrom
atusy:feature/lint-text-with-identity

Conversation

@atusy

@atusy atusy commented Feb 26, 2026

Copy link
Copy Markdown

Motivation

When IDEs like those using REditorSupport/languageserver lint unsaved buffers, they need to pass buffer content but resolve config, exclusions, knitr detection, and output against the real file path. Currently lint() conflates these two concerns: filename is both "where to read" and "who am I." This means users must save before getting up-to-date lints.

This PR gives meaning to lint(filename, text=) used together, enabling IDEs to lint unsaved buffer content while preserving full file identity.

Three modes

Mode Call Behavior
File (unchanged) lint("R/foo.R") Read + identity from file
Anonymous text (unchanged) lint(text = "x=1") Tempfile, output shows <text>
Text with identity (NEW) lint("R/foo.R", text = buf) Content from text, identity from filename

In Mode 3, filename does NOT need to exist on disk. Output shows the real path.

Example

The key difference: lint output is attributed to the real file path, enabling IDEs to map diagnostics back to the correct buffer.

# Mode 3: lints are attributed to the identity path
lint("R/foo.R", text = "x = 1\n", linters = assignment_linter())
#> R/foo.R:1:3: style: [assignment_linter] Use <-, not =, for assignment.

# Anonymous text: output shows <text>
lint(text = "x = 1\n", linters = assignment_linter())
#> <text>:1:3: style: [assignment_linter] Use <-, not =, for assignment.

Beyond the output path, the identity filename also drives:

  • Config discovery (parse_settings = TRUE): .lintr is found relative to the identity path
  • Exclusions: # nolint comments and config-based file exclusions work against the identity path
  • Knitr detection: file extension determines chunk extraction (.Rmd vs .R)
  • Caching: cache is keyed by content, indexed by identity path

Note: parse_settings defaults to FALSE for text-based linting to avoid breaking changes. Pass TRUE explicitly for Mode 3 if config discovery is needed.

Changes

  • R/settings_utils.R: find_package() uses mustWork = FALSE to tolerate non-existent paths
  • R/exclude.R: exclude() normalizes source exclusions (from # nolint) separately from config exclusions, avoiding Sys.glob/file.exists on identity paths
  • R/lint.R: Adds has_filename flag, normalize_identity_path() helper to expand relative paths for non-existent files, passes lint_obj to cache_file() (matching retrieve_file())
  • R/expect_lint.R: When both file and content are provided, uses Mode 3 (lint(file, text = content))

Test plan

  • All existing tests pass (test-lint, test-exclusions, test-cache, test-settings, test-expect_lint)
  • New tests for Mode 3: identity path in output, non-existent files, nolint comments, config discovery, caching, knitr/Rmd detection
  • New tests for find_package() with non-existent paths
  • New tests for exclude() with non-existent file paths
  • New tests for expect_lint() with file + content

@atusy atusy marked this pull request as draft February 26, 2026 12:31
@MichaelChirico

Copy link
Copy Markdown
Collaborator

Maybe this is a good reason to finally add a dedicated lint_text() (and start deprecating lint(text=...)), WDYT?

@atusy

atusy commented Feb 26, 2026

Copy link
Copy Markdown
Author

That is a good idea! Do you want me to include it in this PR?

@atusy atusy force-pushed the feature/lint-text-with-identity branch from 6d1bb83 to ab8a81f Compare February 28, 2026 23:22
@codecov

codecov Bot commented Feb 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.29%. Comparing base (c3d1dc5) to head (dfe37ad).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3017      +/-   ##
==========================================
+ Coverage   99.25%   99.29%   +0.04%     
==========================================
  Files         128      128              
  Lines        7355     7367      +12     
==========================================
+ Hits         7300     7315      +15     
+ Misses         55       52       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@atusy atusy marked this pull request as ready for review March 1, 2026 11:32
@atusy atusy marked this pull request as draft March 1, 2026 11:34
@atusy atusy marked this pull request as ready for review March 1, 2026 11:38
@atusy

atusy commented Mar 1, 2026

Copy link
Copy Markdown
Author

@MichaelChirico
For now, I just modified lint().
We can add lint_text() in a different PR.

@atusy atusy force-pushed the feature/lint-text-with-identity branch from 388c8c2 to f6dbf7c Compare March 2, 2026 23:56
atusy added 5 commits March 6, 2026 00:46
Change mustWork = !allow_rproj to mustWork = FALSE so that
find_package() returns NULL instead of erroring when given a
non-existent file path. This is needed for the upcoming
lint(filename, text=) mode where filename serves as identity
but may not exist on disk.

All callers already handle NULL return gracefully.
Split normalization of source exclusions (from # nolint comments)
and config exclusions (from .lintr file) so that source exclusions
use normalize_path=FALSE. This avoids Sys.glob/file.exists calls
that would fail for non-existent identity paths in the upcoming
lint(filename, text=) mode.
lint(filename, text=buf) now takes content from text while using
filename as file identity for config discovery, knitr detection,
exclusions, and output. The file need not exist on disk.

Key changes:
- New normalize_identity_path() helper expands relative paths for
  non-existent identity files
- cache_file() now receives lint_obj instead of filename, matching
  retrieve_file() which already uses lint_obj
- Documentation updated for the new usage mode
When both file and content are provided as character, expect_lint()
now uses Mode 3 (lint(file, text=content)) instead of writing
content to a tempfile. This lets tests specify an identity path
for config discovery and knitr detection while providing content
inline.

Extract check_lints() helper from expect_lint() to keep both
functions within the cyclocomp limit.

The is.character(content) guard prevents accidental Mode 3
activation when a linter list lands in the content parameter
via R's positional argument matching.
@MichaelChirico

Copy link
Copy Markdown
Collaborator

@atusy for book-keeping, was an LLM used in whole or in part here?

)
})

# Mode 3: lint(filename, text=) — content from text, identity from filename

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-ASCII , leftover from CJK mode?

@MichaelChirico

Copy link
Copy Markdown
Collaborator

I deployed Gemini to run a joint review of this and #3034 which I understand to be attempting to solve the same issue working with {lintr} in vscode.

@atusy what do you make of this caution about filename=""? I think it's not particularly relevant and can be ignored for now.


Executive Summary & Recommendation

We recommend accepting the approach of PR 3017 with improvements, and rejecting PR 3034.

  • PR 3034 is incomplete. It only prevents the crash in find_package(), but fails later in exclude() when lints are found, leading to a different crash. The regression test added in PR 3034 actually fails.
  • PR 3017 is a complete and robust solution that introduces a clean separation between file content and file identity. It allows full feature support (config discovery, exclusions, knitr detection) for unsaved buffers.
  • Proposed Improvement to PR 3017: PR 3017 resolves empty filename "" to the directory getwd(), which causes lints to be attributed to the directory itself in the output. We propose mapping "" to "untitled" internally in lint(). This resolves to a file path (e.g., /path/to/project/untitled) which is cleaner and safer for IDE mapping.

Detailed Analysis of PR 3034

Approach

PR 3034 adds a guard at the beginning of find_package() to return NULL if the path is empty:

find_package <- function(path, allow_rproj = FALSE, max_depth = 2L) {
+  if (!nzchar(path)) {
+    return(NULL)
+  }
   path <- normalize_path(path, mustWork = !allow_rproj)

Assessment

  • Correctness (Low): It is incomplete. If the linting process actually finds any lints, lint() calls exclude() at the end. In exclude(), the empty filename "" is passed to normalize_exclusions(), which treats it as an unnamed exclusion and fails with:
    Error in `normalize_exclusions()`:
    ℹ Full file exclusions must be <character> vectors of length 1.
    ✖ Items at the following indices are not: 1.
    
  • Robustness (Low): The regression test added in PR 3034:
    test_that("object_length_linter does not crash with empty filename", {
      expect_length(
        lint("", text = "x = 1", linters = assignment_linter(), parse_settings = FALSE),
        1L
      )
    })
    fails on this branch due to the normalize_exclusions error.
  • Maintenance Burden (Low): Very small change, but doesn't solve the problem.

Detailed Analysis of PR 3017

Approach

PR 3017 refactors lint() to support "Mode 3": lint(filename, text = content) where filename is the identity of the file (which may not exist on disk) and text is the content to lint.
Key changes:

  1. find_package() tolerance: Changed to use mustWork = FALSE in normalize_path so it returns NULL instead of erroring for non-existent paths.
  2. normalize_identity_path(): Introduced in lint() to resolve relative paths of non-existent files by prepending getwd() before normalizing.
  3. Exclusions Refactoring: Splits normalization of source exclusions (from # nolint comments in the text) and config exclusions (from .lintr). Source exclusions are normalized with normalize_path = FALSE to prevent them from being discarded because the identity file doesn't exist on disk.
  4. Caching Fix: Fixes a bug where inline data caching used inconsistent keys between saving and retrieving.

Assessment

  • Correctness (High): Fully addresses the crash and ensures all features (caching, exclusions, config discovery) work correctly for non-existent files.
  • Suitability (High): This is the correct design for IDE integration, as it allows linting unsaved buffers while respecting the package context they belong to.
  • Robustness (High): Handles relative/absolute paths and non-existent files correctly. All tests pass, including comprehensive new tests for the new mode.
  • Maintenance Burden (Medium): The changes in exclude.R and lint.R increase complexity, but it is necessary for correctness.
  • Efficiency (Good): Avoids creating temporary files when text is provided with a filename.

Edge Case: Empty Filename ""

In PR 3017, if filename is "", it is resolved to getwd() (absolute path of current directory).

  • Result: Lints are attributed to the directory itself in the output:
    /path/to/project:1:3: style: [assignment_linter] Use <- for assignment, not =.
    x = 1
      ^
    
  • Impact: This is confusing for command-line users and might break IDE mapping (IDE might not be able to map diagnostics to the unsaved buffer if the filename is a directory).

Proposed Path Forward

We should adopt PR 3017 but apply the following modifications:

  1. Map "" to "untitled" in lint():
    At the beginning of lint(), map empty filename to a dummy filename like "untitled":

    if (!missing(filename) && !nzchar(filename)) {
      filename <- "untitled"
    }

    This ensures that:

    • The filename is resolved to /path/to/project/untitled (a file, not a directory).
    • The output shows untitled (or absolute path to it), which is cleaner.
    • All other PR 3017 logic (config discovery starting from getwd(), exclusions, etc.) still works.
    • We tested this modification, and it successfully resolves the directory output issue while passing all tests.
  2. Keep the find_package guard (Optional but recommended):
    Add the !nzchar guard from PR 3034 to find_package as defensive programming, in case it is called with "" from other entry points.

Code Diff for Proposed Improvement (relative to PR 3017)

diff --git a/R/lint.R b/R/lint.R
index b4ef2251..df719036 100644
--- a/R/lint.R
+++ b/R/lint.R
@@ -47,6 +47,9 @@ lint <- function(filename, linters = NULL, ..., cache = FALSE,
   dot_names <- if (getRversion() %in% c("4.1.1", "4.1.2")) names(list(...)) else ...names()
   check_dots(dot_names, c("exclude", "parse_exclusions"))
 
+  if (!missing(filename) && !nzchar(filename)) {
+    filename <- "untitled"
+  }
   needs_tempfile <- missing(filename) || re_matches(filename, rex(newline))
   inline_data <- !is.null(text) || needs_tempfile
 
diff --git b/R/settings_utils.R b/R/settings_utils.R
index d9327e79..262071f1 100644
--- a/R/settings_utils.R
+++ b/R/settings_utils.R
@@ -8,6 +8,9 @@ has_rproj <- function(path) {
 }
 
 find_package <- function(path, allow_rproj = FALSE, max_depth = 2L) {
+  if (!nzchar(path)) {
+    return(NULL)
+  }
   path <- normalize_path(path, mustWork = FALSE)
   if (allow_rproj) {
     found <- function(path) has_description(path) || has_rproj(path)

Comment thread man/lint.Rd
The latter (inline data) applies whenever \code{filename} has a newline character (\\n).}
The latter (inline data) applies whenever \code{filename} has a newline character (\\n).
When combined with \code{text}, \code{filename} serves as the file identity for config discovery, exclusions,
knitr detection, and output — the file need not exist on disk.}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-ASCII here too

Comment thread R/expect_lint.R
if (is.null(file)) on.exit(unlink(file), add = TRUE)
file <- maybe_write_content(file, content) # NB: the lint consistency fuzz suite anchors here.
if (!is.null(file) && !missing(content) && is.character(content)) {
# Both file and content → Mode 3 (identity-aware linting)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-ASCII

Comment thread R/exclude.R

exclusions <- normalize_exclusions(c(source_exclusions, exclusions))
# Source exclusions use already-normalized paths from source_expression$filename.
# normalize_path=FALSE avoids Sys.glob/file.exists which fail for non-existent files (Mode 3).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat odd to refer to Mode 3 in this file, it's a pretty "internal" terminology. I think we can just omit the parenthetical.

Comment thread R/exclude.R
source_exclusions <- normalize_exclusions(source_exclusions, normalize_path = FALSE)
config_exclusions <- normalize_exclusions(exclusions)
exclusions <- c(source_exclusions, config_exclusions) |>
remove_file_duplicates() |>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following these remove_*_duplicates?

Comment thread R/expect_lint.R

if (is.null(file)) on.exit(unlink(file), add = TRUE)
file <- maybe_write_content(file, content) # NB: the lint consistency fuzz suite anchors here.
if (!is.null(file) && !missing(content) && is.character(content)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to support this in expect_lint()? Wouldn't it be better to keep expect_lint() simpler?

Comment thread R/lint.R
class(lints) <- c("lints", "list")

cache_file(lint_cache, filename, linters, lints)
cache_file(lint_cache, lint_obj, linters, lints)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is puzzling too -- the helper is cache_file() and the argument is filename, now we're passing lint_obj. I think it's fine since this is all just digest() adornment, but maybe we should rename some internals for clarity?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants