Separate content from file identity in lint() by atusy · Pull Request #3017 · r-lib/lintr

atusy · 2026-02-26T12:09:14Z

Motivation

When IDEs like those using REditorSupport/languageserver lint unsaved buffers, they need to pass buffer content but resolve config, exclusions, knitr detection, and output against the real file path. Currently lint() conflates these two concerns: filename is both "where to read" and "who am I." This means users must save before getting up-to-date lints.

This PR gives meaning to lint(filename, text=) used together, enabling IDEs to lint unsaved buffer content while preserving full file identity.

Three modes

Mode	Call	Behavior
File (unchanged)	`lint("R/foo.R")`	Read + identity from file
Anonymous text (unchanged)	`lint(text = "x=1")`	Tempfile, output shows `<text>`
Text with identity (NEW)	`lint("R/foo.R", text = buf)`	Content from `text`, identity from `filename`

In Mode 3, filename does NOT need to exist on disk. Output shows the real path.

Example

The key difference: lint output is attributed to the real file path, enabling IDEs to map diagnostics back to the correct buffer.

# Mode 3: lints are attributed to the identity path
lint("R/foo.R", text = "x = 1\n", linters = assignment_linter())
#> R/foo.R:1:3: style: [assignment_linter] Use <-, not =, for assignment.

# Anonymous text: output shows <text>
lint(text = "x = 1\n", linters = assignment_linter())
#> <text>:1:3: style: [assignment_linter] Use <-, not =, for assignment.

Beyond the output path, the identity filename also drives:

Config discovery (parse_settings = TRUE): .lintr is found relative to the identity path
Exclusions: # nolint comments and config-based file exclusions work against the identity path
Knitr detection: file extension determines chunk extraction (.Rmd vs .R)
Caching: cache is keyed by content, indexed by identity path

Note: parse_settings defaults to FALSE for text-based linting to avoid breaking changes. Pass TRUE explicitly for Mode 3 if config discovery is needed.

Changes

R/settings_utils.R: find_package() uses mustWork = FALSE to tolerate non-existent paths
R/exclude.R: exclude() normalizes source exclusions (from # nolint) separately from config exclusions, avoiding Sys.glob/file.exists on identity paths
R/lint.R: Adds has_filename flag, normalize_identity_path() helper to expand relative paths for non-existent files, passes lint_obj to cache_file() (matching retrieve_file())
R/expect_lint.R: When both file and content are provided, uses Mode 3 (lint(file, text = content))

Test plan

All existing tests pass (test-lint, test-exclusions, test-cache, test-settings, test-expect_lint)
New tests for Mode 3: identity path in output, non-existent files, nolint comments, config discovery, caching, knitr/Rmd detection
New tests for find_package() with non-existent paths
New tests for exclude() with non-existent file paths
New tests for expect_lint() with file + content

MichaelChirico · 2026-02-26T16:02:02Z

Maybe this is a good reason to finally add a dedicated lint_text() (and start deprecating lint(text=...)), WDYT?

atusy · 2026-02-26T22:18:33Z

That is a good idea! Do you want me to include it in this PR?

codecov · 2026-02-28T23:26:59Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.29%. Comparing base (c3d1dc5) to head (dfe37ad).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3017      +/-   ##
==========================================
+ Coverage   99.25%   99.29%   +0.04%     
==========================================
  Files         128      128              
  Lines        7355     7367      +12     
==========================================
+ Hits         7300     7315      +15     
+ Misses         55       52       -3

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

atusy · 2026-03-01T11:39:32Z

@MichaelChirico
For now, I just modified lint().
We can add lint_text() in a different PR.

Change mustWork = !allow_rproj to mustWork = FALSE so that find_package() returns NULL instead of erroring when given a non-existent file path. This is needed for the upcoming lint(filename, text=) mode where filename serves as identity but may not exist on disk. All callers already handle NULL return gracefully.

Split normalization of source exclusions (from # nolint comments) and config exclusions (from .lintr file) so that source exclusions use normalize_path=FALSE. This avoids Sys.glob/file.exists calls that would fail for non-existent identity paths in the upcoming lint(filename, text=) mode.

lint(filename, text=buf) now takes content from text while using filename as file identity for config discovery, knitr detection, exclusions, and output. The file need not exist on disk. Key changes: - New normalize_identity_path() helper expands relative paths for non-existent identity files - cache_file() now receives lint_obj instead of filename, matching retrieve_file() which already uses lint_obj - Documentation updated for the new usage mode

When both file and content are provided as character, expect_lint() now uses Mode 3 (lint(file, text=content)) instead of writing content to a tempfile. This lets tests specify an identity path for config discovery and knitr detection while providing content inline. Extract check_lints() helper from expect_lint() to keep both functions within the cyclocomp limit. The is.character(content) guard prevents accidental Mode 3 activation when a linter list lands in the content parameter via R's positional argument matching.

MichaelChirico · 2026-06-04T17:54:06Z

@atusy for book-keeping, was an LLM used in whole or in part here?

MichaelChirico · 2026-06-04T17:54:43Z

  )
 })
+
+# Mode 3: lint(filename, text=) — content from text, identity from filename


non-ASCII —, leftover from CJK mode?

MichaelChirico · 2026-06-04T18:41:14Z

I deployed Gemini to run a joint review of this and #3034 which I understand to be attempting to solve the same issue working with {lintr} in vscode.

@atusy what do you make of this caution about filename=""? I think it's not particularly relevant and can be ignored for now.

Executive Summary & Recommendation

We recommend accepting the approach of PR 3017 with improvements, and rejecting PR 3034.

PR 3034 is incomplete. It only prevents the crash in find_package(), but fails later in exclude() when lints are found, leading to a different crash. The regression test added in PR 3034 actually fails.
PR 3017 is a complete and robust solution that introduces a clean separation between file content and file identity. It allows full feature support (config discovery, exclusions, knitr detection) for unsaved buffers.
Proposed Improvement to PR 3017: PR 3017 resolves empty filename "" to the directory getwd(), which causes lints to be attributed to the directory itself in the output. We propose mapping "" to "untitled" internally in lint(). This resolves to a file path (e.g., /path/to/project/untitled) which is cleaner and safer for IDE mapping.

Detailed Analysis of PR 3034

Approach

PR 3034 adds a guard at the beginning of find_package() to return NULL if the path is empty:

find_package <- function(path, allow_rproj = FALSE, max_depth = 2L) {
+  if (!nzchar(path)) {
+    return(NULL)
+  }
   path <- normalize_path(path, mustWork = !allow_rproj)

Assessment

Correctness (Low): It is incomplete. If the linting process actually finds any lints, lint() calls exclude() at the end. In exclude(), the empty filename "" is passed to normalize_exclusions(), which treats it as an unnamed exclusion and fails with:
```
Error in `normalize_exclusions()`:
ℹ Full file exclusions must be <character> vectors of length 1.
✖ Items at the following indices are not: 1.
```

Robustness (Low): The regression test added in PR 3034:

test_that("object_length_linter does not crash with empty filename", {
  expect_length(
    lint("", text = "x = 1", linters = assignment_linter(), parse_settings = FALSE),
    1L
  )
})

fails on this branch due to the normalize_exclusions error.

Maintenance Burden (Low): Very small change, but doesn't solve the problem.

Detailed Analysis of PR 3017

Approach

PR 3017 refactors lint() to support "Mode 3": lint(filename, text = content) where filename is the identity of the file (which may not exist on disk) and text is the content to lint.
Key changes:

find_package() tolerance: Changed to use mustWork = FALSE in normalize_path so it returns NULL instead of erroring for non-existent paths.
normalize_identity_path(): Introduced in lint() to resolve relative paths of non-existent files by prepending getwd() before normalizing.
Exclusions Refactoring: Splits normalization of source exclusions (from # nolint comments in the text) and config exclusions (from .lintr). Source exclusions are normalized with normalize_path = FALSE to prevent them from being discarded because the identity file doesn't exist on disk.
Caching Fix: Fixes a bug where inline data caching used inconsistent keys between saving and retrieving.

Assessment

Correctness (High): Fully addresses the crash and ensures all features (caching, exclusions, config discovery) work correctly for non-existent files.
Suitability (High): This is the correct design for IDE integration, as it allows linting unsaved buffers while respecting the package context they belong to.
Robustness (High): Handles relative/absolute paths and non-existent files correctly. All tests pass, including comprehensive new tests for the new mode.
Maintenance Burden (Medium): The changes in exclude.R and lint.R increase complexity, but it is necessary for correctness.
Efficiency (Good): Avoids creating temporary files when text is provided with a filename.

Edge Case: Empty Filename `""`

In PR 3017, if filename is "", it is resolved to getwd() (absolute path of current directory).

Result: Lints are attributed to the directory itself in the output:

/path/to/project:1:3: style: [assignment_linter] Use <- for assignment, not =.
x = 1
  ^

Impact: This is confusing for command-line users and might break IDE mapping (IDE might not be able to map diagnostics to the unsaved buffer if the filename is a directory).

Proposed Path Forward

We should adopt PR 3017 but apply the following modifications:

Map "" to "untitled" in lint():
At the beginning of lint(), map empty filename to a dummy filename like "untitled":
```
if (!missing(filename) && !nzchar(filename)) {
  filename <- "untitled"
}
```
This ensures that:
- The filename is resolved to /path/to/project/untitled (a file, not a directory).
- The output shows untitled (or absolute path to it), which is cleaner.
- All other PR 3017 logic (config discovery starting from getwd(), exclusions, etc.) still works.
- We tested this modification, and it successfully resolves the directory output issue while passing all tests.
Keep the find_package guard (Optional but recommended):
Add the !nzchar guard from PR 3034 to find_package as defensive programming, in case it is called with "" from other entry points.

Code Diff for Proposed Improvement (relative to PR 3017)

diff --git a/R/lint.R b/R/lint.R
index b4ef2251..df719036 100644
--- a/R/lint.R
+++ b/R/lint.R
@@ -47,6 +47,9 @@ lint <- function(filename, linters = NULL, ..., cache = FALSE,
   dot_names <- if (getRversion() %in% c("4.1.1", "4.1.2")) names(list(...)) else ...names()
   check_dots(dot_names, c("exclude", "parse_exclusions"))
 
+  if (!missing(filename) && !nzchar(filename)) {
+    filename <- "untitled"
+  }
   needs_tempfile <- missing(filename) || re_matches(filename, rex(newline))
   inline_data <- !is.null(text) || needs_tempfile
 
diff --git b/R/settings_utils.R b/R/settings_utils.R
index d9327e79..262071f1 100644
--- a/R/settings_utils.R
+++ b/R/settings_utils.R
@@ -8,6 +8,9 @@ has_rproj <- function(path) {
 }
 
 find_package <- function(path, allow_rproj = FALSE, max_depth = 2L) {
+  if (!nzchar(path)) {
+    return(NULL)
+  }
   path <- normalize_path(path, mustWork = FALSE)
   if (allow_rproj) {
     found <- function(path) has_description(path) || has_rproj(path)

MichaelChirico · 2026-06-04T18:42:29Z

-The latter (inline data) applies whenever \code{filename} has a newline character (\\n).}
+The latter (inline data) applies whenever \code{filename} has a newline character (\\n).
+When combined with \code{text}, \code{filename} serves as the file identity for config discovery, exclusions,
+knitr detection, and output — the file need not exist on disk.}


non-ASCII — here too

MichaelChirico · 2026-06-04T18:45:17Z

-  if (is.null(file)) on.exit(unlink(file), add = TRUE)
-  file <- maybe_write_content(file, content) # NB: the lint consistency fuzz suite anchors here.
+  if (!is.null(file) && !missing(content) && is.character(content)) {
+    # Both file and content → Mode 3 (identity-aware linting)


non-ASCII →

MichaelChirico · 2026-06-04T18:45:44Z

-
-  exclusions <- normalize_exclusions(c(source_exclusions, exclusions))
+  # Source exclusions use already-normalized paths from source_expression$filename.
+  # normalize_path=FALSE avoids Sys.glob/file.exists which fail for non-existent files (Mode 3).


somewhat odd to refer to Mode 3 in this file, it's a pretty "internal" terminology. I think we can just omit the parenthetical.

MichaelChirico · 2026-06-04T18:46:48Z

+  source_exclusions <- normalize_exclusions(source_exclusions, normalize_path = FALSE)
+  config_exclusions <- normalize_exclusions(exclusions)
+  exclusions <- c(source_exclusions, config_exclusions) |>
+    remove_file_duplicates() |>


I'm not following these remove_*_duplicates?

MichaelChirico · 2026-06-04T18:47:43Z


-  if (is.null(file)) on.exit(unlink(file), add = TRUE)
-  file <- maybe_write_content(file, content) # NB: the lint consistency fuzz suite anchors here.
+  if (!is.null(file) && !missing(content) && is.character(content)) {


do we need to support this in expect_lint()? Wouldn't it be better to keep expect_lint() simpler?

MichaelChirico · 2026-06-04T18:51:39Z

  class(lints) <- c("lints", "list")

-  cache_file(lint_cache, filename, linters, lints)
+  cache_file(lint_cache, lint_obj, linters, lints)


this change is puzzling too -- the helper is cache_file() and the argument is filename, now we're passing lint_obj. I think it's fine since this is all just digest() adornment, but maybe we should rename some internals for clarity?

atusy marked this pull request as draft February 26, 2026 12:31

atusy force-pushed the feature/lint-text-with-identity branch from 6d1bb83 to ab8a81f Compare February 28, 2026 23:22

atusy marked this pull request as ready for review March 1, 2026 11:32

atusy marked this pull request as draft March 1, 2026 11:34

atusy marked this pull request as ready for review March 1, 2026 11:38

atusy force-pushed the feature/lint-text-with-identity branch from 388c8c2 to f6dbf7c Compare March 2, 2026 23:56

atusy added 5 commits March 6, 2026 00:46

add NEWS.md entries for Mode 3 lint() and expect_lint() changes

a44ca02

atusy force-pushed the feature/lint-text-with-identity branch from f6dbf7c to a44ca02 Compare March 5, 2026 15:46

MichaelChirico mentioned this pull request Mar 30, 2026

fix: guard find_package() against empty filename path #3034

Closed

Merge branch 'main' into feature/lint-text-with-identity

dfe37ad

MichaelChirico reviewed Jun 4, 2026

View reviewed changes

Conversation

atusy commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Three modes

Example

Changes

Test plan

Uh oh!

MichaelChirico commented Feb 26, 2026

Uh oh!

atusy commented Feb 26, 2026

Uh oh!

codecov Bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

atusy commented Mar 1, 2026

Uh oh!

MichaelChirico commented Jun 4, 2026

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico commented Jun 4, 2026

Executive Summary & Recommendation

Detailed Analysis of PR 3034

Approach

Assessment

Detailed Analysis of PR 3017

Approach

Assessment

Edge Case: Empty Filename ""

Proposed Path Forward

Code Diff for Proposed Improvement (relative to PR 3017)

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

MichaelChirico Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

atusy commented Feb 26, 2026 •

edited

Loading

codecov Bot commented Feb 28, 2026 •

edited

Loading

Edge Case: Empty Filename `""`