@daohoangson
Owner

Summary

  • Add support for parsing Claude Code conversation logs (.jsonl files) with the new claude-code provider
  • Enable url2md and url2json commands to accept local file paths in addition to URLs
  • Merge consecutive messages from the same sender for cleaner markdown output
  • Fix stdout piping issues when using commands with tee or other pipes

Changes

  • New provider: src/providers/claude-code/ - parses JSONL format with Valibot schemas
  • Local path support: Detect .jsonl files and route to claude-code provider
  • Tool rendering: Custom markdown formatting for Read, Write, Edit, Bash, Glob, Grep, TodoWrite, Task, WebFetch, WebSearch
  • Content cleaning: Strip <system-reminder> and other system tags from output

Test plan

  • Build passes (npm run build)
  • Parse example JSONL file successfully
  • Output pipes correctly through tee without errors

Dao Hoang Son and others added 4 commits November 26, 2025 13:54
- Add claude-code provider to parse Claude Code JSONL conversation logs
- Support local file paths in url2json and url2md commands
- Type-safe Valibot schemas for JSONL format (user, assistant, system, queue-operation)
- Markdown renderer with tool-specific formatting (Read, Write, Edit, Bash, Glob, Grep, TodoWrite, Task, etc.)
- Strip system instructions and reminders from output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Track last sender to avoid repeating section headers when the same
participant sends multiple messages in a row.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove unused fields from schemas - only keep fields actually used in
rendering. Use looseObject to allow extra fields without validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Use process.stdout.write() instead of writeFileSync(process.stdout.fd)
to avoid "resource temporarily unavailable" errors when piping large
outputs through commands like tee.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
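
The piping fix described in the last commit can be illustrated with a small sketch. This is not the PR's code: `WritableLike` and `writeAll` are hypothetical names, and the interface only mirrors the callback form of `process.stdout.write()` so that the example runs without Node typings. The idea is to hand the text to the stream and wait for its callback instead of writing synchronously to the file descriptor, which is where the "resource temporarily unavailable" error was reported.

```typescript
// Hypothetical minimal view of a writable stream (process.stdout fits this shape).
interface WritableLike {
  write(chunk: string, callback?: (error?: Error | null) => void): boolean;
}

// Write the whole text and resolve once the stream reports that it was flushed.
export function writeAll(out: WritableLike, text: string): Promise<void> {
  return new Promise((resolve, reject) => {
    out.write(text, (error) => (error ? reject(error) : resolve()));
  });
}

// Assumed usage inside a command handler running on Node.js:
//   await writeAll(process.stdout, markdown);
```

Waiting for the callback lets the stream deal with a slow reader such as `tee`, rather than failing when the pipe buffer is full.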

Copilot AI left a comment

Pull request overview

This PR adds support for parsing Claude Code conversation logs stored in local .jsonl files, extending the existing url2md and url2json commands to accept both URLs and local file paths.

Key changes:

  • Introduced a new claude-code provider that parses JSONL-formatted conversation logs with Valibot schema validation
  • Extended url2md and url2json commands to detect and handle local file paths in addition to URLs
  • Implemented custom markdown rendering for Claude Code tool use (Read, Write, Edit, Bash, Glob, Grep, TodoWrite, Task, WebFetch, WebSearch)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| src/providers/claude-code/models.ts | Defines Valibot schemas for validating Claude Code JSONL line types (user, assistant, system, queue-operation) and their content structures |
| src/providers/claude-code/markdown.ts | Implements markdown rendering logic for Claude Code conversations with tool use formatting and system tag filtering |
| src/providers/claude-code/index.ts | Provides functions to parse JSONL files and render them to markdown |
| src/providers/index.ts | Routes local file paths to the claude-code provider and adds claude-code to the provider picklist |
| src/common/url.ts | Adds Provider type, local path detection logic, and provider detection by file extension |
| src/commands/url2md.ts | Updates to accept local file paths and fixes stdout handling for proper piping |
| src/commands/url2json.ts | Updates to accept local file paths with appropriate routing logic |

Comment on lines +259 to +260
yml: "yaml",
md: "markdown",

Copilot AI Nov 26, 2025

[nitpick] Incomplete file extension mapping. The extMap only includes a small subset of common extensions. Files with other extensions like .go, .rs, .java, .cpp, .c, .sh, .json, .xml, .html, .css, etc. will fall back to their raw extension, which may not be recognized by markdown renderers.

Consider either:

  1. Expanding the mapping to include more common extensions
  2. Documenting that only these specific extensions are mapped
  3. Using a library like linguist-languages for comprehensive language detection
Suggested change
- yml: "yaml",
- md: "markdown",
+ yml: "yaml",
+ yaml: "yaml",
+ md: "markdown",
+ go: "go",
+ rs: "rust",
+ java: "java",
+ cpp: "cpp",
+ cc: "cpp",
+ cxx: "cpp",
+ c: "c",
+ h: "c",
+ cs: "csharp",
+ sh: "bash",
+ bash: "bash",
+ json: "json",
+ xml: "xml",
+ html: "html",
+ htm: "html",
+ css: "css",
+ scss: "scss",
+ less: "less",
+ php: "php",
+ swift: "swift",
+ kt: "kotlin",
+ kts: "kotlin",
+ scala: "scala",
+ hs: "haskell",
+ pl: "perl",
+ lua: "lua",
+ dart: "dart",
+ r: "r",
+ m: "objective-c",
+ sql: "sql",
+ coffee: "coffeescript",
+ // add more as needed
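
For reference, a minimal sketch of the lookup-with-fallback behaviour discussed in this comment. The map contents are illustrative rather than the PR's actual table, and `getCodeBlockLanguage` is a hypothetical name for the helper. A `Map` is used so that extensions such as `constructor` cannot hit inherited object properties.

```typescript
// Illustrative subset only; the real table in the PR may differ.
const EXT_TO_LANG = new Map<string, string>([
  ["ts", "typescript"],
  ["js", "javascript"],
  ["py", "python"],
  ["yml", "yaml"],
  ["yaml", "yaml"],
  ["md", "markdown"],
  ["rs", "rust"],
  ["sh", "bash"],
]);

// Return the language tag to put after the opening code fence.
export function getCodeBlockLanguage(filePath: string): string {
  const base = filePath.split(/[\\/]/).pop() ?? "";
  const dot = base.lastIndexOf(".");
  // No extension, or a dotfile such as ".gitignore": leave the fence untagged.
  if (dot <= 0) return "";
  const ext = base.slice(dot + 1).toLowerCase();
  // Unknown extensions fall back to the raw extension.
  return EXT_TO_LANG.get(ext) ?? ext;
}
```

With this shape, adding a language is a one-line change to the table and unmapped extensions keep the fallback described above.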

Copilot uses AI. Check for mistakes.
Comment on lines +148 to +150
const oldStr = typedInput.old_string.replace(/\n/g, "\n-");
const newStr = typedInput.new_string.replace(/\n/g, "\n+");
parts.push(`\`\`\`diff\n-${oldStr.trimEnd()}\n+${newStr.trimEnd()}\n\`\`\``);

Copilot AI Nov 26, 2025

The diff formatting for the Edit tool will produce incorrect output when the old_string or new_string contains multiple lines. The current logic prepends - or + to newlines within the string, but the first line doesn't get the prefix. For example, a 3-line old_string would result in:

-line1
-line2
-line3

But it should be:

-line1
-line2
-line3

Consider using:

const oldStr = typedInput.old_string.split('\n').map(line => `-${line}`).join('\n');
const newStr = typedInput.new_string.split('\n').map(line => `+${line}`).join('\n');
parts.push(`\`\`\`diff\n${oldStr.trimEnd()}\n${newStr.trimEnd()}\n\`\`\``);
Suggested change
- const oldStr = typedInput.old_string.replace(/\n/g, "\n-");
- const newStr = typedInput.new_string.replace(/\n/g, "\n+");
- parts.push(`\`\`\`diff\n-${oldStr.trimEnd()}\n+${newStr.trimEnd()}\n\`\`\``);
+ const oldStr = typedInput.old_string
+   .split('\n')
+   .map(line => `-${line}`)
+   .join('\n');
+ const newStr = typedInput.new_string
+   .split('\n')
+   .map(line => `+${line}`)
+   .join('\n');
+ parts.push(`\`\`\`diff\n${oldStr.trimEnd()}\n${newStr.trimEnd()}\n\`\`\``);
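
To compare the two forms side by side, here is a small standalone sketch. `diffWithReplace` and `diffWithSplit` are hypothetical wrappers around the quoted snippets with the code fence left out, and `renderEditDiff` is an additional assumed variant, not part of the PR. In this sketch both quoted forms yield the same text for multi-line input, because the replace-based template already adds the marker for the first line. The case that behaves differently from what one might expect is a trailing newline, which leaves a lone marker line in both.

```typescript
// Replace-based form, as quoted from the PR (without the surrounding fence).
function diffWithReplace(oldText: string, newText: string): string {
  const oldStr = oldText.replace(/\n/g, "\n-");
  const newStr = newText.replace(/\n/g, "\n+");
  return `-${oldStr.trimEnd()}\n+${newStr.trimEnd()}`;
}

// split/map/join form, as in the suggestion.
function diffWithSplit(oldText: string, newText: string): string {
  const oldStr = oldText.split("\n").map((line) => `-${line}`).join("\n");
  const newStr = newText.split("\n").map((line) => `+${line}`).join("\n");
  return `${oldStr.trimEnd()}\n${newStr.trimEnd()}`;
}

// Assumed variant: drop trailing newlines before prefixing, so no lone marker remains.
function prefixLines(text: string, marker: "+" | "-"): string {
  return text
    .replace(/\n+$/, "")
    .split("\n")
    .map((line) => marker + line)
    .join("\n");
}

export function renderEditDiff(oldText: string, newText: string): string {
  return `${prefixLines(oldText, "-")}\n${prefixLines(newText, "+")}`;
}
```

Whichever form is kept, a unit test with a trailing-newline input is probably the cheapest way to pin down the intended output.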

Comment on lines 121 to 240
function renderToolUseContent(
  ctx: RenderContext,
  parts: string[],
  content: ToolUseContent,
): void {
  const { name, input, id } = content;

  switch (name) {
    case "Read": {
      const typedInput = input as { file_path: string };
      parts.push(`## Read \`${typedInput.file_path}\``);
      break;
    }
    case "Write": {
      const typedInput = input as { file_path: string; content: string };
      parts.push(`## Write \`${typedInput.file_path}\``);
      const ext = getFileExtension(typedInput.file_path);
      parts.push(`\`\`\`${ext}\n${typedInput.content.trim()}\n\`\`\``);
      break;
    }
    case "Edit": {
      const typedInput = input as {
        file_path: string;
        old_string: string;
        new_string: string;
      };
      parts.push(`## Edit \`${typedInput.file_path}\``);
      const oldStr = typedInput.old_string.replace(/\n/g, "\n-");
      const newStr = typedInput.new_string.replace(/\n/g, "\n+");
      parts.push(`\`\`\`diff\n-${oldStr.trimEnd()}\n+${newStr.trimEnd()}\n\`\`\``);
      break;
    }
    case "Bash": {
      const typedInput = input as { command: string; description?: string };
      const desc = typedInput.description
        ? `: ${typedInput.description}`
        : "";
      parts.push(`## Bash${desc}`);
      parts.push(`\`\`\`bash\n${typedInput.command.trim()}\n\`\`\``);
      renderToolResultIfExists(ctx, parts, id);
      break;
    }
    case "Glob":
    case "Grep": {
      const typedInput = input as { pattern: string; path?: string };
      const pathStr = typedInput.path ? ` in \`${typedInput.path}\`` : "";
      parts.push(`## ${name}: \`${typedInput.pattern}\`${pathStr}`);
      renderToolResultIfExists(ctx, parts, id);
      break;
    }
    case "TodoWrite": {
      const typedInput = input as {
        todos: Array<{ content: string; status: string }>;
      };
      parts.push("## Todos");
      const todoLines = typedInput.todos.map((todo) => {
        const checkbox =
          todo.status === "completed"
            ? "[x]"
            : todo.status === "in_progress"
              ? "[~]"
              : "[ ]";
        return `- ${checkbox} ${todo.content}`;
      });
      parts.push(todoLines.join("\n"));
      break;
    }
    case "Task": {
      const typedInput = input as {
        description: string;
        prompt: string;
        subagent_type?: string;
      };
      parts.push(`## Task: ${typedInput.description}`);
      if (typedInput.subagent_type) {
        parts.push(`Agent: ${typedInput.subagent_type}`);
      }
      parts.push(`\`\`\`\n${typedInput.prompt.trim()}\n\`\`\``);
      break;
    }
    case "WebFetch":
    case "WebSearch": {
      const typedInput = input as { url?: string; query?: string };
      if (typedInput.url) {
        parts.push(`## ${name}: ${typedInput.url}`);
      } else if (typedInput.query) {
        parts.push(`## ${name}: ${typedInput.query}`);
      } else {
        parts.push(`## ${name}`);
      }
      break;
    }
    default: {
      const str =
        typeof input === "string" ? input : JSON.stringify(input, null, 2);
      parts.push(`## Tool use: ${name}`);
      parts.push(`\`\`\`\n${str.trim()}\n\`\`\``);
      break;
    }
  }
}

function renderToolResultIfExists(
  ctx: RenderContext,
  parts: string[],
  toolUseId: string,
): void {
  const result = ctx.toolResults.get(toolUseId);
  if (!result) return;

  const content = result.content;
  if (typeof content === "string") {
    const cleanedContent = cleanToolResultContent(content);
    if (cleanedContent.trim()) {
      parts.push("<details><summary>Output</summary>");
      parts.push(`\`\`\`\n${cleanedContent.trim()}\n\`\`\``);
      parts.push("</details>");
    }
  }
}

Copilot AI Nov 26, 2025

Potential XSS vulnerability: User-provided content from file paths, tool inputs, and tool results is directly interpolated into markdown without sanitization. Malicious JSONL files could inject arbitrary HTML/JavaScript through fields like file_path, description, url, query, or tool result content.

For example:

  • A file path like `<script>alert('xss')</script>` would be rendered directly in markdown
  • Tool result content could contain malicious HTML that gets rendered when the markdown is converted to HTML

Consider sanitizing user-provided content before embedding it in markdown, especially for fields that will be rendered as HTML or could contain special characters.
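
One possible shape for such sanitization, as a sketch only: `escapeHtml`, `inlineCode`, and `fencedBlock` are hypothetical helpers, not functions from the PR. The assumption is that headings get HTML-escaped text, while paths and tool output go into code spans or fences whose backtick run is longer than anything inside the content, so the content cannot close the span or fence early.

```typescript
// Escape the characters that HTML-rendering markdown processors treat as markup.
export function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Length of the longest run of backticks inside the text.
function longestBacktickRun(text: string): number {
  const runs = text.match(/`+/g) ?? [];
  return runs.reduce((max, run) => Math.max(max, run.length), 0);
}

// Inline code span delimited by one more backtick than the longest run inside.
export function inlineCode(text: string): string {
  const fence = "`".repeat(longestBacktickRun(text) + 1);
  // Padding is needed when the content itself starts or ends with a backtick.
  const pad = text.startsWith("`") || text.endsWith("`") ? " " : "";
  return `${fence}${pad}${text}${pad}${fence}`;
}

// Fenced block with at least three backticks, and more when the content needs it.
export function fencedBlock(content: string, lang = ""): string {
  const fence = "`".repeat(Math.max(3, longestBacktickRun(content) + 1));
  return `${fence}${lang}\n${content}\n${fence}`;
}
```

Whether escaping is needed at all depends on where the markdown ends up; if it is only ever read as plain text, the fence-length handling alone may be enough.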

import { parseSchemaOrThrow } from "@/common";
import { renderFromLines } from "./markdown";
import { jsonlLineSchema, type JsonlLine } from "./models";

Copilot AI Nov 26, 2025

Missing JSDoc documentation for the public API function. This function is exported and used by other modules, so it should have documentation explaining:

  • What the function does
  • What the path parameter represents (file path to a .jsonl file)
  • What it returns (array of parsed JSONL lines)
  • What errors it may throw (file not found, invalid JSON, schema validation errors)

Example:

/**
 * Parses a Claude Code JSONL file from the given file path.
 * @param filePath - Path to the .jsonl file containing Claude Code conversation logs
 * @returns Array of validated JSONL line objects
 * @throws {Error} If file cannot be read, JSON is malformed, or schema validation fails
 */
export function parseJsonlFromPath(filePath: string): JsonlLine[] {
Suggested change
+ /**
+  * Parses a Claude Code JSONL file from the given file path.
+  * @param filePath - Path to the .jsonl file containing Claude Code conversation logs
+  * @returns Array of validated JSONL line objects
+  * @throws {Error} If file cannot be read, JSON is malformed, or schema validation fails
+  */
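
To make the documented error cases concrete, here is a rough sketch of line-by-line JSONL parsing. It deliberately avoids Valibot and file I/O so it stays self-contained: `parseJsonlText` and `JsonlLineSketch` are hypothetical, and the real `parseJsonlFromPath` validates each line against `jsonlLineSchema` instead of the minimal check shown here.

```typescript
// Hypothetical minimal line shape; the PR validates with Valibot schemas instead.
interface JsonlLineSketch {
  type: string;
  [key: string]: unknown;
}

export function parseJsonlText(text: string): JsonlLineSketch[] {
  const lines: JsonlLineSketch[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    // Tolerate blank lines and a trailing newline.
    if (raw.trim() === "") return;
    let value: unknown;
    try {
      value = JSON.parse(raw);
    } catch {
      throw new Error(`Line ${index + 1}: malformed JSON`);
    }
    if (
      typeof value !== "object" ||
      value === null ||
      typeof (value as { type?: unknown }).type !== "string"
    ) {
      throw new Error(`Line ${index + 1}: expected an object with a string "type"`);
    }
    lines.push(value as JsonlLineSketch);
  });
  return lines;
}
```

Including the line number in the message is a design choice of this sketch; it makes a broken log much easier to locate than a bare parse error.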

Comment on lines 235 to 236
parts.push("<details><summary>Output</summary>");
parts.push(`\`\`\`\n${cleanedContent.trim()}\n\`\`\``);

Copilot AI Nov 26, 2025

Missing blank line within details block could cause rendering issues. HTML <details> blocks in markdown should have blank lines before and after the code block for proper markdown parsing. The current implementation might not render correctly in some markdown processors.

Consider:

parts.push("<details><summary>Output</summary>");
parts.push("");  // blank line
parts.push(`\`\`\`\n${cleanedContent.trim()}\n\`\`\``);
parts.push("");  // blank line
parts.push("</details>");
Suggested change
- parts.push("<details><summary>Output</summary>");
- parts.push(`\`\`\`\n${cleanedContent.trim()}\n\`\`\``);
+ parts.push("<details><summary>Output</summary>");
+ parts.push(""); // blank line
+ parts.push(`\`\`\`\n${cleanedContent.trim()}\n\`\`\``);
+ parts.push(""); // blank line
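
A standalone sketch of the resulting layout, assuming the pieces are joined with a single newline (the excerpt does not show how `parts` is joined, so the extra pushes may be redundant if the join already inserts blank lines). `renderDetails` is a hypothetical helper, not a function from the PR.

```typescript
// Build a collapsible block with blank lines separating the HTML tags from the fence.
export function renderDetails(summary: string, body: string): string {
  const fence = "`".repeat(3);
  return [
    `<details><summary>${summary}</summary>`,
    "", // blank line so the fence is parsed as markdown, not as raw HTML content
    fence,
    body.trim(),
    fence,
    "", // blank line before the closing tag
    "</details>",
  ].join("\n");
}
```

Returning one string keeps the blank-line requirement in a single place instead of relying on each call site to push the empty entries.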

toolResults: Map<string, ToolResultContent>;
lastSender: Sender;
}

Copilot AI Nov 26, 2025

Missing JSDoc documentation for the complex rendering function. Given the complexity of this function with multiple tool types and rendering logic, it should have documentation explaining:

  • The purpose of the function
  • The structure of the lines parameter
  • What markdown format is returned
  • Any special rendering rules or transformations applied

Example:

/**
 * Renders an array of Claude Code JSONL lines into markdown format.
 * Merges consecutive messages from the same sender and formats tool uses with appropriate code blocks.
 * @param lines - Array of parsed JSONL line objects from a Claude Code conversation
 * @returns Markdown string representation of the conversation
 */
export function renderFromLines(lines: JsonlLine[]): string {
Suggested change
+ /**
+  * Renders an array of Claude Code JSONL lines into a markdown conversation transcript.
+  *
+  * This function processes the input `lines` in two passes:
+  * 1. Collects all tool results from user messages for later reference.
+  * 2. Renders user and assistant messages into markdown, merging consecutive messages from the same sender,
+  *    and formatting tool uses and tool results with appropriate code blocks.
+  *
+  * @param lines - Array of parsed JSONL line objects from a Claude Code conversation. Each line may represent
+  *   a user or assistant message, and may include text, tool uses, or tool results.
+  * @returns A markdown string representing the conversation, with sender headers, merged messages,
+  *   and formatted tool uses/results. System instructions wrapped in XML tags are skipped.
+  *
+  * Special rendering rules:
+  * - Consecutive messages from the same sender are merged under a single header.
+  * - Tool uses and tool results are formatted as code blocks with language tags.
+  * - System instructions in user messages (wrapped in XML tags) are omitted from the output.
+  */
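
The merging rule in that description can be sketched in a few lines. This is an assumed simplification: `ChatMessage`, `renderMerged`, and the header text are hypothetical, and the real `renderFromLines` also handles tool uses and tool results.

```typescript
type Sender = "user" | "assistant";

// Hypothetical flattened message; real JSONL lines carry richer content.
interface ChatMessage {
  sender: Sender;
  text: string;
}

export function renderMerged(messages: ChatMessage[]): string {
  const parts: string[] = [];
  let lastSender: Sender | undefined;
  for (const message of messages) {
    // Emit a header only when the sender changes.
    if (message.sender !== lastSender) {
      parts.push(message.sender === "user" ? "# User" : "# Assistant");
      lastSender = message.sender;
    }
    parts.push(message.text);
  }
  return parts.join("\n\n");
}
```

Tracking only the previous sender keeps the pass linear and avoids having to group messages up front.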

return content.replace(/<system-reminder>[\s\S]*?<\/system-reminder>/g, "").trim();
}

function getFileExtension(filePath: string): string {

Copilot AI Nov 26, 2025

[nitpick] The function name getFileExtension is misleading. The function doesn't just get the file extension—it also maps common extensions to their full language names (e.g., "ts" → "typescript"). A more accurate name would be getLanguageIdentifier, getSyntaxHighlighter, or getCodeBlockLanguage to reflect that it returns a language identifier for markdown code blocks, not just the raw file extension.

Suggested change
- function getFileExtension(filePath: string): string {
+ function getCodeBlockLanguage(filePath: string): string {

Comment on lines 128 to 220
(Same switch statement as quoted in full under the comment on lines 121 to 240.)

Copilot AI Nov 26, 2025

[nitpick] Magic strings for tool names are scattered throughout the switch statement. If tool names change or new tools are added, these strings must be updated in multiple places. Consider defining these as constants at the top of the file or in a separate constants module:

const TOOL_NAMES = {
	READ: 'Read',
	WRITE: 'Write',
	EDIT: 'Edit',
	BASH: 'Bash',
	GLOB: 'Glob',
	GREP: 'Grep',
	TODO_WRITE: 'TodoWrite',
	TASK: 'Task',
	WEB_FETCH: 'WebFetch',
	WEB_SEARCH: 'WebSearch',
} as const;

Then use case TOOL_NAMES.READ: instead of case "Read":.
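
A variant of the same idea, sketched with a readonly tuple so that a union type can be derived from the list. `ToolName`, `isToolName`, and `describeTool` are hypothetical names; only the tool names themselves come from the PR.

```typescript
// Single source of truth for the tool names handled by the renderer.
const TOOL_NAMES = [
  "Read",
  "Write",
  "Edit",
  "Bash",
  "Glob",
  "Grep",
  "TodoWrite",
  "Task",
  "WebFetch",
  "WebSearch",
] as const;

// Union of the literal names: "Read" | "Write" | ...
type ToolName = (typeof TOOL_NAMES)[number];

// Type guard that narrows an arbitrary string from the log to a known tool name.
export function isToolName(name: string): name is ToolName {
  return (TOOL_NAMES as readonly string[]).indexOf(name) !== -1;
}

export function describeTool(name: string): string {
  return isToolName(name) ? `known tool: ${name}` : `unknown tool: ${name}`;
}
```

With the union type in place, a `switch` over a `ToolName` value can be checked for exhaustiveness by the compiler, which plain string cases do not offer.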

Comment on lines +26 to +30
export function getProviderByPath(path: string): Provider | undefined {
  if (path.endsWith(".jsonl")) {
    return "claude-code";
  }
  return;

Copilot AI Nov 26, 2025

[nitpick] The provider detection logic is too simplistic. Any file ending with .jsonl is assumed to be a Claude Code file. This could lead to false positives if:

  1. Other providers also use .jsonl format
  2. Users have unrelated .jsonl files that don't conform to the Claude Code schema

Consider adding additional validation beyond just the file extension, such as:

  • Checking for Claude Code-specific markers in the first few lines
  • Attempting to parse with the schema and reporting a clear error if it doesn't match
  • Documenting that .jsonl files must be Claude Code format
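
The first option could look roughly like the sketch below. It is a heuristic under stated assumptions: `looksLikeClaudeCodeJsonl` is a hypothetical function, and the set of line types is taken from the type names mentioned in this PR (user, assistant, system, queue-operation, file-history-snapshot), which may not be complete for real logs.

```typescript
// Line types named in this PR; real Claude Code logs may contain others.
const CLAUDE_CODE_LINE_TYPES = new Set([
  "user",
  "assistant",
  "system",
  "queue-operation",
  "file-history-snapshot",
]);

// Inspect the first few non-empty lines and require each to carry a known "type".
export function looksLikeClaudeCodeJsonl(text: string, maxLines = 5): boolean {
  const sample = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .slice(0, maxLines);
  if (sample.length === 0) return false;
  return sample.every((line) => {
    try {
      const value: unknown = JSON.parse(line);
      if (typeof value !== "object" || value === null) return false;
      const type = (value as { type?: unknown }).type;
      return typeof type === "string" && CLAUDE_CODE_LINE_TYPES.has(type);
    } catch {
      return false;
    }
  });
}
```

Requiring every sampled line to match is strict and would reject logs that start with a type not listed here; relaxing it to "at least one line matches" trades false negatives for false positives.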

Comment on lines 32 to 35
.positional("url", {
type: "string",
description: "URL or local file path (e.g., .jsonl for Claude Code)",
demandOption: true,

Copilot AI Nov 26, 2025

[nitpick] The positional parameter name is still url but now also accepts local file paths. Consider renaming the parameter to urlOrPath or source to better reflect that it accepts both URLs and local file paths. This would make the API more intuitive:

command: "url2md <source>",
// ...
.positional("source", {
	type: "string",
	description: "URL or local file path (e.g., .jsonl for Claude Code)",
	demandOption: true,
})

Dao Hoang Son and others added 9 commits December 1, 2025 12:34
Support PDF and other document attachments in user messages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Support Claude's extended thinking feature which includes thinking
blocks in assistant messages. Renders as collapsible details section.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Shows the model being used (e.g., Sonnet 4.5, Opus, Haiku) in the
assistant header. Also displays a new header when the model changes
mid-conversation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Display token usage statistics at the end of the conversation:
- Input tokens
- Output tokens
- Cache creation tokens
- Cache read tokens

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Calculate estimated cost based on model pricing:
- Haiku 3.5: $0.80/$4 per MTok
- Sonnet 4/4.5: $3/$15 per MTok
- Opus 4.5: $15/$75 per MTok

Includes cache write/read pricing. Note in output that
pricing is based on Dec 2024 rates and may be outdated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Haiku 4.5: $1/$5 per MTok (was $0.80/$4)
Cache write: 1.25x base, cache read: 0.1x base

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Opus 4.5: $5/$25 per MTok (not $15/$75)
Cache: $6.25 write, $0.50 read

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
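
The cost arithmetic implied by these three pricing commits can be sketched as follows. The rates are the final values stated in the commit messages (they may be outdated), the cache multipliers are the 1.25x write and 0.1x read factors from the same messages, and the model keys, `Usage` shape, and `estimateCostUsd` name are assumptions made for this example.

```typescript
// Hypothetical usage totals, counted in tokens.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheWriteTokens: number;
  cacheReadTokens: number;
}

interface Rate {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

type ModelKey = "haiku-4.5" | "sonnet-4.5" | "opus-4.5";

// Rates as listed in the commit messages above; treat them as a snapshot.
const RATES: Record<ModelKey, Rate> = {
  "haiku-4.5": { inputPerMTok: 1, outputPerMTok: 5 },
  "sonnet-4.5": { inputPerMTok: 3, outputPerMTok: 15 },
  "opus-4.5": { inputPerMTok: 5, outputPerMTok: 25 },
};

export function estimateCostUsd(model: ModelKey, usage: Usage): number {
  const rate = RATES[model];
  // Cache writes cost 1.25x the input rate, cache reads 0.1x.
  const inputEquivalent =
    usage.inputTokens + usage.cacheWriteTokens * 1.25 + usage.cacheReadTokens * 0.1;
  return (
    (inputEquivalent * rate.inputPerMTok + usage.outputTokens * rate.outputPerMTok) /
    1_000_000
  );
}
```

For Opus 4.5 this reproduces the $6.25 write and $0.50 read figures per million cached tokens quoted in the last of the three commits.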
Support the file-history-snapshot line type that Claude Code uses
for tracking file state during conversations. This type is skipped
during markdown rendering like other internal bookkeeping types.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mask sensitive information in rendered markdown:
- Replace cwd with "." for relative paths
- Replace $HOME with ~ for home directory paths
- Replace username with <user> for any remaining occurrences

This prevents personal paths and usernames from appearing in
shared conversation exports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
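
A minimal sketch of the masking order described in the last commit, assuming plain literal replacement. `MaskEnv` and `maskSensitivePaths` are hypothetical names; the PR's actual implementation is not shown in this conversation.

```typescript
interface MaskEnv {
  cwd: string;
  home: string;
  username: string;
}

// Replace every literal occurrence of `needle`; an empty needle is ignored.
function replaceLiteral(text: string, needle: string, replacement: string): string {
  return needle === "" ? text : text.split(needle).join(replacement);
}

export function maskSensitivePaths(text: string, env: MaskEnv): string {
  // Most specific first: cwd usually lives under home, which contains the username.
  let masked = replaceLiteral(text, env.cwd, ".");
  masked = replaceLiteral(masked, env.home, "~");
  return replaceLiteral(masked, env.username, "<user>");
}
```

Plain substring replacement is simple but can over-match, for example when another directory name merely starts with the cwd path, so a stricter variant might require a path separator or end of string after the match.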

2 participants