Releases · nao1215/filesql
v0.10.0
Added
- Custom Logger Support: Flexible logging system with slog integration
  - `Logger` interface: Simple logging interface with `Debug`, `Info`, `Warn`, `Error`, and `With` methods
  - `ContextLogger` interface: Extended logging interface with context-aware methods (`DebugContext`, `InfoContext`, `WarnContext`, `ErrorContext`)
  - `NewSlogAdapter()`: Adapter to use the standard library `slog.Logger` with filesql's `Logger` interface
  - `NewSlogContextAdapter()`: Adapter for context-aware logging with `slog.Logger`
  - `WithLogger()`: Builder method to inject a custom logger into the build and open process (see the sketch after this list)
  - `nopLogger`: Zero-overhead no-op logger implementation used as the default (benchmarked at ~0.2 ns/op)
- Logging throughout build, validation, and database opening operations
- Comprehensive test coverage and benchmarks for all logger implementations
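A minimal sketch of plugging `slog` in through these hooks. `NewSlogAdapter` and `WithLogger` are from this release; the surrounding builder chain (`NewBuilder`, `AddPath`, the shape of the `Open` call) is assumed for illustration:

```go
package main

import (
	"context"
	"log/slog"
	"os"

	"github.com/nao1215/filesql"
)

func main() {
	// NewSlogAdapter and WithLogger come from this release; the exact
	// builder construction and open sequence are assumptions here.
	logger := filesql.NewSlogAdapter(slog.New(slog.NewTextHandler(os.Stderr, nil)))

	db, err := filesql.NewBuilder().
		AddPath("users.csv").
		WithLogger(logger).
		Open(context.Background())
	if err != nil {
		slog.Error("open failed", "err", err)
		os.Exit(1)
	}
	defer db.Close()
}
```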
Changed
- Documentation Updates: Added Custom Logger section to all README files (7 languages: EN, ES, FR, JA, KO, RU, ZH-CN)
  - Usage examples with slog integration
  - `Logger` and `ContextLogger` interface definitions
  - Performance benchmark comparison table
v0.9.0
Added
- Read-Only Database Mode: New `ReadOnlyDB` wrapper for safe read-only access to databases
  - `NewReadOnlyDB(db)`: Wraps an existing `*sql.DB` to prevent write operations
  - `ReadOnlyDB.Query()`, `QueryContext()`, `QueryRow()`, `QueryRowContext()`: Read operations work normally
  - `ReadOnlyDB.Exec()`, `ExecContext()`: Return `ErrReadOnly` for write operations (INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, TRUNCATE, REPLACE, UPSERT)
  - `ReadOnlyDB.Prepare()`, `PrepareContext()`: Reject preparation of write statements
  - `ReadOnlyDB.Begin()`, `BeginTx()`: Return a `ReadOnlyTx` for read-only transactions
  - `ReadOnlyDB.Ping()`, `PingContext()`, `Close()`, `DB()`: Standard database operations
  - `ReadOnlyStmt`: Read-only prepared statement wrapper
  - `ReadOnlyTx`: Read-only transaction wrapper with the same protections
  - `DBBuilder.OpenReadOnly(ctx)`: Convenience method to open a database in read-only mode
  - `ErrReadOnly`: Sentinel error for rejected write operations
- Useful for audit scenarios where data must be viewed without risk of modification
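A minimal sketch of the wrapper in use, assuming `Open` returns a `*sql.DB` as in earlier releases (file and table names are illustrative):

```go
package main

import (
	"errors"
	"log"

	"github.com/nao1215/filesql"
)

func main() {
	db, err := filesql.Open("users.csv") // illustrative input file
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rdb := filesql.NewReadOnlyDB(db)

	rows, err := rdb.Query("SELECT id, name FROM users") // reads pass through
	if err != nil {
		log.Fatal(err)
	}
	rows.Close()

	if _, err := rdb.Exec("DELETE FROM users"); errors.Is(err, filesql.ErrReadOnly) {
		log.Println("write rejected:", err) // every write statement is refused
	}
}
```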
- ACHTableInfo Struct: New struct for managing ACH table name information
  - `ACHTableInfo.BaseName`: The base table name derived from the ACH filename
  - `ACHTableInfo.FileHeaderTable()`: Returns `{baseName}_file_header`
  - `ACHTableInfo.BatchesTable()`: Returns `{baseName}_batches`
  - `ACHTableInfo.EntriesTable()`: Returns `{baseName}_entries`
  - `ACHTableInfo.AddendaTable()`: Returns `{baseName}_addenda`
  - `ACHTableInfo.IATBatchesTable()`: Returns `{baseName}_iat_batches`
  - `ACHTableInfo.IATEntriesTable()`: Returns `{baseName}_iat_entries`
  - `ACHTableInfo.IATAddendaTable()`: Returns `{baseName}_iat_addenda`
  - `ACHTableInfo.AllTableNames()`: Returns all possible table names for the base name
  - `GetACHTableInfos()`: Returns `[]ACHTableInfo` for all registered ACH files
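An illustrative loop over the helpers above; `GetACHTableInfos()` is called without arguments exactly as listed, and how ACH files get registered beforehand is not covered by these notes:

```go
package main

import (
	"fmt"

	"github.com/nao1215/filesql"
)

func main() {
	// Assumes GetACHTableInfos is a package-level function, as listed above.
	for _, info := range filesql.GetACHTableInfos() {
		fmt.Println(info.BaseName)        // e.g. "payroll"
		fmt.Println(info.EntriesTable())  // "payroll_entries"
		fmt.Println(info.AllTableNames()) // every table name derived from the base
	}
}
```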
Changed
- Internal ACH Function: Made `GetACHBaseTableNames` private (`getACHBaseTableNames`) as it was only used internally
- Use `GetACHTableInfos()` for public access to ACH table information
v0.8.0
Added
- New Compression Formats: Added support for 4 new compression formats via fileparser v0.2.0
  - zlib (`.z`): Standard DEFLATE compression
  - snappy (`.snappy`): Google's high-speed compression
  - s2 (`.s2`): A faster, improved extension of Snappy
  - lz4 (`.lz4`): Extremely fast compression
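A sketch of loading one of the new formats; the `.csv.lz4` path is illustrative, and the same should hold for `.z`, `.snappy`, and `.s2` inputs:

```go
package main

import (
	"log"

	"github.com/nao1215/filesql"
)

func main() {
	// Compression is detected from the file extension, so a compressed
	// CSV loads like a plain one (assumes Open accepts compressed paths
	// as it does for gzip/bzip2/xz/zstd in earlier releases).
	db, err := filesql.Open("sales.csv.lz4")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```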
v0.7.0
Changed
- Migrated from the internal `github.com/nao1215/filesql/parser` to the external `github.com/nao1215/fileparser` for file parsing
- Updated all internal references from `parser.` to `fileparser.`
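For callers of the old public parser (see v0.6.0 below), the migration is an import swap, with call sites changing from `parser.Parse(...)` to `fileparser.Parse(...)`:

```go
import (
	// before (<= v0.6.x):
	// "github.com/nao1215/filesql/parser"

	// after (v0.7.0+):
	"github.com/nao1215/fileparser"
)
```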
Removed
- Internal `parser` package (now using `github.com/nao1215/fileparser v0.1.0` as an external dependency)
v0.6.0
Added
- Public Parser Package (6271e5ef): Exposed the internal parser as a public API for use in external projects
  - New `parser` package: Standalone file parsing without the SQLite dependency
    - `parser.Parse()`: Parse CSV, TSV, LTSV, XLSX, and Parquet files from an `io.Reader`
    - `parser.DetectFileType()`: Automatic file type detection from a file path
    - `parser.BaseFileType()`: Get the base file type from potentially compressed file types
  - Type exports: `TableData`, `ColumnType`, and `FileType` types for working with parsed data
  - Parquet support: Full Parquet parsing in `parser/parquet.go`
  - XLSX support: Excel file parsing in `parser/xlsx.go`
  - Comprehensive test coverage: 90%+ coverage for the parser package
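A rough sketch of driving the standalone parser; the notes only say that `Parse` reads from an `io.Reader` and `DetectFileType` inspects a path, so the exact signatures and return shapes here are assumptions:

```go
package main

import (
	"fmt"
	"os"

	"github.com/nao1215/filesql/parser"
)

func main() {
	f, err := os.Open("users.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Assumed signatures: argument order and return values are guesses,
	// not confirmed by these release notes.
	fileType := parser.DetectFileType("users.csv")
	table, err := parser.Parse(fileType, f)
	if err != nil {
		panic(err)
	}
	fmt.Printf("parsed: %+v\n", table) // TableData, no SQLite involved
}
```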
- ORM Integration Examples (281ede2): Added example code for popular Go ORMs and query builders
  - GORM: Full GORM integration example with model definitions
  - Bun: Bun ORM example with struct scanning
  - Ent: Facebook's Ent framework example with generated code
  - sqlx: sqlx example with struct tags
  - sqlc: sqlc example with generated type-safe queries
  - Squirrel: Squirrel query builder example
  - Basic: Standard library database/sql example
  - Multi-format: Example combining CSV, TSV, and LTSV files
- FileType.String() Method: Added a `fmt.Stringer` implementation for the `FileType` enum
  - Human-readable format names for logging and debugging
  - Returns names like "CSV", "TSV", "LTSV", "XLSX", "Parquet", etc.
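For example (`FileTypeCSV` is a guess at the enum's constant naming; the `String()` behavior is what this release adds):

```go
var t parser.FileType = parser.FileTypeCSV // constant name is assumed
fmt.Printf("detected %s input\n", t)       // prints "detected CSV input"
```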
Changed
- Documentation Updates: Enhanced README files across all 7 languages
Technical Details
- Architecture: Parser package enables lightweight file parsing without database overhead
- Compatibility: Parser package can be used independently of the main filesql package
- Testing: Added comprehensive test suites for parser, types, and error handling
v0.5.0
Added
- Benchmark Tests (2852ea2): Added benchmark infrastructure for performance testing
  - New `make benchmark` target in the Makefile for running benchmark tests
  - Benchmark tests isolated with the `//go:build benchmark` tag to prevent execution during regular tests
  - New `BenchmarkOpenContext` and `BenchmarkOpenContextParallel` for measuring CSV loading performance
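An illustrative shape for such a gated benchmark; the repository's actual benchmarks may differ. Run with `make benchmark` or `go test -tags=benchmark -bench=.`:

```go
//go:build benchmark

package filesql_test

import (
	"context"
	"testing"

	"github.com/nao1215/filesql"
)

// Sketch only: the testdata path is illustrative, and OpenContext is
// assumed to return (*sql.DB, error).
func BenchmarkOpenContext(b *testing.B) {
	for i := 0; i < b.N; i++ {
		db, err := filesql.OpenContext(context.Background(), "testdata/sample.csv")
		if err != nil {
			b.Fatal(err)
		}
		db.Close()
	}
}
```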
Improved
- Major Performance Optimization (d20b3c8, e95a5bf): Significantly improved file loading performance
  - 55% faster execution: Reduced 100,000-row CSV loading time from ~960ms to ~430ms
  - 12% less memory: Reduced memory usage from ~161MB to ~141MB
  - Transaction batching: Wrapped all INSERT operations in a single transaction to reduce SQLite disk sync operations (see the sketch after this list)
  - Slice reuse: Pre-allocate and reuse value slices in `insertChunkData()` to reduce allocations
  - Pre-allocation in type inference: Optimized `newColumnInfoList()` and `inferColumnsInfo()` with pre-allocated column value slices
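The transaction-batching idea in generic `database/sql` terms (names are illustrative, not filesql internals): one commit for the whole load instead of an implicit transaction per INSERT:

```go
package load

import "database/sql"

// bulkInsert wraps every INSERT in a single transaction, so SQLite syncs
// the journal once at Commit instead of once per row.
func bulkInsert(db *sql.DB, rows [][]any) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare("INSERT INTO t (a, b) VALUES (?, ?)")
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()

	for _, r := range rows {
		if _, err := stmt.Exec(r...); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit() // a single disk sync for the whole batch
}
```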
Fixed
- Data Integrity in Chunk Insertion (b191d93): Fixed potential data corruption issues in `insertChunkData()`
  - Stale value prevention: Fixed an issue where records with fewer columns than headers could retain stale values from previous rows
  - Extra column detection: Added validation to fail fast when records have more columns than headers, preventing silent data truncation
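A sketch of the two guards for a reused values slice (names are illustrative): reject rows wider than the header, and clear leftovers so short rows cannot inherit values from the previous row:

```go
package load

import (
	"database/sql"
	"fmt"
)

func insertChunk(stmt *sql.Stmt, headers []string, records [][]string) error {
	values := make([]any, len(headers)) // reused across rows
	for _, rec := range records {
		// Fail fast on extra columns instead of silently truncating.
		if len(rec) > len(headers) {
			return fmt.Errorf("record has %d columns, header defines %d", len(rec), len(headers))
		}
		// Clear the slice so a short row leaves no stale values behind.
		for i := range values {
			values[i] = nil
		}
		for i, v := range rec {
			values[i] = v
		}
		if _, err := stmt.Exec(values...); err != nil {
			return err
		}
	}
	return nil
}
```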
Changed
- Documentation Updates (17a42fa): Added benchmark results to all README files (7 languages)
  - Performance metrics: ~430ms execution time, ~141MB memory for a 100,000-row CSV
Dependencies
- `github.com/klauspost/compress`: 1.18.1 → 1.18.2
v0.4.6
Added
- Header-Only File Support (PR #67, 5de8801): Files with headers but no data records are now supported
  - CSV, TSV, Parquet, and XLSX formats can now be loaded with only header rows
  - Creates empty SQLite tables with correct column names (all columns as TEXT type)
  - Useful for schema definition files and template files
  - Example: A CSV file containing only `id,name,age` will create a table with those columns but zero rows
Fixed
- LTSV Error Handling: Improved error messages for invalid LTSV data
  - Now correctly returns a "no valid LTSV keys found" error instead of silently creating empty tables
  - LTSV format requires `key:value` pairs, so the header-only concept does not apply
Changed
- Dependencies: Updated library dependencies
  - `modernc.org/sqlite`: 1.40.0 → 1.40.1
  - `github.com/klauspost/compress`: 1.18.0 → 1.18.1
  - `github.com/xuri/excelize/v2`: 2.9.1 → 2.10.0
  - `golang.org/x/crypto`: Security update
  - `actions/checkout`: 4 → 6
v0.4.5
Fixed
- Table Name Sanitization: Fixed SQL syntax errors caused by special characters in file names
  - Applied `sanitizeTableName()` to all table name generation paths
  - Hyphens, spaces, and special characters are now automatically converted to underscores
  - Example: `"user-data.csv"` → table `"user_data"`; `"my file.csv"` → table `"my_file"`
  - Updated test expectations to match sanitized table names
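In practice (file name illustrative):

```go
package main

import (
	"log"

	"github.com/nao1215/filesql"
)

func main() {
	// The hyphen in the file name becomes an underscore, so the table
	// is queried as user_data.
	db, err := filesql.Open("user-data.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query("SELECT * FROM user_data")
	if err != nil {
		log.Fatal(err)
	}
	rows.Close()
}
```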
Improved
- API Documentation: Enhanced documentation for public APIs to clarify table name sanitization
  - Updated `Open()`, `OpenContext()`, and `DBBuilder.Open()` method documentation
  - Added examples showing special character conversion in table names
  - Improved `sanitizeTableName()` function documentation with detailed transformation rules
- Development Experience: Optimized test execution time for local development
  - Added GitHub Actions environment checks to skip slow tests locally (see the sketch after this list)
  - Reduced local test execution time by 63% (from ~55s to ~20s)
  - Maintained full test coverage in CI/CD while improving developer productivity
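The gate in test form (test name and body are illustrative); GitHub Actions sets `GITHUB_ACTIONS=true` in CI:

```go
package filesql_test

import (
	"os"
	"testing"
)

// Slow cases skip locally and run only in CI.
func TestLoadHugeCSV(t *testing.T) {
	if os.Getenv("GITHUB_ACTIONS") != "true" {
		t.Skip("slow test: runs only in GitHub Actions")
	}
	// ... exercise the large-file code path here
}
```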
Technical Details
- Breaking Change Prevention: Preserved existing `tableFromFilePath()` behavior for backward compatibility
- Test Coverage: Maintained 80.7% test coverage with updated test expectations
- Performance: No impact on runtime performance, only development-time improvements
v0.4.4
Added
- Memory Management System (PR #49, d128a27): Comprehensive memory optimization for large file processing
  - Introduced `MemoryPool` for efficient reuse of byte slices, record slices, and string slices (see the sketch after this list)
  - Added `MemoryLimit` with configurable thresholds and graceful degradation
  - Implemented automatic memory monitoring with adaptive chunk size reduction
  - Enhanced XLSX processing with chunked streaming and memory-optimized operations
  - Added comprehensive test coverage (800+ lines) with benchmarks and concurrent access validation
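A generic `sync.Pool` sketch of the slice-reuse idea behind `MemoryPool`; filesql's actual implementation differs in scope and detail:

```go
package pool

import "sync"

var recordPool = sync.Pool{
	New: func() any {
		s := make([]string, 0, 64) // starting capacity is an arbitrary guess
		return &s
	},
}

// borrowRecord hands out a reusable record slice.
func borrowRecord() *[]string { return recordPool.Get().(*[]string) }

// returnRecord resets the length but keeps the backing array for reuse.
func returnRecord(s *[]string) {
	*s = (*s)[:0]
	recordPool.Put(s)
}
```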
- Compression Handler (PR #48, ac04ae9): Factory pattern for file compression handling
  - Unified compression/decompression interface supporting gzip, bzip2, xz, and zstd formats
  - Clean resource management with automatic cleanup functions
  - Comprehensive test suite with end-to-end compression validation
  - Performance benchmarks for different compression algorithms
Changed
- Architecture Refactoring (PR #47, c228ffd): Split DBBuilder into focused processors following Single Responsibility Principle
  - Created a dedicated `FileProcessor` for file-specific operations
  - Introduced `StreamProcessor` for streaming data processing
  - Added `Validator` for centralized validation logic
  - Improved code maintainability and testability through separation of concerns
- API Breaking Change: Exported the `Record` type (previously the unexported `record`)
  - Fixed lint issues with exported methods returning unexported types
  - Added comprehensive documentation for migration guidance
Fixed
- Memory Pool Resource Management: Fixed a critical backing array tracking issue
  - Resolved potential memory corruption when slice capacity exceeded the original allocation
  - Implemented proper resource cleanup with original slice tracking
- Performance Optimization: Reduced `runtime.ReadMemStats` call frequency
  - Changed from every 100 records to every 1,000 records (a 10x reduction in monitoring overhead)
  - Added detailed comments explaining the performance trade-offs
Technical Improvements
- Enhanced Documentation: Added comprehensive godoc comments for all new types
  - `MemoryPool` and `MemoryLimit` usage examples and thread safety guarantees
  - Performance notes and best practices for memory management
- Code Quality: Replaced magic numbers with named constants throughout memory management
- Integer Overflow Safety: Enhanced overflow protection with detailed documentation for edge cases
- Test Coverage: Maintained 81.2% test coverage with extensive memory management test suite
v0.4.3
Fixed
- DBBuilder Refactoring (PR #45, 6379425): Major architectural improvements for better maintainability
  - Refactored the DBBuilder implementation for a cleaner code structure
  - Improved error handling and validation in the builder pattern
  - Enhanced code organization and readability
Technical Improvements
- LLM Settings Enhancement (PR #44, 2575759): Updated LLM configuration for unit testing
  - Improved development workflow with better AI assistance configuration
  - Enhanced test environment setup for LLM-powered development tools
- Integration Testing Expansion (PR #43, 48eadbe): Added comprehensive integration test coverage
  - Enhanced test coverage with real-world usage scenarios
  - Improved reliability and robustness validation
- Sample Data Addition (PR #41, 0adba40): Added sample CSV files for testing and demonstration
  - Enhanced testing capabilities with realistic sample data
  - Improved documentation with practical examples