This was my second 42 project. It replicates the behavior of getline: each call returns the next line from a file, reading only until a full line is found and keeping any extra bytes for the next call. The goal is to minimize reads and keep the stored buffer as empty as possible, never reading the entire file at once. The buffer size is not fixed in the source: it is chosen at compile time by defining BUFFER_SIZE through the compiler options (for example, cc -D BUFFER_SIZE=42). Storage relies on static variables and malloc/free, with no memory leaks. When the file ends and nothing remains buffered, the function returns NULL.
- The use of static variables.
- The scope and lifetime of static variables.
- How static variables are handled in a dependency scheme.
- File handling basics.
- What file descriptors are and how the system manages them.
- Using open, read, and close to manage file descriptors.
- What memory leaks are, how to detect them, and which tools are available.
- Managing an I/O system with an intermediate buffer that acts as a data store using a queue.
- Why understanding data structures matters.
- Memory linearity.
- void pointers and casting.
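As a small illustration of the scope and lifetime points in the list above, here is a minimal sketch. The function `call_count` is hypothetical and is not part of the project.

```c
#include <stddef.h>

/* Hypothetical helper: reports how many times it has been called.
 * `calls` is initialized once, before the first call, and keeps its
 * value between calls (static lifetime), yet it is only visible
 * inside this function (block scope). */
size_t call_count(void)
{
    static size_t calls = 0;

    calls++;
    return calls;
}
```

Calling `call_count()` three times in a row returns 1, 2 and 3. This is the same mechanism that lets get_next_line remember leftover bytes from one call to the next.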
# Example (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install gcc
sudo apt-get install make
sudo apt-get install valgrind

git clone https://github.com/alcarril/get_next_line.git # Clone the repository
cd get_next_line # Go to the project directory
make # Build the project and run the default test

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "get_next_line.h"
int main(void)
{
    int fd = open("test.txt", O_RDONLY);
    char *line;

    if (fd < 0)
        return 1;
    while ((line = get_next_line(fd)) != NULL)
    {
        printf("%s", line);
        free(line);
    }
    close(fd);
    return 0;
}

cc main.c src/get_next_line.c src/get_next_line_utils.c -o gnl
valgrind --leak-check=yes --track-fds=yes ./gnl

Because the read size is chosen at compile time with BUFFER_SIZE, the function must avoid infinite reads and keep the static buffer as empty as possible between calls. Before each read, it first checks the stored buffer for a newline; after read, it checks the newly read chunk again and only keeps what is needed for the next call. Each returned line is copied into a dedicated string that the caller must free.
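The check, read and keep cycle described above can be sketched as follows. This is a simplified illustration for a single file descriptor and text input, not the project's table-based code. The names `next_line_sketch`, `copy_bytes` and `append_chunk` are hypothetical, and the fallback value of BUFFER_SIZE is an assumption.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef BUFFER_SIZE
# define BUFFER_SIZE 42 /* assumed fallback when -D BUFFER_SIZE=n is not given */
#endif

/* Hypothetical helper: malloc'd, NUL-terminated copy of n bytes. */
static char *copy_bytes(const char *src, size_t n)
{
    char *dst = malloc(n + 1);

    if (dst)
    {
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return dst;
}

/* Hypothetical helper: returns stash followed by n bytes of chunk
 * in a new string and frees the old stash. */
static char *append_chunk(char *stash, const char *chunk, size_t n)
{
    size_t old_len = stash ? strlen(stash) : 0;
    char *joined = malloc(old_len + n + 1);

    if (joined)
    {
        if (stash)
            memcpy(joined, stash, old_len);
        memcpy(joined + old_len, chunk, n);
        joined[old_len + n] = '\0';
    }
    free(stash);
    return joined;
}

/* Sketch only: one static stash, so it tracks a single fd at a time. */
char *next_line_sketch(int fd)
{
    static char *stash = NULL; /* bytes read but not yet returned */
    char chunk[BUFFER_SIZE];   /* assumes a small, positive BUFFER_SIZE */
    char *newline;
    char *line;
    char *rest;
    ssize_t got;

    /* Check the stash first; read only while it holds no full line. */
    while (!stash || !strchr(stash, '\n'))
    {
        got = read(fd, chunk, BUFFER_SIZE);
        if (got < 0)
        {
            free(stash);
            stash = NULL;
            return NULL;
        }
        if (got == 0)
            break; /* end of file */
        stash = append_chunk(stash, chunk, (size_t)got);
        if (!stash)
            return NULL;
    }
    if (!stash)
        return NULL; /* end of file and nothing buffered */
    newline = strchr(stash, '\n');
    if (!newline || newline[1] == '\0')
    {
        line = stash; /* the whole stash is the line; nothing is kept */
        stash = NULL;
        return line;
    }
    /* Return up to the newline and keep only the bytes after it. */
    line = copy_bytes(stash, (size_t)(newline - stash) + 1);
    rest = copy_bytes(newline + 1, strlen(newline + 1));
    if (!line || !rest)
    {
        free(line);
        free(rest);
        line = NULL;
        rest = NULL;
    }
    free(stash);
    stash = rest;
    return line;
}
```

It is called the same way as get_next_line in the main above: loop until it returns NULL and free every returned line. Note that this sketch rescans the stash with strchr on every call, which is exactly the repeated work the table-based design tries to avoid.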
The naive approach scans every line multiple times to find \n and to rebuild the output. My approach minimizes repeated scans by splitting and storing lines up front using a control structure. Since I did not use structs at the time, I modeled each file descriptor with a table: one control area to keep line lengths (so the newline position is known without rescanning), another to track whether the file is finished, and another to track how many lines are currently stored. This lets each line be read once, then returned by popping the first stored line and dispatching the queue on the next call.
The line-count control also helps reduce reallocations: if the table still has free slots from previously returned lines, it reuses them instead of growing the table every time.
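The queue and slot reuse described above can be sketched as a small ring of stored lines. This is an illustration under assumptions: it uses a struct for readability, whereas the project keeps its control areas in a table accessed through void pointers, and the names `t_line_queue`, `queue_push` and `queue_pop` are hypothetical. The finished-file flag is left out to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical control structure for one file descriptor. */
typedef struct s_line_queue
{
    char    **lines; /* stored lines, each one malloc'd */
    size_t  *lens;   /* length of each line, so no rescan is needed */
    size_t  head;    /* slot of the next line to return */
    size_t  count;   /* number of lines currently stored */
    size_t  cap;     /* number of slots in the table */
}   t_line_queue;

/* Doubles the number of slots and moves stored lines to the front. */
static int queue_grow(t_line_queue *q)
{
    size_t new_cap = q->cap ? q->cap * 2 : 4;
    char **lines = malloc(new_cap * sizeof(*lines));
    size_t *lens = malloc(new_cap * sizeof(*lens));
    size_t i;

    if (!lines || !lens)
    {
        free(lines);
        free(lens);
        return -1;
    }
    for (i = 0; i < q->count; i++)
    {
        lines[i] = q->lines[(q->head + i) % q->cap];
        lens[i] = q->lens[(q->head + i) % q->cap];
    }
    free(q->lines);
    free(q->lens);
    q->lines = lines;
    q->lens = lens;
    q->head = 0;
    q->cap = new_cap;
    return 0;
}

/* Stores a copy of one line; grows the table only when every slot is in use. */
int queue_push(t_line_queue *q, const char *src, size_t len)
{
    char *copy;
    size_t slot;

    if (q->count == q->cap && queue_grow(q) != 0)
        return -1;
    copy = malloc(len + 1);
    if (!copy)
        return -1;
    memcpy(copy, src, len);
    copy[len] = '\0';
    slot = (q->head + q->count) % q->cap; /* wraps onto freed slots */
    q->lines[slot] = copy;
    q->lens[slot] = len;
    q->count++;
    return 0;
}

/* Pops the oldest line; the caller owns and frees the returned string. */
char *queue_pop(t_line_queue *q, size_t *len)
{
    char *line;

    if (q->count == 0)
        return NULL;
    line = q->lines[q->head];
    *len = q->lens[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return line;
}
```

Starting from a zero-initialized `t_line_queue`, pushing four lines allocates four slots. Popping two and pushing two more reuses the freed slots, so the capacity stays at four instead of growing.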
Note: 📝 Project explanation: GNL 4D or void structure.
When I started this project I did not know how to use C structs or linked
lists. Because I wanted control variables and a custom layout, I relied on
void pointers and casts, but in practice it is much less complex to
structure code with structs and that is what they are for.
On the other hand, using void pointers does keep memory linear, which
can be faster for the CPU. In later Common Core projects I learned that C++
uses similar techniques to implement vectors internally.
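To make the idea of linear memory behind a void pointer concrete, here is a minimal sketch. The layout and the names `block_new`, `block_used`, `block_data` and `block_append` are assumptions for illustration, not the project's actual layout. Control variables and data live in one contiguous allocation and are reached through casts.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout of one block, all in a single allocation:
 * [ size_t cap ][ size_t used ][ cap bytes of data ... ] */
#define HDR_WORDS 2

/* Allocates header and data together and initializes the header. */
void *block_new(size_t cap)
{
    void *block = malloc(HDR_WORDS * sizeof(size_t) + cap);

    if (!block)
        return NULL;
    ((size_t *)block)[0] = cap; /* control variable: capacity */
    ((size_t *)block)[1] = 0;   /* control variable: bytes in use */
    return block;
}

/* Reads the "used" control variable through a cast. */
size_t block_used(const void *block)
{
    return ((const size_t *)block)[1];
}

/* The data area starts right after the header. */
char *block_data(void *block)
{
    return (char *)block + HDR_WORDS * sizeof(size_t);
}

/* Appends n bytes if they fit; returns -1 when the block is full. */
int block_append(void *block, const char *src, size_t n)
{
    size_t *hdr = (size_t *)block;

    if (n > hdr[0] - hdr[1])
        return -1;
    memcpy(block_data(block) + hdr[1], src, n);
    hdr[1] += n;
    return 0;
}
```

A struct with the same two fields followed by the data would express this layout more clearly, which matches the point made above: the casts work, but structs are the tool meant for this job.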
For massive files, this approach is up to 10x faster thanks to memory linearity, but this function is not meant to be used specifically for that scenario.
Note: 📝 Reddit discussion: Why triple pointers are not convenient.
- man getline (RTFM)
- Static variables and variable scope: UC3M course notes
- Open, read, close, and file descriptor handling: WhileTrueThenDream video
- General project documentation: Notion documentation
- Alejandro Carrillo (alcarril) - https://github.com/alcarril