If you get data compressed as bzip2, whether through HTTP or from a file, it's convenient to decompress it in SQL.
pg_bzip
does that, it provides functions to decompress and compress data using bzip2.
-
bzcat(data bytea) returns bytea
This function mimics the bzcat command, which decompresses data using bzip2.
For this example, we'll use the native pg_read_binary_file to read from a file.
select convert_from(bzcat(pg_read_binary_file('/path/to/all_movies.csv.bz2')), 'utf8') as contents; contents -------------------------------------------------------------------------------------------------------------------------------------------- "id","name","parent_id","date" + "2","Ariel","8384","1988-10-21" + "3","Varjoja paratiisissa","8384","1986-10-17" + "4","État de siège",\N,"1972-12-30" + "5","Four Rooms",\N,"1995-12-22" + "6","Judgment Night",\N,"1993-10-15" + "8","Megacities - Life in Loops",\N,"2006-01-01" + "9","Sonntag, im August",\N,"2004-09-22" + "11","Star Wars: Episode IV – A New Hope","10","1977-05-25" + "12","Finding Nemo","112246","2003-05-30" + ... .... .....
-
bzip2(data bytea, compression_level int default 9) returns bytea
This function is a simplified version of the bzip2 command. It compresses data using bzip2.
For this example we'll use
fio_writefile
from pgsql-fio, which offers a convenient way to write a file from SQL.select fio_writefile('/path/to/my_text.bz2', bzip2(repeat('my text to be compressed', 1000)::bytea)) as writesize; writesize ----------- 109
bzip2 is required. Under Debian/Ubuntu you can get it with
sudo apt install libbz2-dev
Then on this repo
make && make install
Now on SQL you can do:
CREATE EXTENSION bzip;
pg_bzip
is tested to work on PostgreSQL >= 12.
Nix is used to get an isolated and reproducible enviroment with multiple postgres versions.
# enter the Nix environment
$ nix-shell
# to run the tests
$ with-pg-16 make installcheck
# to interact with the isolated pg
$ with-pg-16 psql
# you can choose the pg version
$ with-pg-15 psql