Skip to content

halloleo/rectangify

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rectangify

Create tables from cell lists, i.e. making 1-dimensional lists into 2-dimensional rectangles.

This is often helpful when copying tables from PDFs to text files: Usually each cell appears on its own line; depending on how the PDF was generated, adjacent lines belong either to the same column or to the same row; all columns/rows appear one after another.

Look at the examples to get a feel for typical input files.

Examples

Let's start with a columns list:

$ cat flights_cols_unmarked.txt
Date
29-Jun-17 
05-Jul-17 
12-Aug-17 
13-Aug-17
From - To
Sydney - Beijing 
Beijing - Frankfurt 
Frankfurt - Shanghai 
Shanghai - Sydney
Flight 
EK174 
EK931 
EK736 
EK175
Departs Arrives
19:40 05:30 (30-Jun-17)
14:00 18:15 
20:00 13:05 (13-Aug-17)
19:35 08:10 (14-Aug-17)

This file contains cells on each line, one column after another; each column seems to have 5 items (rows). So let's convert this into a CSV table:

$ rectangify.py -i 5 flights_cols_unmarked.txt 
Date,From - To,Flight,Departs Arrives
29-Jun-17,Sydney - Beijing,EK174,19:40 05:30 (30-Jun-17)
05-Jul-17,Beijing - Frankfurt,EK931,14:00 18:15
12-Aug-17,Frankfurt - Shanghai,EK736,20:00 13:05 (13-Aug-17)
13-Aug-17,Shanghai - Sydney,EK175,19:35 08:10 (14-Aug-17)

Analog for a rows list:

$ cat food_rows.txt 
---
Meal
Is vegetarian?
Prepare time
Taste rating
---
Spaghetti with prawn-chili sauce
no
25min
****
---
Carrot, lentil & orange soup
yes
35min
***
---
Curried tofu with rice
yes
30min
**

Here we have a file where each block of lines is marked by a marker string ("---"). Each block of lines seems to resemble a row of the table. So we can convert it into a CSV table with the following command:

$ rectangify.py --convert ROWS food_rows.txt 
Meal,Is vegetarian?,Prepare time,Taste rating
Spaghetti with prawn-chili sauce,no,25min,****
"Carrot, lentil & orange soup",yes,35min,***
Curried tofu with rice,yes,30min,**

Note: If you want pretty tables for Markdown processing or just to look at, install the powerfull csvkit and pipe rectangify's output through csvlook or use the lighter Python3-only csvtomd. I use here csvlook:

$ rectangify.py -c ROWS food_rows.txt | csvlook
| Meal                             | Is vegetarian? | Prepare time | Taste rating |
| -------------------------------- | -------------- | ------------ | ------------ |
| Spaghetti with prawn-chili sauce |          False |      0:25:00 | ****         |
| Carrot, lentil & orange soup     |           True |      0:35:00 | ***          |
| Curried tofu with rice           |           True |      0:30:00 | **           |

Installation

Make sure you have the wonderful pipx installed. Then clone this repo and run

pipx install <path-to-repo>

Now you should be good to go! Test it by issuing:

$ rectangify.py --help
usage: rectangify [-h] [-c {COLS,ROWS}] [-o OUT] [-m STR] [-i NUM] [-r] [-b]
                  INPUT

create CSV tables from a list of columns/rows

positional arguments:
  INPUT                 input file (or '-' for stdin)

optional arguments:
  -h, --help            show this help message and exit
  -c {COLS,ROWS}, --convert {COLS,ROWS}
                        convert mode to use: 'COLS' (cells of one column are
                        listed together) or 'ROWS' (cells of one row are
                        listed together). (default: 'COLS')
  -o OUT, --out OUT     save output in file OUT instead of stdout
  -m STR, --marker STR  marker line designating the start of a new column/row
                        (without the newline).(default: '---')
  -i NUM, --items NUM   number of items until the next column/row starts (has
                        precedence over marker)
  -r, --reverse-count   count the lines from the end of the file (instead of
                        the beginning) when using the --items option
  -b, --truncate-before
                        ignore lines before the first marker

About

Create tables from cell lists

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages