perltab -- Manipulate data in tabular form from the command line in a perl one-liner style.

Getting started

Install modules listed in section Dependencies
% perltab -h

Running regression tests

% perl t/perltabCommandsOutputMatchExpected.t

Summary

perltab could be thought of as an extension to perl autosplit mode for handling tabular data. I developed perltab while working with a bioinformatics dataset that had quite a few features (columns), many of them had missing values, which made the data inconvenient to directly handle with something like perl autosplit mode. With perl autosplit mode it is easy to output the nth column.

% perl -F'\t' -anE 'say $F[1]' heightWeight.tsv

perltab makes this slightly easier:
% perltab -e 'say $F[1]' heightWeight.tsv

But it also allows for using named columns (and allows for abbreviation).
% perltab -e 'say F(hei)' heightWeight.tsv

This is convenient, but when numerical computation and missing values come to play perltab is particularly helpful.

For example the minimum value of a column can be output in this way:
% perltab -d 'bemin $m, F(hei)' -z 'say $m'
where the script following the -d option is run on all input lines EXCEPT the initial header line holding the colum names, and the -z option script is run once at the end.

or, as long as the column labels do not look like numbers, this will also work:
% perltab -e 'bemin $m, F(hei)' -z 'say $m'

To do that on the command line without perltab is quite difficult without a LOT of typing, mostly because non-numerical values must be silently skipped but also because $m needs to be initialized properly (for example zero won't work if negative values are present in the data). perltab defines several reasonably mnenmonic functions to handle issues like that transparently.

The perltab documention % perltab -h has close to 50 examples of using perltab and all of these are represented in the regression test suite.

Caveats.

Runs on a linux box under Perl v5.18.2, v5.28.1. Untested elsewhere.

Examples

Output columns 'height' and 'weight'.
% perltab -e 'say F(qw(height weight))' heightWeight.tsv
Or more succinctly by letting perltab add the qw for you (see -h qw).
% perltab -e 'say F(height weight)' heightWeight.tsv
Or even more succinctly,
% perltab -e 'say F(hei wei)' heightWeight.tsv
Any unique prefix of the desired column labels will work (see -h label). Here we assume no other headers starts with 'h' or 'w'.
% perltab -e 'say F(h w)'
Swap first and final field.
% perltab -pe 'swap 0, -1' input.tsv
Swapping columns can also be done (more efficiently) by directly accessing @F.
% perltab -e 'say @F[-1, 1..$#F-1, 0]' input.tsv
Swap fields labeled 'height' and 'weight'.
% perltab -pe 'swap qw(height weight)' heightWeight.tsv
Swapping columns by label can also be done (more efficiently) by using N() to access @F directly.
% perltab -pe '@I= N(height weight); @F[@I]= @F[reverse @I]' heightWeight.tsv
Move column 'ID' to directly after column 'weight'.
% perltab -pe 'movaft qw(weight ID)' heightWeight.tsv
Manually move column 'ID' to directly after column 'weight', assuming column ID comes before column weight in the input.
% perltab -e '($x,$y)= N(ID weight); say @F[0..$x-1,$x+1..$y,$x,$y+1..$#F]' heightWeight.tsv
Delete columns starting in 'c', 'd' or 'e'.
% perltab -pe 'del grep /^[cde]/, @H' animals.tsv
Echo input file, skipping data lines with any missing (empty string) values.
% perltab -e 'speak @F' heightWeight.tsv
Output BMI (Body Mass Index) from rows in which both height and weight have values.
% perltab -e '@V= nextNum F(height weight); say 10000*$V[1]/$V[0]**2' heightWeight.tsv
Add BMI (Body Mass Index) column to table in heightWeight.tsv, skipping missing (empty string) values.
% perltab -pe 'insAft "weight", "BMI"=>doNum{10000*$_[1]/$_**2} F(height weight)' heightWeight.tsv
Reformat numbers in field 'BMI'.
% perltab -pe 'reform "%.2f", F(BMI)' heightWeightBMI.tsv
Extract data fields 1,2 for rows in which both have a numerical value.
% perltab -d 'donum {say @_} @F[1,2]' heightWeight.tsv
Display column labels with their numbers.
% perltab -H 'say $_,$H[$_] for 0..$#H' input.tsv
Remove column named 'height'.
% perltab -pe 'del "height"' heightWeight.tsv
or,
% perltab -e 'say F( grep {!/^height$/} @H )' heightWeight.tsv
Select rows from heightWeight.tsv matching ids listed in file 'ids.txt' (holding one id per line).
% perltab -b 'SNset %S, "ids.txt"' -ge '$S{ F(ID)}' heightWeight.tsv > selected.tsv
After selecting those rows, sort them so that the ids are in the same order as in "/tmp/ids".
% perltab -b 'SNset %S, "ids.txt"' -s '$S{ F(ID) }' selected.tsv
Print minimum value of column 'banana', quietly ignoring non-numerical values.
% perltab -e 'bemin $m, F(banana)' -z 'say $m'
Linearly normalize numbers in column 'banana' to be in range [0,1].
% perltab -p -e 'beminmax $min, $max, F(banana)' -e2 'donum sub{ $_= ($_-$min) / ($max-$min)}, F(banana)' fruit.tsv
Print column labels containing 'eight'.
% perltab -H 'say grep /eight/, @H'
Remove any data columns with non-numerical entries.
% perltab -gd 'allnum @F'
Average all columns in a file (Non-numeric data treated as 0).
% perltab -d '$s[$_]+= num0 $F[$_] for 0..$#F' -z 'say map {$_/(NR-1)} @s'
Sort heightWeight.tsv so that IDs found in file age.tsv come first, in the same order as they appear in age.tsv.
% perltab -e '$R{ F(ID)}= NR' -s201U '$R{ F(ID)}' age.tsv -in2 heightWeight.tsv
Same as above but with the block of common ids at the end.
% perltab -e '$R{ F(ID)}= NR' -s2U01 '$R{ F(ID)}' age.tsv -in2 heightWeight.tsv
Same as above but sorted in place, i.e. sort the common IDs to mirror their order in age.tsv, but do not move them as a block to the top or bottom.
% perltab -e '$R{ F(ID)}= NR' -s201 '$R{ F(ID)}' age.tsv -in2 heightWeight.tsv
or one can use the default sort spec 'AB01NM'
% perltab -e '$R{ F(ID)}= NR' -s2 '$R{ F(ID)}' age.tsv -in2 heightWeight.tsv
Add a Japanese Heisei fiscal year column 'FY', with entries like "H27" to a file with YYYYMMDD format column 'Date'.
% perltab -pe 'insaft qw(Date FY), sub{ ($y,$m)= unpack "a4a2", F(Date); "H". (($m>3)+ $y-1989)} YYYYMMDD.tsv
Transpose rows and columns. 'chop' removes trailing tab characters.
% perltab -e '$T[$_] .= "$F[$_]\t" for 0..$#F' -z 'chop, say for @T' input.tsv
Prepend row number column to each row.
% perltab -e 'say NR,@F' input.tsv
Lexically sort rows by label, except for the first. Comment lines are also output by -p.
% perltab -pe '@F= F($H[0], sort @H[1..$#H])' fruit.tsv
Similar to the above, but skipping comment lines.
% perltab -e 'say F($H[0], sort @H[1..$#H])' fruit.tsv
Output the rows with the maximum height value among rows in two files.
% perltab -d 'bemax $m, F(hei)' -gd2 'donum {$m==$_} F(hei)' heightWeight.tsv more_heightWeight.tsv -in2 heightWeight.tsv more_heightWeight.tsv
or if there are no missing values, more simply:
% perltab -d 'bemax $m, F(hei)' -gd2 '$m==F(hei)' heightWeight.tsv more_heightWeight.tsv -in2 heightWeight.tsv more_heightWeight.tsv

Dependencies

perltab requires the following modules; all available from CPAN.

Data::Lock
indirect
List::MoreUtils
Scalar::Util::Numeric
Statistics::RankCorrelation

Name		Name	Last commit message	Last commit date
Latest commit History 201 Commits
lib/App		lib/App
t		t
LICENSE		LICENSE
README.jp.md		README.jp.md
README.md		README.md
perltab		perltab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

perltab -- Manipulate data in tabular form from the command line in a perl one-liner style.

Getting started

Running regression tests

Summary

Caveats.

Examples

Dependencies

See Also

Author

About

Uh oh!

Releases

Packages

Languages

License

paulhorton/perltab

Folders and files

Latest commit

History

Repository files navigation

perltab -- Manipulate data in tabular form from the command line in a perl one-liner style.

Getting started

Running regression tests

Summary

Caveats.

Examples

Dependencies

See Also

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages