LinuxDevCenter.com

Building Unix Tools with Ruby

by Jacek Artymiak
09/18/2003

This article demonstrates how to write Ruby scripts that work like typical, well-behaved Unix commands. To make it more fun and useful, we'll write a command-line tool for processing data stored in the comma-separated values (CSV) file format. CSV (not CVS) is used to exchange data between databases, spreadsheets, and securities analysis software, as well as between some scientific applications. The format is also used by payment processing sites that provide downloadable sales data to vendors who use their services.

CSV files are plain-text ASCII files in which each line of text represents one row of data and columns are separated with commas. A sample CSV file is shown below.

ticker,per,date,open,high,low,close,vol
XXXX,D,3-May-02,83.01,83.58,71.13,78.04,9645300
XXXX,D,2-May-02,82.47,85.76,82.05,83.84,7210000
XXXX,D,1-May-02,86.80,90.83,81.74,85.50,14253300
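
As a minimal sketch of this layout, a line can be turned into an array of columns by splitting it on commas. This assumes, as in the sample above, that no field contains a quoted or embedded comma. The helper name split_row is hypothetical and is not part of csvt.

```ruby
# Hypothetical helper: split one CSV line into an array of column values.
# Assumes plain fields with no quoted or embedded commas.
def split_row(line)
  # The -1 limit keeps trailing empty fields, so "a,b," yields three columns.
  line.chomp.split(",", -1)
end

header = split_row("ticker,per,date,open,high,low,close,vol\n")
row    = split_row("XXXX,D,3-May-02,83.01,83.58,71.13,78.04,9645300\n")

puts header.length   # number of columns in the header line
puts row[2]          # the value in the "date" column of the first data row
```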

What Is the Script Supposed to Do?

The script, csvt, will extract selected columns of data from a CSV file. The output will also be in CSV format, and the user will be able to specify the order in which the columns are printed. A simple data integrity test will make csvt fail when the number of columns in one line differs from the number of columns in the previous line. The source of data will be either a file or standard input (STDIN), as is customary for many Unix command-line tools.
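
The integrity test described above could be sketched as follows. This is only an illustration under stated assumptions: the method name each_checked_row is hypothetical, fields are assumed to contain no quoted commas, and a StringIO object stands in for a file or STDIN.

```ruby
require "stringio"

# Hypothetical helper: yield each row of +io+ as an array of columns and
# raise an error when a line's column count differs from the previous line's.
def each_checked_row(io)
  previous_count = nil
  io.each_line do |line|
    columns = line.chomp.split(",", -1)
    if previous_count && columns.length != previous_count
      raise ArgumentError,
            "expected #{previous_count} columns, got #{columns.length}"
    end
    previous_count = columns.length
    yield columns
  end
end

# Consistent input: every row is passed to the block.
good = StringIO.new("a,b,c\n1,2,3\n")
each_checked_row(good) { |columns| puts columns.join(",") }

# Inconsistent input: the second line has fewer columns, so an error is raised.
bad = StringIO.new("a,b,c\n1,2\n")
begin
  each_checked_row(bad) { |columns| puts columns.join(",") }
rescue ArgumentError => e
  warn "csvt: #{e.message}"
end
```

In a real script, Ruby's built-in ARGF object, which reads from the files named on the command line or from STDIN when none are given, could be passed in place of the StringIO objects.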

The utility will support the following options:

  • --extract col[,col][...], to print selected columns from input. Numbers are separated with commas, and numbering starts with 0. For example,

    $ csvt --extract 1,5,2 file

    prints columns 1, 5 and 2 (in that order) from file:

    per,low,date
    D,71.13,3-May-02
    D,82.05,2-May-02
    D,81.74,1-May-02

    It will be possible to list the same column more than once:

    $ csvt --extract 0,1,5,2,0 file

    which produces the following output:

    ticker,per,low,date,ticker
    XXXX,D,71.13,3-May-02,XXXX
    XXXX,D,82.05,2-May-02,XXXX
    XXXX,D,81.74,1-May-02,XXXX
  • --remove col[,col][...], to print everything but the selected columns. Numbers are separated with commas, and numbering starts with 0. For example,

    $ csvt --remove 1,5,2 file

    will print all columns except 1, 5 and 2 (in any order) from file:

    ticker,open,high,close,vol
    XXXX,83.01,83.58,78.04,9645300
    XXXX,82.47,85.76,83.84,7210000
    XXXX,86.80,90.83,85.50,14253300

    Listing the same column number more than once will have no effect.

  • --help, -h, to display a short help page.
  • --usage, -u, which has the same effect as --help.
  • --version, -v, to display csvt version information.
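
The behavior of --extract and --remove on a single row can be sketched with two small array helpers. The names extract_columns and remove_columns are hypothetical, chosen only for this illustration, and the row is assumed to be already split into columns.

```ruby
# Hypothetical helpers illustrating the two operations on one row.

# Return the listed columns in the order given; repeats are allowed.
def extract_columns(row, indexes)
  row.values_at(*indexes)
end

# Return every column whose index is not listed; the order of +indexes+
# and any repeated numbers make no difference.
def remove_columns(row, indexes)
  row.reject.with_index { |_value, i| indexes.include?(i) }
end

row = %w[XXXX D 3-May-02 83.01 83.58 71.13 78.04 9645300]

puts extract_columns(row, [1, 5, 2]).join(",")        # D,71.13,3-May-02
puts extract_columns(row, [0, 1, 5, 2, 0]).join(",")  # XXXX,D,71.13,3-May-02,XXXX
puts remove_columns(row, [1, 5, 2]).join(",")         # XXXX,83.01,83.58,78.04,9645300
```

Note that Array#values_at returns nil for an index past the end of the row, so a complete tool would also need to validate the column numbers.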

When csvt finds an unsupported option, or when it is run without any options, it will default to the behavior determined by --help.
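
One possible way to get this fallback behavior is sketched below using Ruby's standard optparse library. This is an assumption made for illustration, not necessarily the approach csvt itself takes. The method name parse_csvt_args and the returned symbols are hypothetical.

```ruby
require "optparse"

# Hypothetical sketch: turn an argument list into [action, columns, files].
# An unsupported option, a bad column list, or no option at all yields :help.
def parse_csvt_args(argv)
  action  = :help
  columns = []
  to_ints = ->(list) { list.map { |c| Integer(c) } }

  parser = OptionParser.new do |opts|
    opts.on("-e", "--extract COLS", Array) do |list|
      action  = :extract
      columns = to_ints.call(list)
    end
    opts.on("-r", "--remove COLS", Array) do |list|
      action  = :remove
      columns = to_ints.call(list)
    end
    opts.on("-h", "--help")    { action = :help }
    opts.on("-u", "--usage")   { action = :help }
    opts.on("-v", "--version") { action = :version }
  end

  begin
    # parse (without "!") leaves argv untouched and returns the non-option
    # arguments, which are treated here as file names.
    files = parser.parse(argv)
  rescue OptionParser::ParseError, ArgumentError
    return [:help, [], []]
  end
  [action, columns, files]
end

p parse_csvt_args(%w[--extract 1,5,2 file])
p parse_csvt_args(%w[--bogus])
p parse_csvt_args([])
```

Returning a value instead of acting immediately keeps the parsing step easy to test, and the caller decides which exit code goes with each action.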

Before You Begin

To complete this tutorial you will need an OS capable of running the Ruby interpreter, the Ruby interpreter itself, and a text editor. The operating system can be any POSIX-compatible system, either commercial (AIX, Solaris, QNX, Microsoft NT/2000, Mac OS X, and others) or free (Linux, FreeBSD, NetBSD, OpenBSD, or Darwin). The Ruby interpreter should be the latest release of Ruby. You can check if Ruby has been installed on your system with the following command:

$ ruby --version

If the system reports that there is no such file or directory, you can download the latest Ruby binaries either from the Ruby site or from one of the repositories of ports and packages for your operating system (check the list of resources at the end of this article).

If ready-made binaries are not available, you can always build Ruby from the original sources found at the Ruby site. Detailed instructions for building Ruby are in the README file in the interpreter's source archive. If you get stuck, support is available on comp.lang.ruby as well as on the Ruby-talk mailing list. (Subscription details are on the Ruby site.)

The choice of text editor is largely a matter of personal preference. The author is a devoted vi user, but any text editor will do.

Start with the Help Screen

Every tool, no matter how small, should come with a manual or, at the very least, it should print a short help screen that explains its usage. It is a good habit to write documentation before writing the first line of code.

Since csvt is a simple tool with only five options, you can be forgiven for not writing the manual, but you should embed basic documentation in the script itself. This should be mandatory for even a short script that you are writing for your own use, because chances are good that you will forget what it does in two weeks.

The help screen shown below will be printed by csvt after the user makes a mistake or runs csvt without specifying any options. Since it should fit on one standard text terminal screen (80 by 25 characters), it must be terse but informative. Ideally, it should present the following information:

  • the name and the purpose of your utility;
  • basic usage information;
  • POSIX and GNU options recognized by csvt;
  • some examples;
  • where to send bug reports.

Your help screen could look like this (and it's okay just to type this stuff in a text editor and wrap it in code later):

csvt -- extract columns of data from a CSV (Comma-Separated Values) file
Usage: csvt [POSIX or GNU style options] file ...

POSIX options                     GNU long options
    -e col[,col][,col]...             --extract col[,col][,col]...
    -r col[,col][,col]...             --remove col[,col][,col]...
    -h                                --help
    -u                                --usage
    -v                                --version

Examples:
csvt -e 1,5,6 file             print column 1, 5 and 6 from file
csvt --extract 4,1 file        print column 4 and 1 from file
csvt -r 2,7,1 file             print all columns except 2, 7 and 1 from file
csvt --remove 6,0 file         print all columns except 6 and 0 from file
cat file | csvt --remove 6,0   print all columns except 6 and 0 from file

Send bug reports to bugs@foo.bar
For licensing terms, see source code

Because there are several cases where it might be necessary to display the help screen, it makes sense to put the code that displays it in a separate method. We'll call it printusage(). (It helps to have the source code of csvt handy.)

# Print the help screen, then stop the script with the given exit code.
def printusage(error_code)
    print "csvt -- extract columns of data from a CSV (Comma-Separated Values) file\n"
    print "Usage: csvt [POSIX or GNU style options] file ...\n\n"
    print "POSIX options                     GNU long options\n"
    print "    -e col[,col][,col]...             --extract col[,col][,col]...\n"
    print "    -r col[,col][,col]...             --remove col[,col][,col]...\n"
    print "    -h                                --help\n"
    print "    -u                                --usage\n"
    print "    -v                                --version\n\n"

    print "Examples: \n"
    print "csvt -e 1,5,6 file             print column 1, 5 and 6 from file\n"
    print "csvt --extract 4,1 file        print column 4 and 1 from file\n"
    print "csvt -r 2,7,1 file             print all columns except 2, 7 and 1 from file\n"
    print "csvt --remove 6,0 file         print all columns except 6 and 0 from file\n"
    print "cat file | csvt --remove 6,0   print all columns except 6 and 0 from file\n\n"
    print "Send bug reports to bugs@foo.bar\n"
    print "For licensing terms, see source code\n"

    exit(error_code)
end

printusage() takes one argument, error_code, which it passes to exit(), a built-in Ruby method that stops the script and returns an exit code to the calling process. In your script, printusage() will be called in two cases:

  • when the user runs csvt with --help or --usage options, so the script should return 0 (no errors), or
  • when the user runs csvt with an unsupported option or without options, and the script should return 1 (to indicate an error).

You should always remember to write code that returns appropriate error codes. When your script returns meaningful error codes, it is much easier to write other scripts that detect its failures and handle critical situations.
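
To illustrate why exit codes matter, the sketch below shows a calling script inspecting the exit code of a command it runs. Since csvt may not be installed, a child Ruby process that simply calls exit() stands in for it. The helper name run_and_report is hypothetical.

```ruby
require "rbconfig"

# Run a command and report how it finished. Kernel#system returns true when
# the command exits with status 0 and false for any other exit status.
def run_and_report(*command)
  succeeded = system(*command)
  status = $?.exitstatus
  puts(succeeded ? "ok (status #{status})" : "failed (status #{status})")
  status
end

ruby = RbConfig.ruby  # path to the Ruby interpreter running this script

# Stand-ins for a successful "csvt --help" and a failing "csvt --bogus".
run_and_report(ruby, "-e", "exit(0)")
run_and_report(ruby, "-e", "exit(1)")
```

A shell script can make the same distinction by testing the command directly in an if statement or by reading $? after it finishes.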
