csplit(1) - Reliant UNIX 5.44c4


csplit(1)                                                         csplit(1)

NAME
     csplit - context split

SYNOPSIS
     csplit [option ...] file argument ...

DESCRIPTION
     csplit splits the contents of a file or the text it reads from stan-
     dard input into smaller sections and writes all or some of these sec-
     tions to separate output files. The original file is left unaltered.

     The way in which csplit divides a file and the sections for which out-
     put files are created are specified in the command-line arguments.

     If -n is not specified, csplit creates a maximum of 100 output files
     per call.

OPTIONS
     No option specified:
          The output files are named xx00, xx01, and so on.

          For each output file that it creates, csplit writes a character
          count on standard output.

          Any files that have already been created are removed if an error
          occurs.
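          A minimal sketch of the default behavior. It assumes a POSIX-style
          csplit, such as the one in GNU coreutils, rather than the Reliant
          UNIX binary, and the input file demo.txt is hypothetical.
          Splitting a four-line file before line 3 yields xx00 and xx01,
          and the character counts are written to standard output:

```sh
#!/bin/sh
# Work in a scratch directory so the default names xx00, xx01, ... do not clash.
cd "$(mktemp -d)" || exit 1

# Hypothetical four-line input.
printf 'one\ntwo\nthree\nfour\n' > demo.txt

# Split before line 3. Without options, csplit prints one character
# count per output file on standard output.
counts=$(csplit demo.txt 3)
echo "$counts"    # 8 (xx00: one, two) and 11 (xx01: three, four)

ls xx*            # xx00 xx01
```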

     -f name
          The output files are called name00, name01, etc.

          -f not specified:

          The output files are named xx00, xx01, and so on.

     -k   Files that have already been created are retained if an error
          occurs.

     -n number
          The sequence number in the output file names consists of number
          digits, where 1 <= number <= 9.

          Example: For -n 4, the output files are called xx0000, xx0001,
          etc.

          -n not specified:

          The sequence number consists of 2 digits.

     -s   The character counts are not written to standard output.

     --   If file begins with a dash (-), the end of the command-line
          options must be marked with --.
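          The options can be combined as in the following sketch, which
          assumes a POSIX-style csplit (for example from GNU coreutils) and
          uses seq only to generate numbered test lines. The input file
          name -input.txt is hypothetical; because it begins with a dash,
          -- marks the end of the options:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input whose name begins with a dash; line i holds the number i.
seq 1 10 > -input.txt

# -f part : name the output files part... instead of xx...
# -n 3    : use three digits for the sequence number
# -s      : do not print the character counts
# --      : end of options, so -input.txt is taken as the file name
csplit -s -f part -n 3 -- -input.txt 6

ls part*    # part000 (lines 1-5) part001 (lines 6-10)
```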




Page 1                       Reliant UNIX 5.44                Printed 11/98

     file Name of the input file.

          If you use a dash (-) as the name for file, csplit reads from
          standard input.

     argument
          You can specify several arguments, each of which references a
          particular line in the input file. These lines represent the
          points at which csplit is to split the file into sections. Each
          dividing line becomes the first line of a new section. If you
          specify n arguments, csplit divides the file into n+1 sections.
          These sections contain the following lines:

          _________________________________________________________________
         | Section |  Contents                                            |
         |_________|______________________________________________________|
         | 0       |  All lines from the start of the input file up to but|
         |         |  not including the line referenced by the first argu-|
         |         |  ment.                                               |
         |_________|______________________________________________________|
         | 1       |  All lines from the first dividing line up to but not|
         |         |  including the line referenced by the second argu-   |
         |         |  ment.                                               |
         |_________|______________________________________________________|
         | ...     |  ...                                                 |
         |_________|______________________________________________________|
         | n       |  All lines from the line referenced by the nth argu- |
         |         |  ment to the end of the input file.                  |
         |_________|______________________________________________________|

          csplit normally writes each section to a separate output file.
          The exception is a section that ends at an argument of the form
          %regexp%[+number][-number] (see below); that section is
          skipped. The last section (section n) is always written to an
          output file.

          The arguments you specify are processed by csplit in the order in
          which you list them. To begin with, the first line of the input
          file is the current line. After an argument has been processed,
          the line referenced by this argument becomes the current line.
          The line referenced by the next argument must lie in the range
          between but not including the current line and the end of the
          input file. Thus the line referenced by the second argument must
          come after the line referenced by the first argument.
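          A short sketch of these rules, assuming a POSIX-style csplit (for
          example GNU coreutils) and the seq utility for test input. Three
          arguments produce four sections, and the dash given as the file
          name makes csplit read standard input:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Three arguments (4, 7, 10) give 3 + 1 = 4 sections. The arguments are
# listed in increasing order, since each must lie after the line
# referenced by the previous one. The dash makes csplit read standard input.
seq 1 12 | csplit -s - 4 7 10

# Show the first and last line of each section.
for f in xx*; do
    printf '%s: %s-%s\n' "$f" "$(head -n 1 "$f")" "$(tail -n 1 "$f")"
done
# xx00: 1-3
# xx01: 4-6
# xx02: 7-9
# xx03: 10-12
```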

          argument can be specified as follows:

          /regexp/[+number][-number]
               An argument in the form /regexp/ references the next line
               after the current line that matches the specified regular
               expression. The section from the current line up to but not
               including the line that matches the regular expression is
               written to an output file. The line matching the regular
               expression now becomes the current line.

               The +number or -number offset shifts the dividing line
               number lines after (+) or before (-) the line that matches
               the regular expression. The line that is number lines after
               (+) or before (-) the line matching the regular expression
               thus becomes the current line.

               Simple regular expressions [see expressions(5)] are recog-
               nized. If the argument contains blanks or shell metacharac-
               ters [see specialchar(5)], you must either escape every such
               character with a backslash \ or enclose the whole argument
               in single quotes '...'. The regular expression must not con-
               tain any newline characters.
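               A sketch comparing the three forms, assuming a POSIX-style
               csplit such as GNU coreutils; demo.txt and its MARK line are
               hypothetical. Each argument is enclosed in single quotes as
               recommended above:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input with a marker on line 3.
printf 'a\nb\nMARK\nc\nd\n' > demo.txt

# /MARK/   : the matching line starts the second section.
csplit -s -f plain demo.txt '/MARK/'
# /MARK/+1 : the dividing line is one line after the match.
csplit -s -f plus demo.txt '/MARK/+1'
# /MARK/-1 : the dividing line is one line before the match.
csplit -s -f minus demo.txt '/MARK/-1'

# First line of each second section.
printf '%s %s %s\n' "$(head -n 1 plain01)" "$(head -n 1 plus01)" "$(head -n 1 minus01)"
# MARK c b
```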

          %regexp%[+number][-number]
               An argument in the form %regexp% references the next line
               after the current line that matches the specified regular
               expression. The section from the current line up to but not
               including the matching line is skipped: csplit does not
               create an output file for it. The matching line then becomes
               the current line.

               If the +number or -number offset is also specified, the
               current line will be the line that is number lines after (+)
               or before (-) the line matching the regular expression.

               Simple regular expressions [see expressions(5)] are recog-
               nized. If the argument contains blanks or shell metacharac-
               ters [see specialchar(5)], you must either escape every such
               character with a backslash \ or enclose the whole argument
               in single quotes '...'. The regular expression must not con-
               tain any newline characters.
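               A sketch that discards two hypothetical header lines,
               assuming a POSIX-style csplit such as GNU coreutils. The file
               demo.txt and its BEGIN line are illustrative only:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input: two header lines, then a body that starts at BEGIN.
printf 'header 1\nheader 2\nBEGIN\nbody 1\nbody 2\n' > demo.txt

# %BEGIN%: lines 1-2 are discarded; BEGIN becomes the current line and
# the remaining section is written to xx00.
csplit -s demo.txt '%BEGIN%'
cat xx00       # BEGIN, body 1, body 2

# %BEGIN%+1: the line after BEGIN becomes the current line instead.
csplit -s -f body demo.txt '%BEGIN%+1'
cat body00     # body 1, body 2
```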

          num  This argument references the line with line number num.
               csplit writes the section from the current line up to but
               not including the numth line to an output file. The numth
               line then becomes the current line.

          {n}  This argument is an abbreviation for n arguments of the pre-
               vious type (see above) and means: "repeat the preceding
               argument n times", where n is an integer greater than 1.

               The {n} argument can be entered after any of the above-
               mentioned arguments, with a blank to separate them.
               Thus if it follows an argument in the form
               /regexp/[+number][-number] or %regexp%[+number][-number],
               this argument will be repeated n times.

               Example:

               '/regexp/' {2}

               is an abbreviation for

               '/regexp/' '/regexp/' '/regexp/'

               If {n} follows an argument of the num type, the file will be
               split n times, from the numth line onward, into sections of
               num lines each.

               Example:

               100 {2}

               is an abbreviation for

               100 200 300
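               Both forms of repetition are shown below, assuming a
               POSIX-style csplit such as GNU coreutils. The files
               numbers.txt and marks.txt are hypothetical, and {n} is quoted
               only as a precaution; quoting it does not change its meaning:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# num {n}: 3 {2} stands for 3 6 9.
seq 1 10 > numbers.txt
csplit -s -f num numbers.txt 3 '{2}'
# num00: 1-2, num01: 3-5, num02: 6-8, num03: 9-10

# /regexp/ {n}: '/x/' {2} stands for '/x/' '/x/' '/x/'.
printf 'intro\nx\na\nx\nb\nx\nc\n' > marks.txt
csplit -s -f re marks.txt '/x/' '{2}'
# re00: intro, re01: x a, re02: x b, re03: x c
```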

ERROR MESSAGES
     argument - out of range

     The line referenced by the specified argument lies outside the permis-
     sible range. The legal range is from, but not including, the current
     line to the end of the file.
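     A sketch of this error and of the cleanup described under OPTIONS,
     assuming a POSIX-style csplit such as GNU coreutils (its message text
     differs from the one shown here). The file short.txt is hypothetical:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1
seq 1 5 > short.txt

# Line 10 does not exist, so the argument is out of range. Without -k,
# csplit removes the files it has already created.
if csplit -s short.txt 10; then
    status=0
else
    status=$?
fi
echo "exit status: $status"

# With -k, the partial output (lines 1-5) is kept in kept00.
csplit -s -k -f kept short.txt 10 || echo "error reported, kept00 retained"
```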

     'xx' file limit reached ...

     You have specified so many arguments that csplit had to create more
     output files than the value specified for the -n option allows.

LOCALE
     The LC_MESSAGES environment variable governs the language in which
     message texts are displayed.

     LC_CTYPE governs character classes and character conversion
     (shifting).

     If LC_MESSAGES or LC_CTYPE is undefined or is defined as the null
     string, it defaults to the value of LANG. If LANG is likewise unde-
     fined or null, the system acts as if it were not internationalized.

     If any of the locale variables has an invalid value, the system acts
     as if none of the variables were set.

     The LC_ALL environment variable governs the entire locale. LC_ALL
     takes precedence over all the other environment variables which affect
     internationalization.
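     The precedence rules above can be summarized by a small shell
     function. messages_locale is hypothetical and not part of csplit or
     of any system library. It takes the values of LC_ALL, LC_MESSAGES
     and LANG as arguments, does not check for invalid values, and
     reports C for the non-internationalized case, which is an
     assumption:

```sh
#!/bin/sh
# Hypothetical helper (not part of csplit): prints the locale value that
# governs message texts. Arguments: values of LC_ALL, LC_MESSAGES, LANG.
# Invalid locale names are not detected.
messages_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"          # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"          # LC_MESSAGES, if set and non-null
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"          # fall back to LANG
    else
        printf '%s\n' C             # assumption: not internationalized = C
    fi
}

messages_locale "${LC_ALL:-}" "${LC_MESSAGES:-}" "${LANG:-}"
messages_locale "" de_DE en_US       # de_DE
messages_locale fr_FR de_DE en_US    # fr_FR
messages_locale "" "" ""             # C
```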

EXAMPLES
     Example 1

     The file book contains text that is divided into three chapters.
     The first chapter is preceded by a preface; an appendix follows the
     last chapter. Each chapter begins with the title "CHAPTER ..."; the
     title of the appendix is "APPENDIX".

     You now wish to put the preface, the individual chapters, and the
     appendix into separate files. The output files are to be named chap00,
     chap01, etc.

     $ csplit -f chap book '/CHAPTER/' '/CHAPTER/' '/CHAPTER/' '/APPENDIX/'
     1636
     15124
     32743
     20344
     2576

     $ ls
     book
     chap00
     chap01
     chap02
     chap03
     chap04

     The file chap00 contains the preface and consists of 1636 characters.
     The appendix is located in the file chap04.

     The same results could also have been obtained by abbreviating the
     csplit call as follows:

     $ csplit -f chap book '/CHAPTER/' {2} '/APPENDIX/'
     .
     .

     You can now edit the sections separately, and later you can join them
     again using cat:

     $ cat chap0[0-4] > book
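     The example can be reproduced with a small stand-in for book,
     assuming a POSIX-style csplit such as GNU coreutils. The chapter texts
     are hypothetical, and the sections are joined into book.joined (also
     a hypothetical name) rather than back into book, so that the result
     can be compared with the original:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical stand-in for book: a preface, three chapters, an appendix.
cat > book <<'EOF'
Preface text
CHAPTER 1
First chapter
CHAPTER 2
Second chapter
CHAPTER 3
Third chapter
APPENDIX
Appendix text
EOF

# Same call as in the example; '/CHAPTER/' '{2}' would be equivalent
# to the three /CHAPTER/ arguments.
csplit -f chap book '/CHAPTER/' '/CHAPTER/' '/CHAPTER/' '/APPENDIX/'
ls chap*    # chap00 (preface) ... chap04 (appendix)

# Joining the sections again reproduces the original text.
cat chap0[0-4] > book.joined
cmp book book.joined && echo "identical"
```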

     Example 2

     The input file file is to be split into sections every hundred lines.
     To do this, you enter:

     $ csplit file 100 {98}

     The argument {98} stands for 98 arguments: 200 300 ... 9900.

     If file contains 9900 or more lines, csplit creates 100 output files.
     The first output file xx00 includes lines 1 to 99 (inclusive); the last
     output file, xx99, contains the rest of file from line 9900 onward.

     If file contains fewer than 9900 lines, csplit issues the error mes-
     sage {98} - out of range and terminates. If you include option -k in
     the call, the files already created are retained.

     $ csplit -k file 100 {98}

     If file contains only 9830 lines, for example, then xx98 is the last
     output file created and includes lines 9800 to 9830.
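     A sketch that reproduces this case with a generated 9830-line file,
     assuming a POSIX-style csplit such as GNU coreutils and the seq
     utility. The exact error message differs between implementations:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical 9830-line input; line i contains the number i.
seq 1 9830 > file

# 100 {98} stands for 100 200 ... 9900. Line 9900 does not exist, so
# csplit reports an error; -k keeps the files written so far.
csplit -s -k file 100 '{98}' || echo "csplit exited with status $?"

ls xx* | tail -n 1     # xx98 is the last file
head -n 1 xx98         # 9800
tail -n 1 xx98         # 9830
```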

     Example 3

     The file prog.c contains a C source program. The program includes a
     main function and a maximum of 20 further functions. In accordance
     with C conventions, each function ends with a right brace at the
     beginning of a line (in column 1). Right braces within a function are
     not located in the first column of a line.

     Each function is now to be written to a separate file. To do this, you
     enter:

     $ csplit -k prog.c '%main(%' '/^}/+1' {19}

     If the program contains exactly 20 functions in addition to the main
     function, csplit splits the file into 22 sections.

     Section 0 contains all lines from the beginning of the file up to but
     not including the start of the main function. This section will not be
     written to an output file (argument %main(%).

     Section 1 contains the main function and is written to the output file
     xx00 (argument /^}/+1).

     Functions 1 to 19 are similarly written to the output files xx01 to
     xx19 in succession (argument {19}). The final section, i.e. section
     21, contains the rest of the input file (which in this case is
     function 20) and is written to the output file xx20.

     If the program contains fewer than 20 functions, csplit will terminate
     at the last function and issue the error message {19} - out of range.
     Since the -k option has been set, the created files will, however, be
     retained.
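     A scaled-down version of this example, assuming a POSIX-style csplit
     such as GNU coreutils. The hypothetical prog.c holds main and three
     further functions, so {2} takes the place of {19}:

```sh
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical prog.c: main followed by three functions, each ending
# with a right brace in column 1.
cat > prog.c <<'EOF'
#include <stdio.h>

int main(void)
{
    return 0;
}

int f1(void)
{
    return 1;
}

int f2(void)
{
    return 2;
}

int f3(void)
{
    return 3;
}
EOF

# %main(% discards the lines before main; each /^}/+1 ends a section
# on the line after a closing brace. The rest (f3) goes to xx03.
csplit -k prog.c '%main(%' '/^}/+1' '{2}'
```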

SEE ALSO
     ed(1), sed(1), sh(1), split(1).

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026