csplit(1) csplit(1)
NAME
csplit - context split
SYNOPSIS
csplit [option ...] file argument ...
DESCRIPTION
csplit splits the contents of a file or the text it reads from stan-
dard input into smaller sections and writes all or some of these sec-
tions to separate output files. The original file is left unaltered.
The way in which csplit divides a file and the sections for which out-
put files are created are specified in the command-line arguments.
If -n is not specified, csplit creates a maximum of 100 output files
per call.
OPTIONS
No option specified:
The output files are named xx00, xx01, and so on.
For each output file that it creates, csplit writes a character
count on standard output.
Any files that have already been created are removed if an error
occurs.
-f name
The output files are called name00, name01, etc.
-f not specified:
The output files are named xx00, xx01, and so on.
-k Files that have already been created are retained if an error
occurs.
-n number
The sequence number in each output file name consists of number
digits, where 1 <= number <= 9.
Example: For -n 4, the output files are called xx0000, xx0001 etc.
-n not specified:
The sequence number consists of 2 digits.
-s The output of a character count is suppressed.
-- If file begins with a dash (-), the end of the command-line
options must be marked with --.
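The naming options can be tried out on a small throwaway file. The
following is a minimal sketch, assuming a POSIX-style csplit and the
widely available (but non-standard) mktemp utility; the file name
input.txt and the prefix part are made up for illustration.

```shell
# Work in a scratch directory so that no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Four-line sample input (hypothetical file name).
printf '%s\n' a b c d > input.txt

# -s: no character counts; -f part: prefix "part"; -n 3: three-digit suffix.
# The argument 3 makes line 3 the first line of the second section.
csplit -s -f part -n 3 input.txt 3

ls
```

This should leave part000 (lines 1 and 2) and part001 (lines 3 and 4)
next to input.txt.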
Page 1 Reliant UNIX 5.44 Printed 11/98
file Name of the input file.
If you use a dash (-) as the name for file, csplit reads from
standard input.
argument
You can specify several arguments, each of which references a
particular line in the input file. These lines represent the
points at which csplit is to split the file into sections. Each
dividing line becomes the first line of a new section. If you
specify n arguments, csplit divides the file into n+1 sections.
These sections contain the following lines:
 ________________________________________________________________
| Section | Contents                                             |
|_________|______________________________________________________|
| 0       | All lines from the start of the input file up to but |
|         | not including the line referenced by the first       |
|         | argument.                                            |
|_________|______________________________________________________|
| 1       | All lines from the first dividing line up to but not |
|         | including the line referenced by the second          |
|         | argument.                                            |
|_________|______________________________________________________|
| ...     | ...                                                  |
|_________|______________________________________________________|
| n       | All lines from the line referenced by the nth        |
|         | argument to the end of the input file.               |
|_________|______________________________________________________|
csplit usually writes each section to a separate output file.
This does not apply when an argument of the form
%regexp%[+number][-number] is used (see below). The
last section (section n) is always written to an output file.
The arguments you specify are processed by csplit in the order in
which you list them. To begin with, the first line of the input
file is the current line. After an argument has been processed,
the line referenced by this argument becomes the current line.
The line referenced by the next argument must lie in the range
between but not including the current line and the end of the
input file. Thus the line referenced by the second argument must
come after the line referenced by the first argument.
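The section numbering described above can be illustrated with two
line-number arguments. This is only a sketch, assuming a POSIX-style
csplit and the non-standard mktemp utility; nums.txt is a made-up file
name.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > nums.txt   # ten lines, one number each

# Two arguments give three sections: lines 1-3, 4-7 and 8-10.
csplit -s nums.txt 4 8

# Show each output file on one line.
for f in xx00 xx01 xx02; do
    printf '%s: %s\n' "$f" "$(tr '\n' ' ' < "$f")"
done
```

Giving the arguments in the order 8 4 would break the rule stated
above, because line 4 does not lie after the current line 8.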
argument can be specified as follows:
/regexp/[+number][-number]
An argument in the form /regexp/ references the next line
after the current line that matches the specified regular
expression. The section from the current line up to but not
including the line that matches the regular expression is
written to an output file. The line matching the regular
expression now becomes the current line.
The +number or -number offset shifts the dividing line
number lines after (+) or before (-) the line that matches
the regular expression. The line that is number lines after
(+) or before (-) the line matching the regular expression
thus becomes the current line.
Simple regular expressions [see expressions(5)] are recog-
nized. If the argument contains blanks or shell metacharac-
ters [see specialchar(5)], you must either escape every such
character with a backslash \ or enclose the whole argument
in single quotes '...'. The regular expression must not con-
tain any newline characters.
%regexp%[+number][-number]
An argument in the form %regexp% references the next line
after the current line that matches the specified regular
expression. The line that matches the regular expression
becomes the current line. csplit in this case does not
create an output file for the relevant section.
If the +number or -number offset is also specified, the
current line will be the line that is number lines after (+)
or before (-) the line containing the regular expression.
Simple regular expressions [see expressions(5)] are recog-
nized. If the argument contains blanks or shell metacharac-
ters [see specialchar(5)], you must either escape every such
character with a backslash \ or enclose the whole argument
in single quotes '...'. The regular expression must not con-
tain any newline characters.
num This argument references the line with line number num.
csplit writes the section from the current line up to but
not including the numth line to an output file. The numth
line then becomes the current line.
{n} This argument is an abbreviation for n arguments of the pre-
vious type (see above) and means: "repeat the preceding
argument n times", where n is an integer greater than 1.
The {n} argument can be entered after any of the above-
mentioned arguments, with a blank to separate them.
Thus if it follows an argument in the form
/regexp/[+number][-number] or %regexp%[+number][-number],
this argument will be repeated n times.
Example:
'/regexp/' {2}
is an abbreviation for
'/regexp/' '/regexp/' '/regexp/'
If {n} follows an argument of the num type, the file will be
split n times, from the numth line onward, into sections of
num lines each.
Example:
100 {2}
is an abbreviation for
100 200 300
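The interplay of the %regexp%, /regexp/ and {n} forms can be tried on
a small sample. The following is a sketch only, assuming a POSIX-style
csplit and the non-standard mktemp utility; the file notes.txt and its
contents are invented for illustration.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' intro '# one' a '# two' b '# three' c '# four' d > notes.txt

# '%^#%'  skips everything before the first heading without writing a file.
# '/^#/'  ends a section just before the next heading.
# '{2}'   repeats the preceding /^#/ argument two more times.
csplit -s notes.txt '%^#%' '/^#/' '{2}'

# Print the first line of every output file.
head -n 1 xx00 xx01 xx02 xx03
```

Each of the four output files should start with one heading line; the
line "intro" is not written anywhere.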
ERROR MESSAGES
argument - out of range
The line referenced by the specified argument lies outside the permis-
sible range. The legal range is from, but not including, the current
line to the end of the file.
'xx' file limit reached ...
You have specified so many arguments that csplit had to create more
output files than the value specified for the -n option allows.
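An out-of-range error is easy to provoke on purpose. The sketch below
assumes a POSIX-style csplit and the non-standard mktemp utility;
five.txt and err.txt are made-up names, and the exact wording of the
message may differ between implementations.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 1 2 3 4 5 > five.txt

# Line 9 lies beyond the end of the file, so the second argument is
# out of range and csplit should exit with a non-zero status.
if csplit -s five.txt 3 9 2>err.txt; then
    status=ok
else
    status=failed
fi

echo "csplit $status"
cat err.txt   # message text depends on the implementation
ls            # without -k, no xx files should remain
```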
LOCALE
The LC_MESSAGES environment variable governs the language in which
message texts are displayed.
LC_CTYPE governs character classes and character conversion
(shifting).
If LC_MESSAGES or LC_CTYPE is undefined or is defined as the null
string, it defaults to the value of LANG. If LANG is likewise
undefined or null, the system acts as if it were not internationalized.
If any of the locale variables has an invalid value, the system acts
as if none of the variables were set.
The LC_ALL environment variable governs the entire locale. LC_ALL
takes precedence over all the other environment variables which affect
internationalization.
EXAMPLES
Example 1
The file book contains a text that is subdivided into three chapters.
The first chapter is preceded by a preface; an appendix follows the
last chapter. Each chapter begins with the title "CHAPTER ..."; the
title of the appendix is "APPENDIX".
You now wish to put the preface, the individual chapters, and the
appendix into separate files. The output files are to be named chap00,
chap01, etc.
$ csplit -f chap book '/CHAPTER/' '/CHAPTER/' '/CHAPTER/' '/APPENDIX/'
1636
15124
32743
20344
2576
$ ls
book
chap00
chap01
chap02
chap03
chap04
The file chap00 contains the preface and consists of 1636 characters.
The appendix is located in the file chap04.
The same results could also have been obtained by abbreviating the
csplit call as follows:
$ csplit -f chap book '/CHAPTER/' {2} '/APPENDIX/'
.
.
You can now edit the sections separately, and later you can join them
again using cat:
$ cat chap0[0-4] > book
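The split and the later rejoining can be checked on a miniature book.
This is a sketch, assuming a POSIX-style csplit and the non-standard
mktemp utility; the sample text is invented, and the result is written
to a separate file named rejoined so that it can be compared with the
original.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Preface' \
    'CHAPTER 1' 'text' 'CHAPTER 2' 'text' 'CHAPTER 3' 'text' \
    'APPENDIX' 'notes' > book

# Same arguments as the abbreviated call above; -s hides the counts.
csplit -s -f chap book '/CHAPTER/' '{2}' '/APPENDIX/'

# Concatenate the five pieces again and compare with the original.
cat chap0[0-4] > rejoined
cmp book rejoined && echo "identical"
```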
Example 2
The input file file is to be split into sections every hundred lines.
To do this, you enter:
$ csplit file 100 {98}
The argument {98} stands for 98 arguments: 200 300 ... 9900.
If file contains 9900 or more lines, csplit creates 100 output files.
The first output file, xx00, contains lines 1 to 99 (inclusive); the last
output file, xx99, contains the rest of file from line 9900 onward.
If file contains fewer than 9900 lines, csplit issues the error mes-
sage {98} - out of range and terminates. If you include option -k in
the call, the files already created are retained.
$ csplit -k file 100 {98}
If file contains only 9830 lines, for example, then xx98 is the last
output file created and includes lines 9800 to 9830.
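The same behavior can be observed at a smaller scale. The following
sketch assumes a POSIX-style csplit plus the awk and (non-standard)
mktemp utilities; it splits a 35-line file every 10 lines, so the
numbers are scaled down from the example above.

```shell
cd "$(mktemp -d)" || exit 1
awk 'BEGIN { for (i = 1; i <= 35; i++) print i }' > file

# 10 {3} stands for 10 20 30 40. Line 40 does not exist, so csplit
# reports an error, but -k keeps the files written so far.
csplit -s -k file 10 '{3}' 2>/dev/null || echo "csplit reported an error"

wc -l xx*
```

Here xx03 should be the last output file and should hold lines 30 to
35, in the same way that xx98 holds lines 9800 to 9830 above.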
Example 3
The file prog.c contains a C source program. The program includes a
main function and a maximum of 20 further functions. In accordance
with C conventions, each function ends with a right brace at the
beginning of a line (in column 1). Right braces within a function are
not located in the first column of a line.
Each function is now to be written to a separate file. To do this, you
enter:
$ csplit -k prog.c '%main(%' '/^}/+1' {19}
If the program contains exactly 20 functions in addition to the main
function, csplit splits the file into 22 sections.
Section 0 contains all lines from the beginning of the file up to but
not including the start of the main function. This section will not be
written to an output file (argument %main(%).
Section 1 contains the main function and is written to the output file
xx00 (argument /^}/+1).
Functions 1 to 19 are similarly written to separate output files in
succession (argument {19}). The final section, i.e. section 21,
contains the rest of the input file (which in this case is function 20)
and is written to the output file xx20.
If the program contains fewer than 20 functions, csplit will terminate
at the last function and issue the error message {19} - out of range.
Since the -k option has been set, the created files will, however, be
retained.
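A scaled-down version of this example uses a main function and three
further functions, so the repeat count becomes {2} instead of {19}.
This is a sketch only, assuming a POSIX-style csplit and the
non-standard mktemp utility; the C source is a made-up placeholder.

```shell
cd "$(mktemp -d)" || exit 1
cat > prog.c <<'EOF'
#include <stdio.h>

int main(void)
{
    return 0;
}
int f1(void)
{
    return 1;
}
int f2(void)
{
    return 2;
}
int f3(void)
{
    return 3;
}
EOF

# Skip the lines before main, then split after each closing brace in
# column 1. The remainder of the file (f3) goes to the last output file.
csplit -s -k prog.c '%main(%' '/^}/+1' '{2}'

head -n 1 xx00 xx01 xx02 xx03
```

The output files xx00 to xx03 should contain main, f1, f2 and f3
respectively; the #include line is not written to any file.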
SEE ALSO
ed(1), sed(1), sh(1), split(1).