Museum

Home

Lab Overview

Retrotechnology Articles

Online Manuals

⇒ file() — 386BSD 1.0

Media Vault

Software Library

Restoration Projects

Artifacts Sought

ascii..SH NAME - determine file type [ ] [ namefile ] [ magicfile
] file ...  tests each argument in an  attempt  to  classify  it.
There   are  three  sets  of  tests,  performed  in  this  order:
filesystem tests, magic number tests, and  language  tests.   The
test  that succeeds causes the file type to be printed.  The type
printed will usually contain one of the words (the file  contains
only  ASCII  characters  and is probably safe to read on an ASCII
terminal), (the file contains the result of compiling  a  program
in  a  form  understandable  to  some UNIX kernel or another), or
meaning  anything  else  (data  is  usually  `binary'   or   non-
printable).   Exceptions are well-known file formats (core files,
tar archives) that  are  known  to  contain  binary  data.   When
modifying  the  file  or  the  program  itself,  People depend on
knowing that all the readable files in a directory have the  word
``text''  printed.   Don't do as one computer vendor did - change
``shell commands text''  to  ``shell  script''.   The  filesystem
tests  are based on examining the return from a system call.  The
program checks to see if the file is empty, or if it's some  sort
of  special file.  Any known file types appropriate to the system
you are running on (sockets and symbolic links on  4.2BSD,  named
pipes  (FIFOs)  on  System V) are intuited if they are defined in
the system header file The magic number tests are used  to  check
for  files  with data in particular fixed formats.  The canonical
example of this is a binary executable (compiled  program)  file,
whose  format  is defined in and possibly in the standard include
directory.  These  files  have  a  `magic  number'  stored  in  a
particular  place  near  the beginning of the file that tells the
UNIX operating system that the file is a binary  executable,  and
which  of  several  types thereof.  The concept of `magic number'
has been applied by extension to data files.  Any file with  some
invariant  identifier  at  a small fixed offset into the file can
usually be described in this way.  The information in these files
is read from the magic file If an argument appears to be an file,
attempts to guess its language.   The  language  tests  look  for
particular  strings  (cf names.h) that can appear anywhere in the
first few blocks of a file.  For example, the  keyword  indicates
that  the  file  is  most  likely a troff input file, just as the
keyword indicates a C program.  These  tests  are  less  reliable
than  the  previous  two groups, so they are performed last.  The
language test routines also test for  some  miscellany  (such  as
archives)  and  determine  whether  an  unknown  file  should  be
labelled as `ascii text' or `data'.  Use to specify an  alternate
file  of magic numbers.  The option causes a checking printout of
the parsed form of the magic  file.   This  is  usually  used  in
conjunction  with to debug a new magic file before installing it.
The option specifies that the names of the files to  be  examined
are  to  be  read  (one  per line) from before the argument list.
Either or at least one filename argument must be present; to test
the  standard input, use ``-'' as a filename argument.  - default
list of magic numbers - description  of  magic  file  format.   -
tools  for  examining non-textfiles.  This program is believed to
exceed the System V Interface Definition of FILE(CMD), as near as
one can determine from the vague language contained therein.  Its
behaviour is mostly compatible with the System V program  of  the
same  name.   This  version knows more magic, however, so it will
produce different (albeit more accurate) output  in  many  cases.
The  one significant difference between this version and System V
is that this version treats any white space as  a  delimiter,  so
that  spaces  in  pattern  strings must be escaped.  For example,
>10  string    language   impress      (imPRESS   data)   in   an
existing    magic    file   would   have   to   be   changed   to
>10  string    language\   impress (imPRESS   data)    The    Sun
Microsystems  implementation of System V compatibility includes a
file(1) command that has some  extentions.   My  version  differs
from  Sun's  only  in minor ways.  The significant one is the `&'
operator,  which  Sun's  program   expects   as,   for   example,
>16  long&0x7fffffff     >0        not  stripped would be entered
in my version as >16  long &0x7fffffff    not stripped which is a
little  less general; it simply tests (location 16)&0x7ffffff and
returns its truth value  as  a  C  expression.   The  magic  file
entries  have been collected from various sources, mainly USENET,
and contributed by various authors.  Ian Darwin  (address  below)
will  collect  additional  or  corrected  magic  file entries.  A
consolidation  of  magic  file  entries   will   be   distributed
periodically.   The  order  of  entries  in  the  magic  file  is
significant.  Depending on what system you are using,  the  order
that they are put together may be incorrect.  If your old command
uses a magic file, keep the old magic file around for  comparison
purposes  (rename  it  to  There has been a command in every UNIX
since at least Research Version 6 (man page dated January, 1975).
The System V version introduced one significant major change: the
external list of magic number types.   This  slowed  the  program
down  slightly  but  made  it a lot more flexible.  This program,
based on the System V version, was written by Ian Darwin  without
looking  at anybody else's source code.  John Gilmore revised the
code extensively, making it better than the first version.  Geoff
Collyer  found  several inadequacies and provided some magic file
entries.  The program has undergone  continued  evolution  since.
Copyright  (c)  Ian F. Darwin,  1986 and 1987.  Written by Ian F.
Darwin, UUCP address {utzoo | ihnp4}!darwin!ian, Internet address
ian@sq.com,  postal  address:  P.O.  Box 603, Station F, Toronto,
Ontario, CANADA M4Y 2L8.  and written by and copyright  by  Henry
Spencer,  utzoo!henry.   This  software  is  not  subject  to any
license of the American Telephone and Telegraph Company or of the
Regents  of  the University of California.  Permission is granted
to anyone to use this software for any purpose  on  any  computer
system,  and  to  alter it and redistribute it freely, subject to
the following restrictions: 1. The author is not responsible  for
the  consequences  of  use of this software, no matter how awful,
even if they arise from flaws in  it.   2.  The  origin  of  this
software  must not be misrepresented, either by explicit claim or
by omission.  Since few users ever  read  sources,  credits  must
appear in the documentation.  3. Altered versions must be plainly
marked as such, and must  not  be  misrepresented  as  being  the
original  software.   Since  few users ever read sources, credits
must appear in the documentation.  4.  This  notice  may  not  be
removed  or  altered.   A  few  support  files  (getopt,  strtok)
distributed with this  package  are  by  Henry  Spencer  and  are
subject  to  the same terms as above.  A few simple support files
(strtol, strchr) distributed with this package are in the  public
domain;  they  are so marked.  The files and were written by John
Gilmore from his public-domain program, and are  not  covered  by
the  above  restrictions.   There  must  be a way to automate the
construction of the Magic file from all the glop in magdir.  What
is  it?   uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents  of  ASCII  files.   The
support  for ASCII files (primarily for programming languages) is
simplistic, inefficient and  requires  recompilation  to  update.
Should  there  be  an  ``else''  clause  to  follow  a  series of
continuation lines?  Is it worthwhile to implement recursive file
inspection,  so  that  compressed files, uuencoded, etc., can say
``compressed  ascii  text''  or  ``compressed   executable''   or
``compressed  tar  archive"  or  whatever?   The  magic  file and
keywords should have regular expression  support.   It  might  be
advisable to allow upper-case letters in keywords for e.g., troff
commands vs man page macros.  Regular  expression  support  would
make  this easy.  The program doesn't grok FORTRAN.  It should be
able to figure FORTRAN  by  seeing  some  keywords  which  appear
indented  at the start of line.  Regular expression support would
make this easy.  The list of keywords in probably belongs in  the
Magic  file.   This  could be done by using some keyword like `*'
for the offset value.  The program should malloc the  magic  file
structures,  rather  than using a fixed-size array as at present.
The magic file should be compiled into  binary  (or  better  yet,
fixed-length  ASCII  strings  for  use  in  heterogenous  network
environments) for faster startup.  Then the program would run  as
fast  as  the  Version  7  program  of  the  same  name, with the
flexibility of the System V version.  But then there  would  have
to   be   yet   another  magic  number  for  the  file.   Another
optimisation would be to sort the magic file so that we can  just
run  down  all  the  tests  for the first byte, first word, first
long, etc, once we have fetched it.  Complain about conflicts  in
the  magic file entries.  Make a rule that the magic entries sort
based on file offset rather than position within the magic  file?
The  program  should  provide  a way to give an estimate of ``how
good'' a guess is.  We end up removing guesses (e.g. ``From '' as
first  5  chars  of  file)  because they are not as good as other
guesses (e.g. ``Newsgroups:'' versus "Return-Path:").  Still,  if
the  others don't pan out, it should be possible to use the first
guess.  Perhaps the program should automatically  try  all  tests
with  byte-swapping done, to avoid having to figure out the byte-
swapped values when constructing the magic file.  Of course  this
will  run  more slowly, so it should probably be an option (-a?).
This manual page, and particularly this section, is too long.












































Typewritten Software • bear@typewritten.org • Edmonds, WA 98026