SAVAS (c-shell) script: Save SAS datasets as Stata datasets/Save Stata datasets as SAS datasets
Based on the file extension, savas either:
- makes SAS (version 6.09 or later) datafile copies of Stata datafiles (any version up to and including the version of Stata being run), or
- makes Stata (version 7 or later) datafile copies of SAS (version 6.09 or later) datafiles.
If you want to use savas on UNIX/Linux machines, then feel free to download it here:
Download savas.csh (c-shell script).
If you are not prompted to "Save To Disk", then right-click the link and choose "Save Link Target As..."
Otherwise, you will need to save the web page as a plain text file (not as htm/html). You may want to subsequently rename
the file from savas.csh to savas.
You will also need the SAS macros SAVASTATA and CHAR2FMT:
Download savastata.sas and char2fmt.sas.
If you are not prompted to "Save To Disk", then right-click the link and choose "Save Link Target As...".
Otherwise, you will need to save the web pages as plain text files (not as htm/html). The best way to download SAVASTATA and CHAR2FMT is to use Stata's ssc install command to get the Stata program usesas, which also uses the SAVASTATA and CHAR2FMT SAS macros. The SAS macro file savastata.sas will be in the same directory as other ado-files that start with the letter "s", and the char2fmt.sas file will be in the same directory as ones that start with the letter "c":
ssc install usesas , replace
You will also need the Stata program savasas. Use Stata's ssc install command to get all the files required for savasas:
ssc install savasas , replace
Here is the savas man page.
Click here to download the savas man page. If you are not prompted to "Save To Disk", then right-click the link and choose "Save Link Target As...". Otherwise, you will need to save the web page as a plain text file (not as htm/html).
Disclaimer: There is no warranty on this software, either expressed or implied. This program is released under the terms and conditions of the GNU General Public License.
About savas
Programmer: Dan Blanchette
Center for Entrepreneurship and Innovation
Duke University's Fuqua School of Business
Durham, NC USA
Developed at The Carolina Population Center at The University of North Carolina at Chapel Hill
Date: 02Dec2003
Last updated: 25Mar2008
Make a Stata datafile from a SAS datafile or a SAS datafile from a Stata datafile
savas [-options] DataSetName.ext ...
Examples
(The dollar sign indicates a UNIX/Linux prompt. Do not type the dollar sign to use savas.)
$ savas mystata.dta
$ savas mysas.sas7bdat
$ savas -fmts mystata.dta
$ savas -r mystata.dta
$ savas -r ../group/mysas.sas7bdat
$ savas -c ../group/mystata.dta
$ savas analysis.dta analysis2.sas7bdat child_data.dta
$ savas -fmts mysas.sas7bdat
$ savas -x analysis.xpt analysis2.exp child_data.Apr02.stx
Description
savas uses both SAS and Stata installed on the same Linux/Unix machine to make copies of one or more SAS/Stata
datasets as Stata/SAS files. The output dataset will have the same name, but with the appropriate filename extension:
SAS Version 9/8: .sas7bdat
SAS Version 6 UNIX: .ssd01
SAS Version 6 Linux: .ssd02
SAS 6 Transport/Xport: .xpt, .xport, .exp, .export, .sasx, .stx, .v5x, .v6x, .trans, or .expt file extensions plus whatever file extension the file might have.
SAS Transport files created by PROC CPORT: .cport and .ssp file extensions plus whatever file extension the file might have.
SPSS portable files: .por file extension.
Stata: .dta
savas can convert SPSS portable files to Stata thanks to SAS's SPSS read-only engine. savas recognizes these files by the ".por" file extension. NOTE: Starting in SPSS 11, SPSS will open and save SAS sas7bdat files.
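As an illustration of the extension rules listed above, here is a minimal Python sketch that classifies a filename by its final extension. It is not taken from the savas script: the function name `classify`, the category labels, and the choice of exact (case-sensitive) matching are assumptions made for this example.

```python
import os

# Extension lists copied from the description above. The category labels
# and the function name are illustrative, not part of savas itself.
SAS_EXTS = {".sas7bdat", ".ssd01", ".ssd02"}
XPORT_EXTS = {".xpt", ".xport", ".exp", ".export", ".sasx",
              ".stx", ".v5x", ".v6x", ".trans", ".expt"}
CPORT_EXTS = {".cport", ".ssp"}


def classify(path):
    """Return a label for the kind of datafile the final extension suggests."""
    ext = os.path.splitext(path)[1]  # assumption: exact, case-sensitive match
    if ext == ".dta":
        return "stata"
    if ext in SAS_EXTS:
        return "sas"
    if ext in XPORT_EXTS:
        return "sas-xport"
    if ext in CPORT_EXTS:
        return "sas-cport"
    if ext == ".por":
        return "spss-portable"
    return "unknown"
```

For instance, `classify("child_data.Apr02.stx")` looks only at the last extension, `.stx`, and so treats the file as a SAS transport file.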
By default, the Stata/SAS file is created in the same directory as the SAS/Stata datafile, with the appropriate filename extension, and contains all observations and every variable in the SAS/Stata datafile. savas requires both Stata and SAS to be available on the same machine.
savas cannot process files whose filenames or directory names contain single or double quotes.
The procedure is as follows:
- savas creates a Stata/SAS program that loads the Stata/SAS dataset into Stata/SAS and calls the savas Stata/SAS program.
- savas either uses Stata's fdasave command to save the dataset in memory temporarily as a SAS xport datafile, or has SAS write the data to an ASCII text file.
- savas writes a Stata/SAS input program to load the dataset into Stata/SAS and to assign variable names, labels (and formats).
- savas runs the program in Stata/SAS in batch mode to load the data.
- Stata/SAS saves the data in whichever Stata/SAS file version was specified.
NOTE: When saving to an older version of SAS or Stata whose limit on variable name length is shorter than that of the dataset being processed, savas checks for variable names that are too long for the output dataset. If the "-rename" option is issued, savas renames them to their first 8 characters, or to up to 7 characters plus a number, and displays the list of renamed variables.
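The renaming rule in the note above can be sketched roughly as follows. This is only a guess at one scheme that fits the description ("first 8 characters or up to 7 plus a number"); the function name `shorten_names`, the order in which numbers are tried, and the case-sensitive comparison are assumptions, not the behavior of the actual savas code.

```python
def shorten_names(names, max_len=8):
    """Map each over-long name to a unique name of at most max_len characters.

    Returns a dict {old_name: new_name} containing only the renamed variables.
    """
    # Names that already fit keep their spelling and reserve it.
    used = {n for n in names if len(n) <= max_len}
    renamed = {}
    for name in names:
        if len(name) <= max_len:
            continue
        candidate = name[:max_len]          # first try: the first 8 characters
        counter = 1
        while candidate in used:            # clash: shorten the stem, add a number
            suffix = str(counter)
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        renamed[name] = candidate
    return renamed
```

With the hypothetical input `["household_income", "household_size", "age"]`, the first long name becomes `househol` and the second, which would collide, becomes `househo1`.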
If the SAS/Stata dataset is sorted by one or more variables, the Stata/SAS dataset will also be sorted by those same variables. The maximum length for a string variable passed on to SAS is 200 characters; longer strings are cut to their first 200 characters (this is a limitation of the SAS xport dataset used to transfer data from Stata to SAS). When saving a SAS dataset as a Stata dataset, long character variables are truncated to the maximum length that Stata allows. This maximum may be 80 or 244 depending on which version of Stata is being used; Stata's help page on limits will tell you which applies. savas reports which variables, if any, were truncated and to what length. Stata variable labels can be up to 80 characters in length.
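The truncation behavior described above amounts to cutting each string value at a fixed limit and reporting the affected variables. The sketch below shows the idea in Python on a list of dictionaries; the function name `truncate_strings` and the data layout are assumptions for illustration, and the limits are simply the numbers quoted in the text.

```python
SAS_XPORT_MAX = 200   # limit quoted above for strings passed to SAS via xport
STATA_MAX_OLD = 80    # the smaller Stata string limit quoted above
STATA_MAX_NEW = 244   # the larger Stata string limit quoted above


def truncate_strings(rows, limit):
    """Cut every string value to `limit` characters.

    Returns (new_rows, names_of_truncated_columns).
    """
    truncated = set()
    out = []
    for row in rows:
        new_row = {}
        for col, val in row.items():
            if isinstance(val, str) and len(val) > limit:
                new_row[col] = val[:limit]   # keep only the first `limit` characters
                truncated.add(col)
            else:
                new_row[col] = val           # numbers and short strings pass through
        out.append(new_row)
    return out, sorted(truncated)
```

Returning the list of truncated columns mirrors the report that savas is said to give, so the user can check which variables lost data.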
Options
| Option | Explanation |
| --- | --- |
| -c/-curdir | savas saves the Stata/SAS dataset to the current working directory, even though the Stata/SAS dataset may be located elsewhere. |
| -rename | specifies that any required renaming of variable names is to be done. The -rename option is only necessary when saving to an older version of SAS or Stata or when variable names are not unique in SAS. When saving to an older version, -rename attempts to make long variable names (more than 8 characters) unique by shortening them to their first 8 characters, or to up to the first 7 characters plus a number. savas lists all variables that were renamed. If more than one dataset is submitted to savas, then this option will only work for the first dataset. Check out the -force option. |
| -r/-replace | By default, savas warns the user if the output dataset already exists, and asks permission to overwrite it. Option -replace suppresses this interactive behavior and replaces any existing output dataset without warning. If more than one dataset is submitted to savas, then this option will only work for the first dataset. Check out the -force option. |
| -force | is equivalent to using both -rename and -replace and will maintain these options if more than one dataset is submitted to savas. |
| -check | creates two check files for the user to compare the input dataset with the output dataset to make sure savas created the files correctly. This is a comparison that should be done after any datafile is converted to any other type of datafile by any software. The files are created in the same directory as the output datafile and are named starting with the name of the datafile followed by either "_SAScheck.lst" (SAS) or "_STATAcheck.log" (Stata), e.g. "mydata_SAScheck.lst" and "mydata_STATAcheck.log". |
| -fmts/-formats | specifies either to save value labels that exist in the Stata dataset as SAS formats in a file that has the same name as the datafile but with the ".sas7bcat" file extension, or to use such a file when creating a Stata dataset. This formats catalog file will be created in, or needs to be in, the same directory as the SAS datafile. By default, value labels are not saved or created. NOTE: SAS format names have to be 8 characters or less and cannot end in a number. savas makes some attempt to rename invalid SAS formats, but it would be best for you to rename or drop them in Stata before using savas. Stata does not allow string variables to have user-defined formats, nor numbers with decimal values. |
| -sas6 | indicates to save the Stata file as a SAS version 6 file. SAS 9 will read/open SAS 6 files but will not save to a version 6 SAS dataset. |
| -sasx | indicates to save the Stata file as a SAS version 6 transport/xport file using the xport engine. |
| -o/-old | indicates to save the Stata file as the version of Stata previous to the current version, e.g., version 8. |
| -i/-intercooled | indicates to save the Stata file as Intercooled. This is only necessary if Stata SE or Stata MP is being used. |
| -char2lab | indicates to use the SAS macro CHAR2FMT to convert long character variables to numeric variables with Stata value labels. This is like Stata's encode command. This option is only helpful when saving to a Stata 9 or higher dataset, since Stata 9 added the feature of allowing value labels to be up to 32,000 characters long. |
| -q/-quotes | indicates to replace double quotes ( " ) occurring in character variables with single quotes ( ' ) and replace compound quotes ( `" or "' ) occurring in variable labels or formats with single quotes ( ' ). savas cannot process character variables with double quotes or variable labels or formats with compound quotes when converting a dataset from SAS to Stata. |
| -x/-xport | savas converts SAS transport files into Stata datafiles. NOTE: Multiple transport datafiles can be processed at a time, but all datafiles need to be SAS transport files. There can be no intermixing of regular SAS/Stata datafiles and transport files when using this option. |
| -f/-float | prevents the use of Stata's variable type `double'. All variables whose SAS precision would require Stata's double type are created as float. This option may lead to a loss of precision, but saves space: a float is stored in 4 bytes, a double in 8 bytes. |
| -rights | sets the file permission of the new SAS file to be whatever default file permissions would be for a new file in that directory. The default permissions are the same as the Stata datafile. |
| -b/-beep | beeps upon completion. |
| -s/-silent | be silent; in this case, savas does not print any output to the screen, except for error messages. By default, savas tells what stage of the conversion process is currently being executed, and it reports number of variables, number of observations, and more. |
| -ascii/-sascode | specifies that only a datafile and an input program are to be created. By default, savas executes all four steps outlined above. The -ascii/-sascode option aborts this process after step 3. The user then needs to read in the data manually using Stata/SAS. savas writes either a SAS program (mydata_infile.sas) to read in the SAS datafile (mydata.xpt), or a Stata do-file (_mydata_infile.do) to read in the ASCII datafile (_mydata_.raw). |
| -m/-messy | specifies that all the intermediary files created by savas during its operation are not to be deleted. The -messy option prevents savas from cleaning up after it has finished. This option is mostly useful for debugging purposes in order to find out where something went wrong. All intermediary files have a name starting with an underscore ( _ ) followed by the process ID and are located in the temp directory. |
| -obs=n | converts only the first n observations. By default, savas converts all observations of the Stata/SAS dataset. |
| -varfile=filename | may be used to select only a subset of variables to be included in the Stata/SAS dataset. This will speed up the conversion process and is useful in situations where the number of variables is too large for a non-Stata SE (Special Edition) file, i.e., more than 2,047 variables. The filename is the name of a file whose contents are variable names only. These variable names are case-insensitive when saving to Stata. If saving to SAS, multiple variables can be listed using any of Stata's varlist rules. For example, var* is understood as var1, var2, ... If saving to Stata, multiple variables with the same stem may be specified as ranges according to general SAS rules. For example, var1-var20 is understood as var1, var2, ..., var20. |
| -n/-nice=n | runs SAS/Stata nicely. The default is 20. This should be used if you have a very large datafile and there are others using the UNIX/Linux box. For example: $ savas -n=10 mystata.dta |
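To make the SAS-style range notation mentioned under -varfile concrete, here is a small Python sketch that expands a numbered range such as var1-var20. It only illustrates the notation; the function name `expand_sas_range` is hypothetical, zero-padded suffixes (var01-var10) are not handled, and nothing here reflects how savas or SAS actually parses the file.

```python
import re


def expand_sas_range(spec):
    """Expand 'var1-var20' into ['var1', ..., 'var20'].

    Anything that is not a numbered range with a shared stem is returned
    unchanged as a one-element list.
    """
    m = re.fullmatch(r"([A-Za-z_]\w*?)(\d+)-([A-Za-z_]\w*?)(\d+)", spec)
    # Both ends must share the same stem (compared without regard to case).
    if not m or m.group(1).lower() != m.group(3).lower():
        return [spec]
    stem, start, end = m.group(1), int(m.group(2)), int(m.group(4))
    if start > end:
        return [spec]
    return [f"{stem}{i}" for i in range(start, end + 1)]
```

A plain name such as `age` passes through untouched, so the function can be applied to every line of a variable list.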
Features
savas attempts to transfer Stata value labels to SAS formats and vice versa. savas creates only one format per value label and vice versa, rather than creating a new format or value label for each variable that was assigned that format or value label. So, if you have a SAS dataset with one yes_no. format assigned to twenty variables, the new Stata dataset will have one yes_no value label assigned to those twenty variables.
Fixed SAS formats (Fw.d) translate into Stata's %w.df format. SAS date formats are translated as closely as possible. Unformatted variables get Stata's default formats for the appropriate data type (%8.0g for bytes and ints, %9.0g for floats, and %10.0g for doubles), except for long variables, which savas formats as %12.0g.
savas can process multiple files at a time. Try:
$ savas *.sas7bdat
or:
$ savas *.dta
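The numeric format translation described in this section (Fw.d to %w.df, with per-type defaults for unformatted variables) can be written as a small lookup. The Python sketch below is illustrative only: the function name `stata_format` is made up, date formats are deliberately left out, and any SAS format other than Fw.d simply falls back to the default.

```python
import re

# Default display formats as listed in the text above.
DEFAULT_FORMATS = {
    "byte": "%8.0g",
    "int": "%8.0g",
    "long": "%12.0g",   # the exception savas applies to long variables
    "float": "%9.0g",
    "double": "%10.0g",
}


def stata_format(sas_format, stata_type):
    """Translate a fixed SAS format Fw.d to %w.df, else use the type default."""
    m = re.fullmatch(r"[Ff](\d+)\.(\d+)", sas_format or "")
    if m:
        width, decimals = m.group(1), m.group(2)
        return f"%{width}.{decimals}f"
    # Unformatted (or unrecognized) variables get the default for their type.
    return DEFAULT_FORMATS[stata_type]
```

For example, a SAS variable formatted F8.2 would map to %8.2f, while an unformatted long would get %12.0g.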
savas stamps the SAS creation date and time on the Stata dataset name, so that the Stata user knows not only when the Stata dataset was created, but also the original SAS creation date and time.
Not all SAS variable names are acceptable in Stata. savas attempts to prevent conflicts by using uppercase names for reserved names. These reserved names are:
- _all
- _B
- byte
- _coef
- _cons
- double
- float
- if
- in
- int
- long
- _pi
- _pred
- _rc
- _se
- _skip
- _uniform
- using
- with
- names starting with `str' and followed by an integer. (For example, the name "street" does not pose any problems, but a SAS variable named "str10" will be translated into a Stata variable named "STR10".)
- A SAS variable named "_n" translates into "_______N" (and a
warning is issued.)
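The reserved-name rule above can be sketched in Python as follows. The list is copied from the text; the function name `stata_safe_name` and the case-insensitive comparison are assumptions (SAS names are not case-sensitive, but the text does not say how savas compares them), and the replacement for "_n" is taken from the last list item.

```python
import re

# Reserved names copied from the list above.
RESERVED = {
    "_all", "_B", "byte", "_coef", "_cons", "double", "float", "if", "in",
    "int", "long", "_pi", "_pred", "_rc", "_se", "_skip", "_uniform",
    "using", "with",
}
RESERVED_LOWER = {name.lower() for name in RESERVED}


def stata_safe_name(sas_name):
    """Return the name to use in Stata for a given SAS variable name."""
    lowered = sas_name.lower()       # assumption: compare without regard to case
    if lowered == "_n":
        return "_______N"            # special case quoted in the list above
    if lowered in RESERVED_LOWER or re.fullmatch(r"str\d+", lowered):
        return sas_name.upper()      # reserved names are written in uppercase
    return sas_name
```

Because Stata names are case-sensitive, `STR10` and `IF` are legal variable names even though `str10` and `if` are not.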
Not all Stata variable names are acceptable in SAS, because Stata treats variable names that differ only in upper, lower, or mixed case as different names. So the variable gender can be in the same dataset as "Gender" or "GENder", etc. savas attempts to prevent conflicts by testing for situations like the "gender" issue, and when the -rename option is issued, savas attempts to make the variable names unique by adding a number to the end of the variable name. If saving to an older version, then -rename will shorten all variable names that are longer than 8 characters.
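The "gender" problem above is a case-insensitive collision check. A rough Python sketch of one way to resolve it by appending a number is shown below; the function name `dedupe_ignoring_case` and the numbering order are assumptions, and the sketch ignores any limit on name length.

```python
def dedupe_ignoring_case(names):
    """Make names unique when case is ignored by appending a number."""
    seen = set()      # lowercased names already taken
    result = []
    for name in names:
        candidate = name
        counter = 1
        while candidate.lower() in seen:   # collides with an earlier name
            candidate = f"{name}{counter}"
            counter += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

Given `["gender", "Gender", "GENder", "age"]`, the first name is kept and the later two receive distinct numeric suffixes.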
Acknowledgements
This script was inspired by the sas2stata script developed at RAND.
Bugs
None known.
SAS character variables may be up to 32,767 characters in length; Stata 7 and 8 Intercooled limit string variables to 80 characters; starting with Stata 9, Intercooled and SE limit string variables to 244 characters. savas will truncate such variables and write out a warning.
Stata Intercooled datasets are limited to 2,047 variables. Stata 6 datasets have a maximum width (number of bytes) of 8,192. Stata Intercooled datasets have a maximum width (number of bytes) of 24,564. Stata SE datasets can store as many variables as a SAS 8 dataset (32,767) and have a maximum width of 12 times the number of variables. SAS 9 datasets may have an unlimited number of variables.
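As a quick way to reason about these limits before converting a wide SAS dataset, the variable-count and width figures above can be put into a small check. This Python sketch is only an aid for the reader: the function name `smallest_stata_flavor` is hypothetical, the numbers are those quoted in this section, and the Stata SE width limit is not modeled.

```python
INTERCOOLED_MAX_VARS = 2047     # variable limit quoted above
INTERCOOLED_MAX_WIDTH = 24564   # width limit in bytes quoted above
SE_MAX_VARS = 32767             # variable limit quoted above for Stata SE


def smallest_stata_flavor(n_vars, width_bytes):
    """Return 'intercooled', 'se', or None if the dataset fits neither."""
    if n_vars <= INTERCOOLED_MAX_VARS and width_bytes <= INTERCOOLED_MAX_WIDTH:
        return "intercooled"
    if n_vars <= SE_MAX_VARS:   # SE width limit intentionally not checked here
        return "se"
    return None
```

When the result is not "intercooled", the -varfile option described earlier is one way to bring the number of variables down.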
Questions or comments?
Send them to Dan Blanchette