SAS-example Folder:
This folder (browse on GitHub) holds three artifacts that together form a copy-pasteable starting point for projects that include SAS:
MACROS.sas— the macro library (%load_env,%tddays,%winsor,%ff12/%ff49,%iclink, …). Full inventory at the end of this page.002-merge-fdp-data.sas— a real working SAS script usingMACROS.sas. Copied verbatim from theconsensusrepo (where it lives atsrc/002-merge-fdp-data.sas). Same license (CC-BY-4.0).- This
README.md— a step-by-step walkthrough of the conventions, anchored on a trimmed excerpt of002-merge-fdp-data.sas.
The companion project-template repo does not ship SAS scripts, so these reference materials live here instead. For a complete real-world project using the conventions end-to-end across R, Stata, and SAS, see the consensus repo — every SAS script there (002–005) follows the patterns described below; the rest of the pipeline (download, transform, analysis, figures) is in R and Stata.
On this page
- SAS-example Folder:
- A complete example, annotated
- Running the script via
batch_run_sas()for journal-quality logs - Gotchas to keep in mind when writing SAS for
batch_run_sas - Pointing SAS WORK at a different drive (
SAS_WORK_DIR) - Macro inventory: what’s in
MACROS.sas - Why this lives in
example-projectrather than a template
A complete example, annotated
The trimmed excerpt below keeps the scaffolding used for the proc sql steps that follow the preamble. The full ~820-line working script is sitting alongside this README at 002-merge-fdp-data.sas — open it for the complete picture. The trimmed version below demonstrates every convention MACROS.sas is designed around: portable paths via .env, conditional WRDS signon, cached WRDS downloads, and clean exits.
/* 002-merge-fdp-data.sas (excerpt from the consensus repo)
* Link master CCM observations to I/B/E/S forecasts. Downloads any
* missing raw .sas7bdat files from WRDS; skips the download when a
* cached file already exists on disk.
*/
/* Resolve the path to this script. SYSIN is set in batch mode
(sas -SYSIN ..., what batch_run_sas does). SAS_EXECFILEPATH is
set by Enhanced Editor / Enterprise Guide on interactive Submit. */
%let codepath = %sysfunc(getoption(sysin));
%if %length(&codepath) = 0 %then %do;
%let codepath = %sysfunc(sysget(SAS_EXECFILEPATH));
%end;
%include "&codepath\..\MACROS.sas";
%load_env;
libname raw "&RAW_DATA_DIR";
libname data "&DATA_DIR";
/* --- Conditional WRDS signon: only prompt for credentials if a
download is actually needed. */
%let need_wrds = 0;
%macro check_need(target=);
%if not %sysfunc(exist(&target)) %then %let need_wrds = 1;
%mend;
%check_need(target=raw.surpsum_07182025);
%check_need(target=raw.statsum_epsus_07182025);
* ... one %check_need per cached target ...;
%if &need_wrds = 1 %then %do;
options comamid=TCP remote=WRDS;
signon username=_prompt_;
%end;
%else %put NOTE: All WRDS-downloaded raw files already exist -- skipping signon.;
/* --- Cached downloads: re-runs reuse the local .sas7bdat. */
%macro maybe_download(target=, src=);
%if not %sysfunc(exist(&target)) %then %do;
rsubmit;
proc download data=&src out=⌖ run;
endrsubmit;
%end;
%else %put NOTE: &target exists -- skipping download.;
%mend;
%maybe_download(target=raw.surpsum_07182025, src=ibes.surpsum);
%maybe_download(target=raw.statsum_epsus_07182025, src=ibes.statsum_epsus);
* ... etc ...;
/* --- The analysis. proc sql joins / datasteps that link
raw.master_ccm with the just-downloaded raw files and write
to data.all_five1. (See the consensus repo for the full version.) */
/* --- Clean exit: signoff only if we signed on. The %if must be
inside a macro, not in open code -- see Gotchas below. */
%macro maybe_signoff;
%if &need_wrds = 1 %then %do;
signoff;
%end;
%mend;
%maybe_signoff;
Step 1 — Resolve &codepath, %include macros, %load_env
The first three lines answer the question “where is this script running from?” — SAS has no clean built-in for it, but two SAS-provided values cover the two execution modes:
| Source | Populated when |
|---|---|
getoption(sysin) |
Batch mode (sas -SYSIN file.sas, what batch_run_sas does) |
sysget(SAS_EXECFILEPATH) |
Interactive Submit (Enhanced Editor, Enterprise Guide) |
The preamble tries SYSIN first because sysget on an undefined environment variable prints a noisy WARNING that bumps SAS’s exit code to 1 and would otherwise show up in every batch log. Checking SYSIN first avoids triggering that warning in the common batch case.
Once &codepath is known, %include "&codepath\..\MACROS.sas"; pulls in the macro library, and %load_env; parses .env and pushes every KEY=VALUE row into the SAS global macro namespace as &KEY. For the path-derivation rules (%load_env strips the filename + one more directory level), the file= override, and the broader .env convention, see Environment variables and .env.
Step 2 — Libnames from .env
libname raw "&RAW_DATA_DIR";
libname data "&DATA_DIR";
The point of .env is that the paths are not hardcoded in the script. The same script runs on every collaborator’s machine; each has their own .env pointing at their own copy of the data. RAW_DATA_DIR is the folder of raw WRDS pulls (.sas7bdat files downloaded from WRDS); DATA_DIR is for derived datasets the script writes. See the worked example for how this lets two coauthors with different filesystems run the same code.
Step 3 — Conditional WRDS signon
The four-line %check_need pattern matters for replication runs. If every cached .sas7bdat already exists, need_wrds stays 0, the script never calls signon, and a reviewer can verify the analysis end-to-end without a WRDS account. Only when at least one file is missing does the script prompt for credentials. This is the SAS-side counterpart to the R/Python templates’ skip_if_exists argument on download_parquet().
Step 4 — Cached downloads with %maybe_download
Each proc download is wrapped in an existence check. If raw.surpsum_07182025 already exists locally, the macro prints a NOTE and skips. If not, proc download runs inside an rsubmit block (required — proc download is server-side only). A first run downloads everything once and writes to RAW_DATA_DIR; every subsequent run reuses the local copies. Deleting a single .sas7bdat triggers a re-download of just that file.
Step 5 — The analysis
Below the download block in the real script sits the analytical body: proc sql joins linking raw.master_ccm_* to each forecast-data provider (I/B/E/S, FactSet, Zacks, Capital IQ, Bloomberg), datasteps that compute scaled surprises and coverage indicators, and a final write to data.all_five1 (i.e. DATA_DIR/all_five1.sas7bdat). The shape is identical to what the R/Python templates’ transform step does: read raw, derive, write to DATA_DIR. The language is SAS; the conventions are the same.
Step 6 — Conditional signoff (and why it lives in a macro)
%macro maybe_signoff;
%if &need_wrds = 1 %then %do;
signoff;
%end;
%mend;
%maybe_signoff;
The body could have been a plain %if &need_wrds = 1 %then signoff;, but SAS forbids %if in open code under certain nesting conditions (see the open-code-%if gotcha below). Wrapping it in a tiny macro sidesteps the limit. Same trick applies to any %if that has to live outside a %macro definition in a script that earlier called macros containing %if.
Running the script via batch_run_sas() for journal-quality logs
The patterns above produce data, but not yet a log. JAR (and similar code-sharing policies) expect a SAS-log-style record of every step the pipeline ran alongside the code itself — every statement echoed, every PROC’s output interleaved, every NOTE / WARNING / ERROR, in plain text. The templates produce that log by invoking SAS in batch mode through batch_run_sas(), defined in project-template/src/utils.R (R) and project-template/src/utils.py (Python).
From an R run-all.R:
source("src/utils.R") # provides batch_run / batch_run_sas / batch_run_stata
batch_run_sas("src/002-merge-fdp-data.sas")
# -> writes log/002-merge-fdp-data-sas.log
batch_run_sas() internally:
- Locates the SAS binary. Reads
SAS_BINfrom.envif set; otherwise searches common install paths (C:/Program Files/SASHome/SASFoundation/*/sas.exeand Unix equivalents) and thenPATH. - Overrides SAS WORK if
SAS_WORK_DIRis set in.env— see SAS_WORK_DIR below. - Invokes SAS in batch mode:
sas -SYSIN src/002-merge-fdp-data.sas -LOG log/002-merge-fdp-data-sas.log [-WORK <dir>]. - Captures the exit code and surfaces a warning if it’s non-zero.
The default log filename uses a -sas.log suffix so a hypothetical 004-analyze-data.do (Stata) and 004-analyze-data.sas running side-by-side don’t collide on log filename. The .log file itself is the SAS-native log — a reviewer can grep it without running SAS.
A Python run-all.py calls the identically-named batch_run_sas() from utils.py; the contract is the same.
Gotchas to keep in mind when writing SAS for batch_run_sas
A handful of SAS quirks that aren’t well-documented elsewhere and that the example above is structured to work around:
-
Embedded
;inside* ... ;comments terminate the comment early.*Decide whether downloads are needed; only signon if so;parses as*Decide whether downloads are needed;(valid comment) followed byonly signon if so;(treated as code, throwsERROR 180-322: Statement is not valid). Use--,:, or,instead of;inside such comments, or switch to/* ... */block comments. The same trap affects%putstrings —%put NOTE: &target exists; skipping download.;ends at the first;. In batch mode you only see this failure in the.logafter the run completes, so it’s easy to ship a script that runs interactively but fails underbatch_run_sas. -
Open-code
%ifblocks have a nesting limit (per SAS 9.4M5+). Calling a macro that internally contains%iffrom within an open-code%if/%then/%do; ... %end;block — or just calling several macros in a row whose expansions contain%if— can cause a later open-code%ifto fireERROR: Nesting of %IF statements in open code is not supported. %IF ignored.And the ignored%ifthen runs its body unconditionally, silently. Fix: never put%ifin open code that the script reaches after a chain of macro invocations. Wrap each open-code conditional in its own%macro/%mend(the%maybe_signoffpattern above), so the%ifis inside a macro, not in open code. -
Anything inside an
rsubmitblock runs on the WRDS server, not locally.%sysfunc(exist(raw.X))insidersubmitchecks forraw.Xon the WRDS-side libname stack, not your local libname. Existence checks for cached local files must be done outside thersubmit/endrsubmitpair — which is why%maybe_downloadputs the%if not existcheck aroundrsubmit, not inside it.
Pointing SAS WORK at a different drive (SAS_WORK_DIR)
By default SAS uses %TEMP%\SAS Temporary Files for its WORK library — typically the OS drive (C: on Windows). On a heavy pipeline (large CSV imports, big SQL joins, multi-million-row intermediate datasets) the WORK library can balloon into tens of GB. If your OS drive is small or near-full, SAS will error out partway through with messages like:
ERROR: Insufficient space in file WORK.QUARTERLY_2015.DATA.
ERROR: File WORK.QUARTERLY_2015.DATA is damaged. I/O processing did not complete.
batch_run_sas() reads an optional SAS_WORK_DIR environment variable and, when set, passes -WORK <path> to SAS so the WORK library lands wherever you tell it. To enable it, add a line to your .env:
SAS_WORK_DIR=D:/sas-work
(Pick any local-fast-disk folder with ample free space — local SSD, not a network mount.) The folder is created automatically on first run.
If SAS_WORK_DIR is unset or empty, batch_run_sas does not pass -WORK and SAS falls back to its default. You only need to set this if you actually hit the “Insufficient space” failure mode; coauthors with plenty of OS-drive space can leave it alone.
Macro inventory: what’s in MACROS.sas
A copy-pasteable reference for the macros that make the example above work. Each one is defined in MACROS.sas:
%load_env— parses a.envfile (KEY=VALUE per line) and sets each entry as a global SAS macro variable. After%load_env;you can writelibname raw "&RAW_DATA_DIR";and the path resolves from.env. SAS doesn’t ship a community dotenv package, so this fills the gap. Pass%load_env(file=...)to override the default path derivation.%tddays— build a trading-day window relative to an event date using a CRSP trading-day calendar.%tdmins— same idea but at minute granularity, for intraday work.%winsor— winsorize variables at user-supplied percentiles (e.g., 1%/99%).%ff12/%ff49— apply Fama-French 12-industry / 49-industry classifications from a SIC code (originally from https://github.com/JoostImpink/fama-french-industry).%iclink— WRDS’s standard CRSP-IBES linking macro, lightly wrapped so it auto-points at WRDS libraries when you sign on.
Why this lives in example-project rather than a template
The hub README treats example-project as the home for teaching artifacts that don’t fit cleanly into the language-specific template repos. SAS isn’t part of project-template’s pipeline, so adding MACROS.sas to that repo would imply more SAS support than it actually provides. Keeping it here means project-template stays focused on the language combos it actually supports, while readers who want SAS still have a copy-pasteable starting point — anchored against the consensus repo as a working real-world example.
For more on the .env convention this macro file is built around, see the hub README and the Environment variables and .env topic.