
Master R by Comparing with SAS: Overview


This section reviews the similarities and differences between SAS and R so that organizations can make better-informed decisions in their SAS-to-R migration plans and vision.

R is a powerful programming language that is cost-effective (savings of at least 40%), reliable and customizable. R is both challenging and creative: working with R packages is like putting together the hundreds of pieces of a jigsaw puzzle. R is extensively used in public health projects, healthcare economics, exploratory and scientific analysis, trend identification, plot and graph generation, specialized statistical analysis, and machine learning. See the pharmaverse section for GxP validation.

R is ideal for:

  1. Smaller projects with budget constraints.
  2. Research prioritizing compelling and interactive data visualizations.
  3. Teams comfortable with open-source platforms and willing to invest in learning R.
  4. Projects requiring flexibility and customization beyond standard functionalities.
  5. Collaboration with researchers and institutions adopting R workflows.

R's unique features include:

  1. Flexibility and customization to develop and share R packages.
  2. Faster adoption of cutting-edge statistical techniques.

R and Shiny apps used by the FDA include:

  1. Baseline comparison of treatment groups
  2. Data anomaly detection
  3. Interactive support of FDA submissions

R-generated plots and graphs support:

  1. Identifying trends and understanding the data visually
  2. Understanding the randomization pattern and checking for bias
  3. Checking for outliers in lab values or other values of interest
  4. Checking stratification impact and subgroup analyses
  5. Site-level analytics, such as identifying the number of SAEs and deaths
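As an illustration of the outlier check in item 3, here is a minimal base R sketch on hypothetical ALT values. It uses boxplot.stats(), which computes the statistics boxplot() would draw, to list the values beyond the whiskers (1.5 times the box length by default) without producing a plot. The data and variable names are made up for illustration.

```r
# Hypothetical ALT values (U/L) for six subjects
alt <- c(22, 25, 24, 27, 23, 140)

# boxplot.stats() returns the statistics boxplot() would draw;
# $out holds values more than 1.5 box lengths beyond the box (coef = 1.5)
bx <- boxplot.stats(alt)
bx$out                      # 140
which(alt %in% bx$out)      # 6: position of the outlying record

# boxplot(alt) would draw the same box, whiskers and outlier point
```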

R migration includes:

  1. Team Training and Mentoring
  2. Show Value
  3. Build Credibility
  4. Interactivity is the Key
  5. Find Quick Wins
  6. Excel to Shiny
  7. 'Borrow Code' 

Sponsors who want to keep SAS for legacy studies may be interested in using R for custom data-visualization graphs or Shiny apps for user and clinical data interactions. Sponsors who want to be submission-ready should proceed with caution until all required R packages have been installed and tested and can produce SDTMs, ADaMs and TLGs. Sponsors may also want to leverage pharmaverse packages for source or QC purposes.

While almost everything in SAS can be replicated in R, the learning curve is steep because R's concepts and process flow are more object-oriented. R also gives special meanings to characters such as [] (indexing), {} (code blocks) and () (function calls). In addition, most R syntax consists of function calls, which are similar to SAS functions and macro programs, so knowing how to call SAS functions and macros helps you understand, write and execute R functions. Like SAS macro programs, R functions can have positional, keyword and default parameters. See the list of R packages installed in SAS LSAF.
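To make the comparison concrete, here is a minimal sketch of an R function written the way a small SAS macro might be. The function name flag_high, its parameters and the lab data are hypothetical. It shows positional and keyword arguments, a default value, () for the call, {} for the function body and [] for indexing.

```r
# Hypothetical function, roughly analogous to a SAS macro such as
#   %macro flag_high(ds, var, cutoff=100); ... %mend flag_high;
flag_high <- function(df, var, cutoff = 100) {   # {} encloses the function body
  # df[, var] uses [] to select the column whose name is stored in var
  df$high_flag <- ifelse(df[, var] > cutoff, "Y", "N")
  df                                             # the last expression is returned
}

# Hypothetical lab data
labs <- data.frame(usubjid = c("01", "02", "03"), alt = c(45, 120, 98))

res1 <- flag_high(labs, "alt")                # positional arguments, default cutoff = 100
res2 <- flag_high(labs, "alt", cutoff = 90)   # keyword argument overrides the default
res3 <- flag_high(var = "alt", df = labs)     # keyword arguments can be given in any order
res2
```

Unlike a SAS macro, which generates code, an R function returns a value: flag_high() returns a modified copy, and labs itself is left unchanged.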

A few SAS terms and their R counterparts are listed below.

  • SAS → R
  • non-vector-based language → vector-based language
  • syntax is not case-sensitive → syntax, data frame names and variable names are case-sensitive
  • missing data (. for numeric, '' for character) → NA
  • disk storage → memory storage and processing, so R may be slower with larger data unless packages such as data.table are used to optimize data frames
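  • round() rounds halves away from zero: 2.4 to 2, 2.5 to 3 → round() rounds halves to the nearest even number: 2.5 to 2, 3.5 to 4, 2.6 to 3
  • sorts missing values first → sorts NA last with order() and arrange(); sort() drops NA by default
  • summarize → aggregate()
  • operators: =, and, or → operators: ==, &, | ('and' and 'or' are not valid R syntax)
  • * comments; → # comments
  • /* */ to comment out a block of code → no block comments; prefix each line with # or wrap the lines in if (FALSE) { ... }
  • commands end with ';' → no terminator needed; a new line ends a command (';' only separates commands on the same line)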
  • data set → data frame
  • observation number → slice(), row_number(), rownames()
  • data set options in parentheses, e.g. (keep= where=) → data frame indexing with brackets, df[rows, columns]
  • label → Hmisc::label()
  • variable → vector (a data frame column)
  • types: numeric, character → types: numeric, character
  • dates (number of days since Jan 1, 1960) → dates (number of days since Jan 1, 1970)
  • no equivalent → list type
  • rename → rename()
  • modules, procs and functions → R packages and functions (e.g., tidyverse)
  • data step: implicit loop across observations → vectorized operations apply to each variable (vector) across all observations
  • data step: retain, if-then, vr=, by → ifelse(), mutate(), group_by() to replicate; also any(x, na.rm = TRUE), all(x, na.rm = TRUE)
  • output → not easy to replicate
  • first., last. → slice(1), slice(n()) after group_by()
  • do loops, arrays → for loops with data frame index references
  • proc sql → dplyr: select(), mutate(), filter(), case_when(), arrange(), group_by(), %>%
  • left join, right join, inner join, full outer join → left_join(), right_join(), inner_join(), full_join(); joins do not require sorting in advance
  • subqueries → mutate(), summarize(), left_join() to replicate
  • proc compare → diffdf()
  • proc contents → Hmisc::contents()
  • proc freq → table()
  • proc means → summarize()
  • proc print → print()
  • proc sort, nodup → arrange(), group_by_all(), distinct()
  • proc transpose → pivot_longer(), pivot_wider()
  • numeric functions → R numeric functions
  • min, max, mean, sum, median, std → min, max, mean, sum, median, sd
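Several rows above behave differently in R, not just under a different name: date origins, rounding, the position of missing values, missing values in summary functions, and first./last. processing. The sketch below demonstrates these in base R with hypothetical values. round_half_away() is a hypothetical helper, not a base R or SAS function, and the dplyr forms from the table appear in comments only.

```r
# 1. Dates: SAS counts days from 1960-01-01, R from 1970-01-01
sas_date <- 21915                                   # SAS date value for 01JAN2020
r_date   <- as.Date(sas_date, origin = "1960-01-01")
r_date                                              # "2020-01-01"
as.numeric(r_date)                                  # 18262 days since 1970-01-01

# 2. Rounding: round() rounds halves to the nearest even number
round(c(2.4, 2.5, 2.6, 3.5))                        # 2 2 3 4

# Hypothetical helper mimicking SAS-style rounding (halves away from zero);
# results can still differ for values not exactly representable in binary
round_half_away <- function(x, digits = 0) {
  m <- 10^digits
  sign(x) * floor(abs(x) * m + 0.5) / m
}
round_half_away(c(2.4, 2.5, 2.6, -2.5))             # 2 3 3 -3

# 3. Missing values: NA sorts last, and summaries propagate NA by default
x <- c(3, NA, 1)
sort(x)                                             # 1 3 (NA dropped)
x[order(x)]                                         # 1 3 NA
x[order(x, na.last = FALSE)]                        # NA 1 3 (SAS-like order)
mean(x)                                             # NA
mean(x, na.rm = TRUE)                               # 2 (SAS ignores missing values)

# 4. first./last. flags; assumes rows are already sorted by usubjid
#    (dplyr: group_by(usubjid) %>% slice(1) or slice(n()))
adlb <- data.frame(usubjid = c("01", "01", "02", "02", "02"),
                   aval    = c(10, 12, 7, 8, 9))
adlb$first_flag <- !duplicated(adlb$usubjid)
adlb$last_flag  <- !duplicated(adlb$usubjid, fromLast = TRUE)
adlb
```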


  • character functions → R character functions
  • catx() → paste0(), paste(), unite()
  • compress() → gsub(), str_remove_all()
  • find() → str_detect()
  • index() → str_which(), grep()
  • lag(), lead() → lag(), lead()
  • lowcase() → tolower()
  • scan() → word(), strsplit(), separate()
  • strip() → str_trim()
  • substr() → str_sub()
  • tranwrd() → str_replace_all()
  • upcase() → toupper()
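The stringr functions in the table (str_trim(), str_sub(), str_replace_all(), word()) come from the stringr package. To keep the sketch free of package dependencies, it uses base R equivalents on hypothetical values. One difference worth noting: paste() does not skip missing or blank arguments the way catx() does; a missing value is pasted as the text "NA".

```r
name  <- "  John  "
visit <- "Week 4"

clean   <- trimws(name)                            # strip(): "John" (stringr: str_trim())
label   <- paste(clean, trimws(visit), sep = " ")  # catx(" ", ...): "John Week 4"
up      <- toupper(clean)                          # upcase(): "JOHN"
low     <- tolower(visit)                          # lowcase(): "week 4"
first4  <- substr(visit, 1, 4)                     # substr(): "Week" (stringr: str_sub())
short   <- gsub("Week", "Wk", visit, fixed = TRUE) # tranwrd(): "Wk 4" (stringr: str_replace_all())
nospace <- gsub(" ", "", visit, fixed = TRUE)      # compress() removing blanks: "Week4"
word2   <- strsplit(visit, " ")[[1]][2]            # scan(visit, 2): "4" (stringr: word())
has_wk  <- grepl("Week", visit, fixed = TRUE)      # find(visit, "Week") > 0: TRUE (stringr: str_detect())
pos     <- regexpr("4", visit, fixed = TRUE)       # index(visit, "4"): position 6
```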

  • variable type conversion functions → R functions
  • input() → as.numeric(), similar to SAS's 8. informat
  • put() → as.character(), similar to SAS's best. format
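A minimal sketch of the two conversions above, on hypothetical character values. As with input() in SAS, text that is not a valid number becomes missing (R also issues a warning, suppressed here). For a fixed numeric format such as put(x, 8.2), sprintf() is one base R option; it is an illustrative choice, not the only one.

```r
chr <- c("12", "3.5", "abc", "")

# input(chr, 8.): invalid or blank text becomes missing (NA)
num <- suppressWarnings(as.numeric(chr))
num                          # 12, 3.5, NA, NA

# put(num, best.): numeric to character; NA stays NA
as.character(num)            # "12", "3.5", NA, NA

# put(num, 8.2): right-aligned, width 8, 2 decimals
fmt <- sprintf("%8.2f", num[2])
fmt                          # "    3.50"
```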

  • length() → the width option in format() pads values to a given width, and xportr sets variable lengths; nchar() returns the number of characters in a value; length() returns the number of variables in a data frame or the number of records (elements) in a vector; by default, the length is set to the length of the longest value
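Because length() means something different in R than in SAS, a short base R sketch on hypothetical values may help; it contrasts nchar() with length() for vectors and data frames, and shows how the longest value can be found.

```r
visits <- c("Screening", "Week 4", "")
nchar(visits)          # 9 6 0: number of characters in each value
length(visits)         # 3: number of elements (records) in the vector

df <- data.frame(usubjid = c("01", "02", "03"),
                 visit   = c("Screening", "Week 4", "Week 8"),
                 stringsAsFactors = FALSE)
length(df)             # 2: number of variables (columns)
nrow(df)               # 3: number of records
max(nchar(df$visit))   # 9: length of the longest value
```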
  • count() → str_count() (dplyr's count() counts rows per group)
  • macro programs → R functions and user-defined functions
  • macro variables → vectors with one or more values
  • global macro variables → objects in the global environment, e.g. x <- 'Y' at the top level, or x <<- 'Y' inside a function
  • local macro variables → variables defined within R functions
  • defaults and keyword parameters → defaults and keyword (named) parameters
  • ODS → R Markdown
  • logs → logrx
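To illustrate the macro-variable rows, here is a minimal sketch with hypothetical names (study, set_cutoff, cutoff, local_note). Top-level assignments live in the global environment, much like global macro variables; variables created inside a function are local to it; and <<- searches enclosing environments for an existing variable, assigning in the global environment if none is found. Returning values from functions is generally preferred over <<-, which is shown here only for comparison.

```r
study <- "ABC-123"              # like %let study = ABC-123; in open code (global)

set_cutoff <- function() {
  local_note <- "local only"    # like a %local macro variable: exists only during the call
  cutoff <<- 100                # no cutoff in enclosing environments, so it is created globally
  invisible(NULL)
}

set_cutoff()
cutoff                          # 100
exists("local_note")            # FALSE: the local variable is gone after the call
paste0("Study: ", study)        # like "Study: &study." resolving the macro variable
```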