'Life is after all a recursive summation, indeed     Let's do some Statistics!

Department of Civil and Environmental Engineering
Frank Batten College of Engineering and Technology
Old Dominion University
Norfolk, Virginia 23529-0241, USA
Tel) (757) 683-3753
Fax) (757) 683-5354


	
Nonparametric Statistics
SAS Source: NONP.SAS

Description

Suppose that you have two independent, continuous, but non-normal populations X1 and X2 with means µ1 and µ2, from which you draw random samples (X11, X12, ..., X1n1) and (X21, X22, ..., X2n2) of sizes n1 and n2.

Because either or both populations are not normally distributed, you cannot use a test based on the normal probability distribution, such as the t-test, to compare the means. However, assuming that the distributions of X1 and X2 have the same shape and spread and differ only (possibly) in their means, you can use a nonparametric technique, the Wilcoxon Rank-Sum Test, to test the hypothesis of equal central tendency.

H0:     µ1 = µ2
H1:     µ1 ≠ µ2
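
For reference, here is a sketch of the statistic behind the test in its usual textbook form. The symbols W, R_{1i}, and Z are introduced here for illustration and are not part of the SAS source; the variance formula assumes there are no tied observations. Pool both samples, rank the pooled values from smallest (rank 1) to largest, and add up the ranks that belong to sample 1:

```latex
W = \sum_{i=1}^{n_1} R_{1i}, \qquad
E[W] = \frac{n_1 (n_1 + n_2 + 1)}{2}, \qquad
\operatorname{Var}[W] = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}, \qquad
Z = \frac{W - E[W]}{\sqrt{\operatorname{Var}[W]}}
```

For the alloy data in the listing below (n1 = n2 = 10, no ties), E[W] = 105 and Var[W] = 175. Ranking the 20 stress values by hand gives W = 99 for Alloy 1, so Z = (99 - 105)/sqrt(175), or about -0.45, which is well inside ±1.96 and gives no evidence of a difference at the 5% level. The Z value that SAS prints may differ slightly because it may apply a continuity correction.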

The nonparametric approach is a simple yet powerful way to compare the central tendency and dispersion of two populations when either or both are non-normally distributed. Because it does not assume any particular distribution, you can also apply it to normally distributed populations.

SAS Listing


OPTIONS LINESIZE=80; 
TITLE1 'Wilcoxon Rank-Sum Test for Mean Comparison';
TITLE2 '(Also called the Mann-Whitney Test)';

/* Mean axial stress in tensile members used in  */
/* a structural joint. Alloy 1 is a traditional  */
/* material, and Alloy 2 is a new aluminum-      */
/* lithium alloy.  Stress values are in psi      */

DATA A_Stress;
/* @@ holds the input line so that several observations can be read from it */
INPUT Alloy Stress @@;
CARDS;
1 3238 1 3254 1 3195 1 3229 1 3246
1 3225 1 3190 1 3217 1 3204 1 3241
2 3261 2 3248 2 3187 2 3215 2 3209
2 3226 2 3212 2 3240 2 3258 2 3234
;
RUN;

/* Print the original data set                   */
PROC PRINT; 
RUN;

/* Nonparametric statistics/Wilcoxon Rank-Sum    */
PROC NPAR1WAY DATA=A_Stress WILCOXON;
  CLASS Alloy;
  VAR Stress;
RUN;


Here, NPAR1WAY tests the null hypothesis that the mean axial stress in tensile members used in a structural joint is the same for the two alloys against the alternative hypothesis that it differs. The WILCOXON option requests the Wilcoxon test for a difference in location.

You can also run the median test for a difference in location (MEDIAN option) and request empirical distribution function statistics (EDF option).

SAS Listing


OPTIONS LINESIZE=80;
TITLE1 'Wilcoxon Rank-Sum Test for Mean Comparison';
TITLE2 '(Also called the Mann-Whitney Test)';

/* Mean axial stress in tensile members used in  */
/* a structural joint. Alloy 1 is a traditional  */
/* material, and Alloy 2 is a new aluminum-      */
/* lithium alloy.  Stress values are in psi      */

DATA A_Stress;
/* @@ holds the input line so that several observations can be read from it */
INPUT Alloy Stress @@;
CARDS;
1 3238 1 3254 1 3195 1 3229 1 3246
1 3225 1 3190 1 3217 1 3204 1 3241
2 3261 2 3248 2 3187 2 3215 2 3209
2 3226 2 3212 2 3240 2 3258 2 3234
;
RUN;

/* Print the original data set                   */
PROC PRINT;
RUN;

/* WILCOXON / Wilcoxon Rank-Sum test for difference in location */
/* MEDIAN   / median test for difference in location            */
/* EDF      / empirical distribution function statistics        */
PROC NPAR1WAY DATA=A_Stress WILCOXON MEDIAN EDF;
  CLASS Alloy;
  VAR Stress;
RUN;


Despite its flexibility and succinctness, one drawback of the nonparametric test is that it compares only one pair of samples at a time (in contrast to MMC in ANOVA/GLM). If you have only two samples to compare, this is not an issue.

But what if you need to compare the central tendency (C.T.) and dispersion of, say, 6 samples? To compare all 6 samples pair-wise, you have to run the nonparametric test explicitly 15 times (i.e., the 6C2 combinations).
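
As a quick check of these counts, the short step below is a minimal sketch that uses the SAS COMB function to count the pairs among 6 samples and, for comparison, among 7 samples. The variable names pairs6 and pairs7 are made up for this illustration.

```sas
/* Sketch: count the pair-wise comparisons among k samples */
DATA _NULL_;
  pairs6 = COMB(6, 2);   /* number of pairs among 6 samples */
  pairs7 = COMB(7, 2);   /* number of pairs among 7 samples */
  PUT pairs6= pairs7=;   /* writes pairs6=15 pairs7=21 to the SAS log */
RUN;
```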

The hard way is to manually prepare data sets for the 15 different combinations and then run the nonparametric tests, either within a single SAS source file or in 15 separate ones.

For example, consider comparing the yearly C.T. and dispersion of F. Coliform concentrations from 7 samples (between 1996 and 2002) using the nonparametric Wilcoxon Rank-Sum test.

First, the hard way, i.e., manual data set preparation: to make pair-wise comparisons of 7 samples, you have to manually prepare 21 sample data sets (i.e., the 7C2 combinations).


OPTIONS LINESIZE=80 Source;
/* --------- 19961997 ---------- */
DATA y19961997;
INPUT Year $ Month $ Intensity C_diff  @@;
CARDS;
1996 Nov 2.87 4.852
1996 Dec 3.86 4.078
1997 Feb 2.94 2.197
1997 Mar 3.21 0.993
1997 Apr 3.01 4.220
1997 May 1.66 3.227
1997 Jul 7.85 4.673
1997 Aug 1.76 0.693
1997 Nov 4.94 4.078
;
RUN;

PROC NPAR1WAY DATA=y19961997 WILCOXON;
TITLE1 '========================================== ';
TITLE2 '1996 vs. 1997 -- C_diff | Yearly Comparison';
TITLE3 '========================================== ';
  CLASS Year;
  VAR C_diff;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

/* --------- 19961998 ---------- */
DATA y19961998;
INPUT Year $ Month $ Intensity C_diff  @@;
CARDS;
1996 Nov 2.87 4.852
1996 Dec 3.86 4.078
1998 Mar 4.15 6.966
1998 Apr 4.31 2.721
1998 May 3.99 4.248
1998 Jun 4.56 4.346
1998 Jul 3.91 4.025
1998 Aug 8.47 4.635
1998 Sep 2.25 4.673
1998 Oct 1.73 3.555
1998 Dec 5.33 3.045
;
RUN;

PROC NPAR1WAY DATA=y19961998 WILCOXON;
TITLE1 '========================================== ';
TITLE2 '1996 vs. 1998 -- C_diff | Yearly Comparison';
TITLE3 '========================================== ';
  CLASS Year;
  VAR C_diff;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

.
.
.
.
/* --------- 20012002 ---------- */
DATA y20012002;
INPUT Year $ Month $ Intensity C_diff  @@;
CARDS;
2001 Jan 1.46 2.625
2001 Feb 2.16 3.054
2001 May 2.89 6.966
2001 Jun 6.91 6.620
2001 Jul 4.4 2.398
2001 Sep 2.46 5.394
2001 Nov 0.08 1.335
2002 Jan 4.22 5.247
2002 Mar 4.91 3.718
2002 May 3.5 4.796
2002 Jul 3.59 3.091
2002 Sep 6.69 5.298
;
RUN;

PROC NPAR1WAY DATA=y20012002 WILCOXON;
TITLE1 '========================================== ';
TITLE2 '2001 vs. 2002 -- C_diff | Yearly Comparison';
TITLE3 '========================================== ';
  CLASS Year;
  VAR C_diff;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

Now, the less painful (though still cumbersome) way: read the whole data set at once, define the 21 sub-datasets within a single SAS source file, and run all the nonparametric tests in one go.


OPTIONS LINESIZE=80 Source Nodate;
DATA Coliform;
INPUT Year $ Month $ Intensity C_diff  @@;
CARDS;
1996 Nov 2.87 4.852
1996 Dec 3.86 4.078
1997 Feb 2.94 2.197
1997 Mar 3.21 0.993
1997 Apr 3.01 4.220
1997 May 1.66 3.227
1997 Jul 7.85 4.673
1997 Aug 1.76 0.693
1997 Nov 4.94 4.078
1998 Mar 4.15 6.966
1998 Apr 4.31 2.721
1998 May 3.99 4.248
1998 Jun 4.56 4.346
1998 Jul 3.91 4.025
1998 Aug 8.47 4.635
1998 Sep 2.25 4.673
1998 Oct 1.73 3.555
1998 Dec 5.33 3.045
1999 Feb 2.33 1.825
1999 Mar 3.29 -1.609
1999 Sep 13.16 5.602
1999 Dec 1.71 2.485
2000 Jan 5.07 4.317
2000 Feb 1.13 1.758
2000 Mar 2.4 -1.609
2000 Apr 3.7 5.252
2000 May 4.05 3.045
2000 Jun 8.31 6.267
2000 Jul 7.52 7.364
2000 Aug 8.35 6.966
2000 Sep 6.25 2.721
2000 Oct 0.01 3.497
2000 Nov 1.67 4.949
2000 Dec 0.97 3.006
2001 Jan 1.46 2.625
2001 Feb 2.16 3.054
2001 May 2.89 6.966
2001 Jun 6.91 6.620
2001 Jul 4.4 2.398
2001 Sep 2.46 5.394
2001 Nov 0.08 1.335
2002 Jan 4.22 5.247
2002 Mar 4.91 3.718
2002 May 3.5 4.796
2002 Jul 3.59 3.091
2002 Sep 6.69 5.298
;
RUN;

/* Creating pair-wise YEARLY sub-datasets for the Wilcoxon NP test  */

DATA y19961997;  /* Define/Name the sub-dataset to be created */
   set Coliform;  /* The source dataset to create the sub-dataset from */
   if Year = "1996" or Year = "1997"; /* Subsetting IF: keep only matching rows; conditions can be combined */
   /* Rows that pass the IF are written to y19961997 automatically; no OUT= is needed */
RUN;

DATA y19961998;
   set Coliform;
   if Year = "1996" or Year = "1998";
RUN;
.
.
.
DATA y19962002;
   set Coliform;
   if Year = "1996" or Year = "2002";
RUN;

Comparison and logical operators that can be used in an "if" statement to create a sub-dataset

Operator                  Symbol, Mnemonic   Examples
Equals                    =, EQ              if X1 = 8;
                                             if X6 EQ "N/A";
Not equal                 ^=, NE             if X2 NE 3;
                                             if site ^= "chesapeake";
Greater than              >, GT              if X5*0.125 > 0.4;
                                             if X5*0.125 GT 0.4;
Less than                 <, LT              if X7/2.3 < 1;
                                             if X7/2.3 LT 1;
Greater than or equal     >=, GE             if X11 >= 11;
                                             if X11 GE 11;
Less than or equal        <=, LE             if X33 <= 200;
                                             if X33 LE 200;
Logical AND               AND                if (SRPspring = SRPfall) AND (log(SRPspring) > 0.6);
Logical OR                OR                 if (SRPspring = SRPfall) OR (log(SRPspring) > 0.6);
Logical NOT               NOT                if (TNsummer GT 25) AND NOT (log(TNsummer) > 3.81);

Note: in a DATA step, <> is the MAX operator rather than "not equal", so use ^= or NE there.
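
To see several of these operators working together, the step below is a minimal sketch of a compound subsetting condition. It assumes the Coliform data set read in above, in which Year is a character variable; the sub-dataset name HighRecent is made up for this illustration.

```sas
/* Sketch: combine comparison and logical operators in a subsetting IF. */
/* Assumes the Coliform data set read in above (Year is character).     */
DATA HighRecent;                 /* hypothetical sub-dataset name */
   SET Coliform;
   /* Keep 2001 and 2002 rows whose C_diff is not below 3 */
   IF (Year = "2001" OR Year = "2002") AND NOT (C_diff LT 3);
RUN;

/* Print the sub-dataset to check which rows were kept */
PROC PRINT DATA=HighRecent;
RUN;
```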


/* ------------------------------------------------ */
PROC NPAR1WAY DATA=y19961997 WILCOXON;
TITLE1 '================================ ';
TITLE2 '1996 | 1997 -- Pair-wise C_diff | Yearly Comparison';
TITLE3 '================================ ';
  CLASS Year;
  VAR C_diff Intensity;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

PROC NPAR1WAY DATA=y19961998 WILCOXON;
TITLE1 '================================ ';
TITLE2 '1996 | 1998 -- Pair-wise C_diff | Yearly Comparison';
TITLE3 '================================ ';
  CLASS Year;
  VAR C_diff Intensity;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

.
.
.
PROC NPAR1WAY DATA=y20012002 WILCOXON;
TITLE1 '================================ ';
TITLE2 '2001 | 2002 -- Pair-wise C_diff | Yearly Comparison';
TITLE3 '================================ ';
  CLASS Year;
  VAR C_diff Intensity;
  EXACT WILCOXON / ALPHA=0.05;
RUN;

Pay close attention to the PROC NPAR1WAY DATA=**** statements to see how each test uses a sub-dataset created from the original, single data set.
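
If writing out all 21 steps by hand is still too tedious, one possible shortcut is to let the SAS macro language generate them. The sketch below is not part of NONP.SAS: the macro name PairWise and its parameters first and last are made up for illustration, and it assumes that the Coliform data set above has been read in, that Year is a character variable, and that every year from first to last is present. It uses a WHERE statement inside PROC NPAR1WAY to pick each pair of years, so no sub-datasets are created, and it requests the exact test with a plain EXACT WILCOXON statement.

```sas
/* Sketch: run the Wilcoxon Rank-Sum test for every pair of years.   */
/* PairWise, first, and last are hypothetical names.                 */
%MACRO PairWise(first=1996, last=2002);
  %LOCAL y1 y2;
  %DO y1 = &first %TO %EVAL(&last - 1);
    %DO y2 = %EVAL(&y1 + 1) %TO &last;
      PROC NPAR1WAY DATA=Coliform WILCOXON;
        WHERE Year IN ("&y1", "&y2");   /* keep only this pair of years */
        TITLE1 "&y1 | &y2 -- Pair-wise Yearly Comparison";
        CLASS Year;
        VAR C_diff Intensity;
        EXACT WILCOXON;
      RUN;
    %END;
  %END;
%MEND PairWise;

/* 1996 through 2002 gives 7C2 = 21 comparisons */
%PairWise(first=1996, last=2002)
```

The nested loops start the inner year one step after the outer year, so each pair is tested once and no year is compared with itself.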

SAS User Guide (SUG) for Procedures (PROC) used in the Source

SUG PRINT procedure
SUG NPAR1WAY procedure