Tech Tip: Generating Outliers from data sets
PRODUCT: 4D | VERSION: 14.0 | PLATFORM: Mac & Win
Published On: October 16, 2014
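Below is a sample method that builds an array of the outliers in a data set of numbers. Values beyond the inner fences (1.5 times the interquartile range below the first quartile or above the third quartile) are treated as outliers, and values beyond the outer fences (3 times the interquartile range) are treated as extreme outliers: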
//$1 -> Pointer to input array
//$2 -> Pointer to output array
//$3 -> Optional Boolean Parameter
//...False Only Shows Non-Extreme Outliers
//...True Only Shows Extreme Outliers

//Declare Variables and Parameters
C_POINTER($1;$2)
C_BOOLEAN($3;$out)
C_LONGINT($size;$q1;$q2;$q3;$counter)
C_REAL($IQR;$loOut;$hiOut;$exLoOut;$exHiOut)
ARRAY LONGINT($arrayIn;0)
ARRAY LONGINT($arrayRes;0)

If ((Count parameters=2)|(Count parameters=3))
    //Organize data set: work on a sorted copy so the caller's array is untouched
    COPY ARRAY($1->;$arrayIn)
    SORT ARRAY($arrayIn)

    //Calculate Quartiles (positions in the sorted array)
    $size:=Size of array($arrayIn)
    $q1:=$size/4
    $q2:=$size/2
    $q3:=$q1+$q2
    $IQR:=$arrayIn{$q3}-$arrayIn{$q1}

    //Calculate Inner Fences for data set
    $loOut:=$arrayIn{$q1}-(1.5*$IQR)
    $hiOut:=$arrayIn{$q3}+(1.5*$IQR)

    //Check parameters to see which Outliers to return
    //Then run for loops to check which values to return
    If (Count parameters=2)
        //No Boolean passed: return every value outside the inner fences
        For ($counter;1;$size)
            If (($arrayIn{$counter}<$loOut)|($arrayIn{$counter}>$hiOut))
                APPEND TO ARRAY($arrayRes;$arrayIn{$counter})
            End if
        End for
    End if

    If (Count parameters=3)
        $out:=$3
        //Calculate Outer Fences for data set
        $exLoOut:=$arrayIn{$q1}-(3*$IQR)
        $exHiOut:=$arrayIn{$q3}+(3*$IQR)

        If ($out=False)
            //Non-extreme outliers: outside the inner fences but not beyond the outer fences
            //A value exactly on an outer fence counts as non-extreme
            For ($counter;1;$size)
                If ((($arrayIn{$counter}<$loOut)&($arrayIn{$counter}>=$exLoOut))|(($arrayIn{$counter}>$hiOut)&($arrayIn{$counter}<=$exHiOut)))
                    APPEND TO ARRAY($arrayRes;$arrayIn{$counter})
                End if
            End for
        Else
            //Extreme outliers: beyond the outer fences
            For ($counter;1;$size)
                If (($arrayIn{$counter}<$exLoOut)|($arrayIn{$counter}>$exHiOut))
                    APPEND TO ARRAY($arrayRes;$arrayIn{$counter})
                End if
            End for
        End if
    End if

    //Return array of desired outliers
    COPY ARRAY($arrayRes;$2->)
End if
With the method saved as Array_Outliers, it can be called as shown below:
//Declare Variables
ARRAY LONGINT($array;0)
ARRAY LONGINT($arrayRes1;0)
ARRAY LONGINT($arrayRes2;0)
ARRAY LONGINT($arrayRes3;0)

//Sample Data Set
//...Extreme Low Outlier
APPEND TO ARRAY($array;-20)
//...Low Outlier
APPEND TO ARRAY($array;2)
//...Data within Typical Range
APPEND TO ARRAY($array;21)
APPEND TO ARRAY($array;22)
APPEND TO ARRAY($array;24)
APPEND TO ARRAY($array;25)
APPEND TO ARRAY($array;28)
APPEND TO ARRAY($array;35)
APPEND TO ARRAY($array;23)
APPEND TO ARRAY($array;24)
APPEND TO ARRAY($array;25)
APPEND TO ARRAY($array;29)
APPEND TO ARRAY($array;33)
//...High Outlier
APPEND TO ARRAY($array;50)
//...Extreme High Outlier
APPEND TO ARRAY($array;100)

//All outliers, then non-extreme only, then extreme only
Array_Outliers(->$array;->$arrayRes1)
Array_Outliers(->$array;->$arrayRes2;False)
Array_Outliers(->$array;->$arrayRes3;True)
When executed, the example above will result in the following:
-$arrayRes1 will contain {-20, 2, 50, 100}
-$arrayRes2 will contain {2, 50}
-$arrayRes3 will contain {-20, 100}
Locating outliers is a useful step in analyzing a data set. It can also be helpful to extract the quartile values, inner fences, and outer fences to get more detail about the data.
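The suggestion above could be sketched as a small companion method. This is a minimal sketch and not part of the original tip: the method name Array_Fences, the order of the values in the output array, and the four-element minimum are assumptions made for illustration. It reuses the same quartile-position arithmetic as Array_Outliers, so its fences should match the ones that method applies.

```4d
//Array_Fences (hypothetical name, illustration only)
//$1 -> Pointer to input array of numbers
//$2 -> Pointer to output Real array, filled in this assumed order:
//...{Q1; Q2; Q3; IQR; low inner fence; high inner fence; low outer fence; high outer fence}

//Declare Variables and Parameters
C_POINTER($1;$2)
C_LONGINT($size;$q1;$q2;$q3)
C_REAL($IQR)
ARRAY LONGINT($arrayIn;0)
ARRAY REAL($arrayRes;0)

If (Count parameters=2)
    //Work on a sorted copy of the data set
    COPY ARRAY($1->;$arrayIn)
    SORT ARRAY($arrayIn)
    $size:=Size of array($arrayIn)

    //Assumption: require at least 4 values so every quartile position is 1 or more
    If ($size>=4)
        //Same quartile positions as Array_Outliers
        $q1:=$size/4
        $q2:=$size/2
        $q3:=$q1+$q2
        $IQR:=$arrayIn{$q3}-$arrayIn{$q1}

        //Quartile values and interquartile range
        APPEND TO ARRAY($arrayRes;$arrayIn{$q1})
        APPEND TO ARRAY($arrayRes;$arrayIn{$q2})
        APPEND TO ARRAY($arrayRes;$arrayIn{$q3})
        APPEND TO ARRAY($arrayRes;$IQR)

        //Inner fences (1.5 x IQR)
        APPEND TO ARRAY($arrayRes;$arrayIn{$q1}-(1.5*$IQR))
        APPEND TO ARRAY($arrayRes;$arrayIn{$q3}+(1.5*$IQR))

        //Outer fences (3 x IQR)
        APPEND TO ARRAY($arrayRes;$arrayIn{$q1}-(3*$IQR))
        APPEND TO ARRAY($arrayRes;$arrayIn{$q3}+(3*$IQR))
    End if

    //Return the values; the array stays empty if the data set was too small
    COPY ARRAY($arrayRes;$2->)
End if

//Possible call, reusing $array from the example above:
//ARRAY REAL($fences;0)
//Array_Fences(->$array;->$fences)
```

Because the fences involve multiplying by 1.5, the output is a Real array; the calling code would need to declare it with ARRAY REAL rather than ARRAY LONGINT.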