library("lessR")
Data transformations for continuous variables are straightforward: enter the arithmetic expression for the transformation. For each variable, identify the data frame that contains it, if there is one. For example, the following creates a new variable xsq that is the square of the values of a variable x in the d data frame.
d$xsq <- d$x^2
Or, use the base R transform() function to accomplish the same, or functions from other packages that produce the same result.
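As a minimal sketch of the transform() alternative, the following uses a small hypothetical data frame (the values of x are made up for illustration).

```r
# Hypothetical data frame with one numeric variable x
d <- data.frame(x = c(1, 2, 3, 4))

# transform() evaluates the expression within d and returns
# a copy of d with the new column xsq appended
d <- transform(d, xsq = x^2)
d
```

Within transform(), the variable is referenced by name alone, so the d$ prefix is not needed.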
For variables that define discrete categories, however, the transformation may not be so straightforward with base R functions such as a nested string of ifelse() functions. An alternative is the lessR function recode().
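To see why nested ifelse() calls become unwieldy, here is a sketch in base R that maps four integer codes to new values. The vector Severity and the name SevTen are illustrative only.

```r
# Illustrative vector of integer category codes
Severity <- c(1, 4, 3, 2, 1)

# One ifelse() per category; any value not listed becomes NA
SevTen <- ifelse(Severity == 1, 10,
          ifelse(Severity == 2, 20,
          ifelse(Severity == 3, 30,
          ifelse(Severity == 4, 40, NA))))
SevTen
```

Each additional category adds another level of nesting, which is the bookkeeping that recode() avoids.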
To use recode(), specify the variable to be recoded with the old_vars parameter, the first parameter in the function call. Specify the values to be recoded with the required old parameter. Specify the corresponding recoded values with the required new parameter. There must be a 1-to-1 correspondence between the two sets of values, such as 0:5 recoded to 5:0, six items in the old set and six items in the new set.
To illustrate, construct the following small data frame.
d <- read.table(text="Severity Description
1 Mild
4 Moderate
3 Moderate
2 Mild
1 Severe", header=TRUE, stringsAsFactors=FALSE)
d
## Severity Description
## 1 1 Mild
## 2 4 Moderate
## 3 3 Moderate
## 4 2 Mild
## 5 1 Severe
Now change the integer values of the variable Severity from 1 through 4 to 10 through 40. Because old_vars is the first parameter in the definition of recode(), and because it is listed first in the call, the parameter name need not be specified. The default data frame is d; otherwise specify the data frame with the data parameter.
d <- recode(Severity, old=1:4, new=c(10,20,30,40))
##
## --------------------------------------------------------
## First four rows of data to recode for data frame: d
## --------------------------------------------------------
## Severity
## 1 1
## 2 4
## 3 3
## 4 2
##
##
## Recoding Specification
## ----------------------
## 1 --> 10
## 2 --> 20
## 3 --> 30
## 4 --> 40
##
## Number of cases (rows) to recode: 5
##
## Replace existing values of each specified variable, no value for option: new.var
##
## --- Recode: Severity ---------------------------------
## Number of unique values of Severity in the data: 4
## Number of values of Severity to recode: 4
##
##
## ------------------------------------------------
## First four rows of recoded data
## ------------------------------------------------
## Severity
## 1 10
## 2 40
## 3 30
## 4 20
d
## Severity Description
## 1 10 Mild
## 2 40 Moderate
## 3 30 Moderate
## 4 20 Mild
## 5 10 Severe
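As a side note, the one-to-one correspondence between the old and new values can be sketched in base R with match(). This is an illustration of the idea under the assumption that unlisted values stay unchanged, not a description of how recode() is implemented.

```r
old <- 1:4
new <- c(10, 20, 30, 40)

# The two sets must pair up one-to-one
stopifnot(length(old) == length(new))

Severity <- c(1, 4, 3, 2, 1)

# Position of each data value within old, NA if not listed
idx <- match(Severity, old)

# Replace only the values found in old; leave others unchanged
Recoded <- Severity
Recoded[!is.na(idx)] <- new[idx[!is.na(idx)]]
Recoded
```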
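In the recode() call above, the values of Severity were overwritten with the new values. In the following example, instead write the recoded values to a new variable, here SevereNew, with the new_vars parameter. Because Severity already holds the values 10 through 40 from the previous step, the old values 1 through 4 are no longer in the data, so the output includes notes to that effect and SevereNew matches Severity.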
d <- recode(Severity, new_vars="SevereNew", old=1:4, new=c(10,20,30,40))
##
## --------------------------------------------------------
## First four rows of data to recode for data frame: d
## --------------------------------------------------------
## Severity
## 1 10
## 2 40
## 3 30
## 4 20
##
##
## Recoding Specification
## ----------------------
## 1 --> 10
## 2 --> 20
## 3 --> 30
## 4 --> 40
##
## Number of cases (rows) to recode: 5
##
## --- Recode: Severity ---------------------------------
## Unique values of Severity in the data: 10 20 30 40
## Number of unique values of Severity in the data: 4
## >>> Note: A value specified to recode, 1, is not in the data.
##
## >>> Note: A value specified to recode, 2, is not in the data.
##
## >>> Note: A value specified to recode, 3, is not in the data.
##
## >>> Note: A value specified to recode, 4, is not in the data.
##
## Number of values of Severity to recode: 4
## Recode to variable: SevereNew
##
##
## ------------------------------------------------
## First four rows of recoded data
## ------------------------------------------------
## Severity SevereNew
## 1 10 10
## 2 40 40
## 3 30 30
## 4 20 20
A convenient application of recode() is to Likert data, in which responses to survey items are scored, for example, from 0 for Strongly Disagree to 5 for Strongly Agree. To encourage responders to read the items carefully, some items are written in the opposite direction so that disagreement indicates agreement with the overall attitude being assessed.
As an example, reverse score Items m01, m02, m03, and m10 from survey responses to the 20-item Mach IV scale. That is, score a 0 as a 5 and so forth. The responses are included as part of lessR and so can be directly read.
d <- Read("Mach4")
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## integer: Numeric data values, integers only
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Gender integer 351 0 2 0 0 1 ... 0 0 1
## 2 m01 integer 351 0 6 0 0 2 ... 2 1 3
## 3 m02 integer 351 0 6 4 1 1 ... 3 4 3
## 4 m03 integer 351 0 6 1 4 0 ... 3 4 3
## 5 m04 integer 351 0 6 5 4 5 ... 3 4 4
## 6 m05 integer 351 0 6 0 0 4 ... 2 3 3
## 7 m06 integer 351 0 6 5 3 4 ... 4 4 2
## 8 m07 integer 351 0 6 4 3 0 ... 4 4 2
## 9 m08 integer 351 0 6 1 0 5 ... 3 2 3
## 10 m09 integer 351 0 6 5 4 3 ... 3 3 3
## 11 m10 integer 351 0 6 4 4 4 ... 3 4 3
## 12 m11 integer 351 0 6 0 0 1 ... 1 1 2
## 13 m12 integer 351 0 6 0 1 4 ... 2 1 3
## 14 m13 integer 351 0 6 0 1 0 ... 3 1 2
## 15 m14 integer 351 0 6 0 1 0 ... 2 2 2
## 16 m15 integer 351 0 6 4 2 2 ... 3 5 3
## 17 m16 integer 351 0 6 0 4 0 ... 0 2 5
## 18 m17 integer 351 0 6 1 4 2 ... 0 2 2
## 19 m18 integer 351 0 6 3 3 4 ... 4 4 3
## 20 m19 integer 351 0 6 2 1 0 ... 0 0 1
## 21 m20 integer 351 0 6 4 0 1 ... 1 0 3
## ------------------------------------------------------------------------------------------
d <- recode(c(m01:m03,m10), old=0:5, new=5:0)
##
## --------------------------------------------------------
## First four rows of data to recode for data frame: d
## --------------------------------------------------------
## m01 m02 m03 m10
## 1 0 4 1 4
## 2 0 1 4 4
## 3 2 1 0 4
## 4 0 5 2 2
##
##
## Recoding Specification
## ----------------------
## 0 --> 5
## 1 --> 4
## 2 --> 3
## 3 --> 2
## 4 --> 1
## 5 --> 0
##
## Number of cases (rows) to recode: 351
##
## Replace existing values of each specified variable, no value for option: new.var
##
## --- Recode: m01 ---------------------------------
## Number of unique values of m01 in the data: 6
## Number of values of m01 to recode: 6
##
## --- Recode: m02 ---------------------------------
## Number of unique values of m02 in the data: 6
## Number of values of m02 to recode: 6
##
## --- Recode: m03 ---------------------------------
## Number of unique values of m03 in the data: 6
## Number of values of m03 to recode: 6
##
## --- Recode: m10 ---------------------------------
## Number of unique values of m10 in the data: 6
## Number of values of m10 to recode: 6
##
##
## ------------------------------------------------
## First four rows of recoded data
## ------------------------------------------------
## m01 m02 m03 m10
## 1 5 1 4 1
## 2 5 4 1 1
## 3 3 4 5 1
## 4 5 0 3 3
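For a scale scored 0 through 5, reverse scoring is equivalent to subtracting each response from 5. The following base R sketch checks that arithmetic against an explicit lookup, using the first four m01 responses listed above. The names m01_rev and m01_lookup are illustrative.

```r
# First four responses to m01 before recoding, from the listing above
m01 <- c(0, 0, 2, 0)

# Arithmetic reversal: 0 <-> 5, 1 <-> 4, 2 <-> 3
m01_rev <- 5 - m01

# Equivalent explicit lookup from old values to new values
old <- 0:5
new <- 5:0
m01_lookup <- new[match(m01, old)]

m01_rev
all(m01_rev == m01_lookup)
```

The arithmetic shortcut works only because the scale is evenly spaced and symmetric; an explicit old-to-new mapping covers arbitrary recodes.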
The function also addresses missing data. Existing data values can be converted to an R missing value. In this example, all values of 1 for the variable Plan are considered missing.
d <- Read("Employee")
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
newdata <- recode(Plan, old=1, new="missing")
##
## --------------------------------------------------------
## First four rows of data to recode for data frame: d
## --------------------------------------------------------
## Plan
## Ritchie, Darnell 1
## Wu, James 1
## Hoang, Binh 3
## Jones, Alissa 1
##
##
## Recoding Specification
## ----------------------
## 1 --> missing
##
##
## R represents missing data with a NA for 'not assigned'.
##
## Number of cases (rows) to recode: 37
##
## Replace existing values of each specified variable, no value for option: new.var
##
## --- Recode: Plan ---------------------------------
## Number of unique values of Plan in the data: 3
## Number of values of Plan to recode: 1
##
##
## ------------------------------------------------
## First four rows of recoded data
## ------------------------------------------------
## Plan
## Ritchie, Darnell NA
## Wu, James NA
## Hoang, Binh 3
## Jones, Alissa NA
Now values of 1 for Plan are missing in the recoded data, having the value of NA for not available. Because the result was assigned to newdata, the original data frame d is unchanged, as shown by listing its first six rows with the base R function head().
head(d)
## Years Gender Dept Salary JobSat Plan Pre Post
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Wu, James NA M SALE 94494.58 low 1 62 74
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
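For comparison, a base R sketch of the same conversion uses logical indexing on a plain vector. The values are the first six Plan values from the listing above.

```r
# First six values of Plan from the Employee data
Plan <- c(1, 1, 3, 1, 2, 2)

# Assign NA wherever the value equals 1
Plan[Plan == 1] <- NA
Plan
```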
The procedure can also be reversed, so that values that are missing according to the R code NA are converted to non-missing values. To illustrate with the Employee data set, examine the first six rows of data. The value of Years is missing in the second row of data.
d <- Read("Employee")
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
head(d)
## Years Gender Dept Salary JobSat Plan Pre Post
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Wu, James NA M SALE 94494.58 low 1 62 74
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
Here convert all missing data values for the variables Years and Salary to the value of 99.
d <- recode(c(Years, Salary), old="missing", new=99)
##
## --------------------------------------------------------
## First four rows of data to recode for data frame: d
## --------------------------------------------------------
## Years Salary
## Ritchie, Darnell 7 53788.26
## Wu, James NA 94494.58
## Hoang, Binh 15 111074.86
## Jones, Alissa 5 53772.58
##
##
## Recoding Specification
## ----------------------
## missing --> 99
##
## Number of cases (rows) to recode: 37
##
## Replace existing values of each specified variable, no value for option: new.var
##
## --- Recode: Years ---------------------------------
## Number of unique values of Years in the data: 16
## >>> Note: A value specified to recode, missing, is not in the data.
##
## Number of values of Years to recode: 1
##
## --- Recode: Salary ---------------------------------
## Unique values of Salary in the data: 46124.97 49188.96 49704.79 49868.68 51036.85 53772.58 53788.26 55545.25 56508.32 56772.95 57139.9 57562.36 61055.44 61356.69 61961.29 66312.89 66337.83 69441.93 69547.6 69624.87 71084.02 72321.36 72502.5 72675.26 77714.85 81871.05 83014.43 87785.51 91352.33 92681.19 94494.58 95027.55 99062.66 108138.4 111074.9 122563.4 134419.2
## Number of unique values of Salary in the data: 37
## >>> Note: A value specified to recode, missing, is not in the data.
##
## Number of values of Salary to recode: 1
##
##
## ------------------------------------------------
## First four rows of recoded data
## ------------------------------------------------
## Years Salary
## Ritchie, Darnell 7 53788.26
## Wu, James 99 94494.58
## Hoang, Binh 15 111074.86
## Jones, Alissa 5 53772.58
head(d)
## Years Gender Dept Salary JobSat Plan Pre Post
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Wu, James 99 M SALE 94494.58 low 1 62 74
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
Now the value of Years in the second row of data is 99.
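The reverse conversion can be sketched in base R with is.na(). The vector holds the first six values of Years from the listing above.

```r
# First six values of Years, with the missing value in position 2
Years <- c(7, NA, 15, 5, 7, 6)

# Replace every NA with the sentinel value 99
Years[is.na(Years)] <- 99
Years

# A sentinel such as 99 now enters numeric summaries like any other value
mean(Years)
```

Note the use of is.na() rather than a comparison such as Years == NA, which returns NA for every element and so cannot select the missing values.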
The lessR function Sort() sorts the rows of a data frame according to the values of one or more variables contained in the data frame, or according to the row names. Variable types include numeric and factor variables. Factors are sorted by the ordering of their levels, which by default is alphabetical.
To illustrate, use the lessR Employee data set, here just the first 12 rows of data to save space.
d <- Read("Employee")
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
d <- d[1:12,]
d <- Sort(d, Gender)
##
## Sort Specification
## 2 --> ascending
d
## Years Gender Dept Salary JobSat Plan Pre Post
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
## Kimball, Claire 8 F MKTG 61356.69 high 2 93 92
## Cooper, Lindsay 4 F MKTG 56772.95 high 1 78 91
## Saechao, Suzanne 8 F SALE 55545.25 med 1 98 100
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Wu, James NA M SALE 94494.58 low 1 62 74
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Knox, Michael 18 M MKTG 99062.66 med 3 81 84
## Campagna, Justin 8 M SALE 72321.36 low 1 76 84
## Pham, Scott 13 M SALE 81871.05 high 2 90 94
d <- Sort(d, c(Gender, Salary), direction=c("+", "-"))
##
## Sort Specification
## 2 --> ascending
## 4 --> descending
d
## Years Gender Dept Salary JobSat Plan Pre Post
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
## Kimball, Claire 8 F MKTG 61356.69 high 2 93 92
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Cooper, Lindsay 4 F MKTG 56772.95 high 1 78 91
## Saechao, Suzanne 8 F SALE 55545.25 med 1 98 100
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Knox, Michael 18 M MKTG 99062.66 med 3 81 84
## Wu, James NA M SALE 94494.58 low 1 62 74
## Pham, Scott 13 M SALE 81871.05 high 2 90 94
## Campagna, Justin 8 M SALE 72321.36 low 1 76 84
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
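A comparable two-key sort can be sketched in base R with order(), negating the numeric key to sort it in descending order. The small data frame emp is a hypothetical four-row subset built from the values shown above.

```r
# Four employees taken from the listing above
emp <- data.frame(
  Gender = c("M", "M", "F", "F"),
  Salary = c(53788.26, 94494.58, 53772.58, 57139.90),
  row.names = c("Ritchie, Darnell", "Wu, James",
                "Jones, Alissa", "Downs, Deborah")
)

# Ascending by Gender, then descending by Salary within Gender
sorted <- emp[order(emp$Gender, -emp$Salary), ]
sorted
```

The negation trick applies only to numeric keys, which is one reason a direction argument such as the one in Sort() is convenient.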
Sort by row names in ascending order.
d <- Sort(d, row.names)
##
## Sort Specification
## row.names --> ascending
d
## Years Gender Dept Salary JobSat Plan Pre Post
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
## Campagna, Justin 8 M SALE 72321.36 low 1 76 84
## Cooper, Lindsay 4 F MKTG 56772.95 high 1 78 91
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
## Kimball, Claire 8 F MKTG 61356.69 high 2 93 92
## Knox, Michael 18 M MKTG 99062.66 med 3 81 84
## Pham, Scott 13 M SALE 81871.05 high 2 90 94
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Saechao, Suzanne 8 F SALE 55545.25 med 1 98 100
## Wu, James NA M SALE 94494.58 low 1 62 74
Randomize the order of the data values.
d <- Sort(d, random)
##
## Sort Specification
## random
d
## Years Gender Dept Salary JobSat Plan Pre Post
## Campagna, Justin 8 M SALE 72321.36 low 1 76 84
## Hoang, Binh 15 M SALE 111074.86 low 3 96 97
## Ritchie, Darnell 7 M ADMN 53788.26 med 1 82 92
## Afshari, Anbar 6 F ADMN 69441.93 high 2 100 100
## Saechao, Suzanne 8 F SALE 55545.25 med 1 98 100
## Wu, James NA M SALE 94494.58 low 1 62 74
## Downs, Deborah 7 F FINC 57139.90 high 2 90 86
## Knox, Michael 18 M MKTG 99062.66 med 3 81 84
## Pham, Scott 13 M SALE 81871.05 high 2 90 94
## Kimball, Claire 8 F MKTG 61356.69 high 2 93 92
## Cooper, Lindsay 4 F MKTG 56772.95 high 1 78 91
## Jones, Alissa 5 F <NA> 53772.58 <NA> 1 65 62
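In base R, a random row order can be sketched by indexing with a permutation from sample(). The data frame emp and the seed are arbitrary choices for illustration.

```r
# Hypothetical data frame with an identifier for each row
emp <- data.frame(id = 1:6, Salary = c(50, 60, 70, 80, 90, 100))

# Fix the seed only to make the shuffle reproducible
set.seed(123)

# sample(n) returns a random permutation of 1..n
shuffled <- emp[sample(nrow(emp)), , drop = FALSE]
shuffled
```

Every row appears exactly once in the result; only the order changes.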
rescale(Salary)
## [1] 0.005 1.950 -0.925 -0.139 -0.837 1.118 -0.757 1.347 0.484 -0.545 -0.775 -0.926
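The values printed above appear consistent with z-scores of Salary, that is, each value minus the mean, divided by the sample standard deviation. Under that assumption, which is inferred from the output rather than stated in the text, a base R sketch of the same kind of standardization is as follows, using four of the salaries listed above.

```r
# Four salaries from the listing above
Salary <- c(72321.36, 111074.86, 53788.26, 69441.93)

# z-score: subtract the mean, divide by the sample standard deviation
z <- (Salary - mean(Salary)) / sd(Salary)

# Base R scale() performs the same centering and scaling by default
z_scale <- as.numeric(scale(Salary))

round(z, 3)
all.equal(z, z_scale)
```

The standardized values here differ from those printed above because they are computed from only four rows rather than all twelve.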
d <- rd("Mach4", quiet=TRUE)
l <- rd("Mach4_lbl")
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 label character 20 0 20 Never tell anyone the real reason you did something unless it is useful to do so ... Most people forget more easily the death of a parent than the loss of their property
## ------------------------------------------------------------------------------------------
LikertCats <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
                "Slightly Agree", "Agree", "Strongly Agree")
d <- factors(m01:m20, levels=0:5, labels=LikertCats)
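For a single variable, the corresponding base R conversion uses factor() with the same levels and labels. This sketch applies it to a short illustrative vector of responses rather than to the Mach4 data.

```r
LikertCats <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
                "Slightly Agree", "Agree", "Strongly Agree")

# Illustrative integer responses on the 0 to 5 scale
m01 <- c(0, 0, 2, 5)

# Map each integer level to its label, preserving the order of the levels
m01_f <- factor(m01, levels = 0:5, labels = LikertCats)
m01_f
```

All six levels are retained even though only three occur in the data, so tables and plots show the full response scale.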
Convert only three of the variables, specified as a vector, to factors. With new=TRUE, leave the original variables unmodified and create new variables.
d <- factors(c(m06, m07, m20), levels=0:5, labels=LikertCats, new=TRUE)
Now copy the variable labels from the original integer variables to the newly created factor variables.
l <- factors(c(m06, m07, m20), var_labels=TRUE)