Factors
As you have seen, variables can be described as nominal, ordinal, or continuous. A nominal variable is a categorical variable with no implied order. Diabetes (Type1, Type2) is an example of a nominal variable. Even if Type1 is coded as 1 and Type2 as 2 in the data, no order is implied. An ordinal variable implies order but not amount. Status (poor, improved, excellent) is a good example of an ordinal variable. We know that a patient with a poor status is not doing as well as a patient with an improved status, but not by how much. A continuous variable can take any value within some range, and both order and amount are implied. Age in years is a continuous variable.
Categorical (nominal) variables and ordered categorical (ordinal) variables are called factors in R. Factors are important in R because they determine how data is analyzed and how it is presented visually.
The function factor() stores the categorical values as a vector of integers in the range [1...k] (where k is the number of unique values in the nominal variable), together with an internal vector of character strings (the original values) mapped to those integers.
For example, suppose you have the vector:
diabetes <- c("Type1", "Type2", "Type1", "Type1")
The statement diabetes <- factor(diabetes) stores this vector as (1, 2, 1, 1) and internally associates it with 1 = Type1 and 2 = Type2 (the assignment is alphabetical). Any analysis performed on the vector diabetes will treat the variable as nominal and select the statistical methods appropriate for this level of measurement.
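The integer codes and the label vector described above can be inspected directly. The following is a minimal sketch using the base R functions as.integer() and levels(); the variable names codes, labels, and recovered are chosen here for illustration only.

```r
# Build the factor from the example vector
diabetes <- c("Type1", "Type2", "Type1", "Type1")
diabetes <- factor(diabetes)

# Integer codes stored internally, one per element
codes <- as.integer(diabetes)

# Character labels that the codes index into (alphabetical by default)
labels <- levels(diabetes)

# Indexing the labels by the codes recovers the original strings
recovered <- labels[codes]

print(codes)      # 1 2 1 1
print(labels)     # "Type1" "Type2"
print(recovered)  # "Type1" "Type2" "Type1" "Type1"
```

Note that as.integer() returns the codes, not the original values, which is why converting a factor of numeric-looking labels to numbers usually goes through as.character() first.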
To represent an ordinal variable, add the parameter ordered=TRUE to the function factor(). Given the vector:
status <- c("Poor", "Improved", "Excellent", "Poor")
The statement status <- factor(status, ordered=TRUE) encodes the vector as (3, 2, 1, 3) and internally associates the values as 1 = Excellent, 2 = Improved, and 3 = Poor. In addition, any analysis of this vector will treat the variable as ordinal and select the appropriate statistical methods.
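What "ordered" means in practice can be checked with a short sketch. It assumes the labels are capitalized consistently ("Poor", "Improved", "Excellent") so that the alphabetical default does not depend on the locale; the names is_ord, codes, and cmp are illustrative.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")
status <- factor(status, ordered = TRUE)

# TRUE for ordered factors, FALSE for plain (nominal) factors
is_ord <- is.ordered(status)

# Levels in their default alphabetical order, and the matching codes
lv <- levels(status)
codes <- as.integer(status)

# Comparison operators are defined for ordered factors and follow level order.
# Under the alphabetical default, "Improved" ranks below "Poor".
cmp <- status[2] < status[1]

print(is_ord)  # TRUE
print(lv)      # "Excellent" "Improved" "Poor"
print(codes)   # 3 2 1 3
print(cmp)     # TRUE
```

The last comparison shows that the default order runs from Excellent up to Poor, which is the reverse of what the labels mean clinically.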
For character vectors, factor levels are created in alphabetical order by default. This works for the factor status, because the order "Excellent", "Improved", "Poor" makes sense. If "Poor" had been coded as "Ailing" instead, there would be a problem, because the order would be "Ailing", "Excellent", "Improved". A similar problem occurs if the desired order is "Poor", "Improved", "Excellent". For ordered factors, the alphabetical default is rarely sufficient.
You can override the default ordering by specifying the levels option. For example:
status <- factor(status, ordered=TRUE, levels=c("Poor", "Improved", "Excellent"))
Each level will be assigned a value of 1 = Poor, 2 = Improved, 3 = Excellent. Make sure that the specified levels match the actual values in the data, because any value in the data that is not listed in the levels parameter will be set to missing.
The following listing shows factors and ordered factors used in a data frame:
> patientid <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> diabetes <- factor(diabetes)
> status <- factor(status, ordered=TRUE)
> patientdata <- data.frame(patientid, age, diabetes, status)
> str(patientdata)
'data.frame':   4 obs. of  4 variables:
 $ patientid: num  1 2 3 4
 $ age      : num  25 34 28 52
 $ diabetes : Factor w/ 2 levels "Type1","Type2": 1 2 1 1
 $ status   : Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3
> summary(patientdata)
   patientid         age         diabetes       status
 Min.   :1.00   Min.   :25.00   Type1:3   Excellent:1
 1st Qu.:1.75   1st Qu.:27.25   Type2:1   Improved :1
 Median :2.50   Median :31.00             Poor     :2
 Mean   :2.50   Mean   :34.75
 3rd Qu.:3.25   3rd Qu.:38.50
 Max.   :4.00   Max.   :52.00
The function str(object) provides information about an object in R (in this case, the data frame). The output clearly shows that diabetes is a factor and status is an ordered factor, along with how each is encoded internally.
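The same kind of inspection can be applied to the levels option discussed earlier. The sketch below first sets the intended order explicitly, then deliberately omits one value from levels to show it becoming a missing value. The names ordered_status and incomplete are hypothetical, and the capitalized labels are an assumption made for consistency.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")

# Explicit levels give the intended order: Poor < Improved < Excellent
ordered_status <- factor(status, ordered = TRUE,
                         levels = c("Poor", "Improved", "Excellent"))
ordered_codes <- as.integer(ordered_status)
print(ordered_codes)   # 1 2 3 1

# str() reports the object as an ordered factor along with its codes
str(ordered_status)

# "Improved" is deliberately left out of levels, so that element becomes NA
incomplete <- factor(status, ordered = TRUE,
                     levels = c("Poor", "Excellent"))
missing_flags <- is.na(incomplete)
print(missing_flags)   # FALSE TRUE FALSE FALSE
```

Because factor() converts unmatched values to NA without raising an error, checking anyNA() on the result after setting levels by hand is a cheap safeguard.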