R for Statistics

Data Wrangling

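  library(tidyr)
  getwd()                 # show the current working directory
  setwd("path/to/data")   # placeholder path: the folder that contains vsl1314.csv
  # The file appears to start with a UTF-8 byte-order mark; without
  # fileEncoding the first column name is read as "ï..year".
  vsl <- read.csv("vsl1314.csv", fileEncoding = "UTF-8-BOM")
  > vsl
    year sex dom  exp ra1pat ra1time ra5pat ra5time rv1pat rv1time rv5pat
  1 2014   M   R  7.0      Y      41      Y      35      Y      40      Y
  2 2014   M   R  3.0      Y      45      Y      55      N      45      Y
  3 2014   M   R  4.0      Y      70      Y      45      Y      50      Y
  4 2014   M   R 15.0      N      90      Y      50      Y      70      Y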
 
  vsl2 <- subset(vsl, select = -c(totnum, success, rate))  # drop the summary-statistic columns
 
  hist(vsltime$time, main="Anastomosis time (all) 2013-2014 n=512")

Histogram to check the shape of the distribution: it looks skewed to the right.
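The `vsltime` table used in the histogram call is not created anywhere on this page. A minimal sketch of one way it could be built, assuming it is simply the long form of the `*time` columns (one row per participant per task). The page loads tidyr, so the original may have used that package; this version uses base R's `stack()` to stay dependency-free. The names `vsl_demo`, `time_cols` and `task` are hypothetical, and the toy data are only the three time columns of the four rows printed above.

```r
# Toy wide-format data: the time columns of the four rows printed above.
vsl_demo <- data.frame(
  exp     = c(7, 3, 4, 15),
  ra1time = c(41, 45, 70, 90),
  ra5time = c(35, 55, 45, 50),
  rv1time = c(40, 45, 50, 70)
)

# Assumption: vsltime holds every timing in a single column.
time_cols <- grep("time$", names(vsl_demo), value = TRUE)
vsltime <- stack(vsl_demo[time_cols])   # columns: values, ind
names(vsltime) <- c("time", "task")     # hypothetical column names

nrow(vsltime)   # 4 participants x 3 tasks = 12 rows
```

With a table of this shape, `hist(vsltime$time)` pools the timings from all tasks into one histogram.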

> shapiro.test(vsltime$time)
Shapiro-Wilk normality test
data:  vsltime$time
W = 0.91157, p-value = 1.388e-14

Shapiro-Wilk test for normality: p < 0.001, so the times are not normally distributed.
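To make the reading of the test output explicit, here is a small sketch that pulls W and the p-value out of the object returned by `shapiro.test()`. The data are simulated right-skewed values, not the study data, and the 5% threshold `alpha` is an assumption rather than something stated on this page.

```r
set.seed(1)

# Simulated right-skewed "times" (log-normal); NOT the study data.
sim_time <- rlnorm(512, meanlog = log(45), sdlog = 0.4)

res <- shapiro.test(sim_time)
res$statistic   # the W statistic
res$p.value     # the p-value

# Decision rule: reject normality when p falls below alpha (assumed 5%).
alpha <- 0.05
not_normal <- res$p.value < alpha
not_normal
```

A small p-value, as in the output above, means the sample is unlikely to have come from a normal distribution.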

> hist(vsl$success)

> shapiro.test(vsl$success)
      Shapiro-Wilk normality test
data:  vsl$success
W = 0.9117, p-value = 0.0002277

Shapiro-Wilk test on vsl$success: p < 0.001, so this variable is not normally distributed either.

Some other analyses:

> plot(vsl$exp,vsl$rate,main="Anastomosis success rate vs experience (in years)")
> plot(vsl$exp,vsl$totnum)
> plot(vsl$exp,((vsl$totnum/8)+(vsl$rate))/2)

The plots suggest that the first few years of experience make little difference to performance, whereas many years of experience do (? self-selection or some prior training).
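The third plot uses the expression `((vsl$totnum/8)+(vsl$rate))/2`, which appears to average two proportions: `totnum` rescaled by 8 (presumably the maximum possible total) and `rate`. A minimal sketch of that calculation as a named helper; `composite_score`, `max_total` and the `demo` values are hypothetical and not taken from vsl1314.csv.

```r
# Hypothetical helper; max_total = 8 mirrors the "/8" in the plot call above.
composite_score <- function(totnum, rate, max_total = 8) {
  (totnum / max_total + rate) / 2
}

# Hypothetical values, for illustration only.
demo <- data.frame(
  exp    = c(1, 3, 15),
  totnum = c(4, 6, 8),
  rate   = c(0.5, 0.75, 1)
)
demo$score <- composite_score(demo$totnum, demo$rate)
demo$score   # 0.50 0.75 1.00
```

Storing the score as a column makes later calls such as `plot(demo$exp, demo$score)` or `stem(demo$score)` easier to read than repeating the expression.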

> stem(((vsl$totnum/8)+(vsl$rate))/2)
The decimal point is 1 digit(s) to the left of the |
 2 | 5
 3 | 
 4 | 
 5 | 
 6 | 1333
 7 | 1111111111112559999999999
 8 | 1111177777888888888888888
 9 | 44444444
10 | 0
> stem(vsl$rate)
The decimal point is 1 digit(s) to the left of the |
 0 | 0
 2 | 
 4 | 00007
 6 | 07777771111155555
 8 | 0000003333366666888
10 | 0000000000000000000000

pe/r-lang.1555672657.txt.gz · Last modified: 2020/03/24 02:34 (external edit)