R for Statistics

Data Wrangling

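  library(tidyr)
  getwd()                    # check the current working directory
  # setwd("path/to/data")    # placeholder path: setwd() fails without a directory argument
  vsl <- read.csv("vsl1314.csv", fileEncoding = "UTF-8-BOM")  # strip the byte-order mark that garbles the first column name
  > vsl
    year sex dom  exp ra1pat ra1time ra5pat ra5time rv1pat rv1time rv5pat
  1 2014   M   R  7.0      Y      41      Y      35      Y      40      Y
  2 2014   M   R  3.0      Y      45      Y      55      N      45      Y
  3 2014   M   R  4.0      Y      70      Y      45      Y      50      Y
  4 2014   M   R 15.0      N      90      Y      50      Y      70      Y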
 
  vsl2 <- subset(vsl, select = -c(totnum, success, rate))  # drop the summary stats
 
  # note: vsltime is not created anywhere in this listing
  hist(vsltime$time, main = "Anastomosis time (all) 2013-2014 n=512")

Histogram to check the shape of the distribution: it looks skewed to the right.
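The histogram call uses vsltime, but the listing never shows how that table was built. The following is a minimal sketch of one possible construction, assuming vsltime is simply the timing columns stacked into a single time column. That is a guess, not the original code. The toy data frame reuses the first four rows printed above, and the names vsltime_toy, time and task are hypothetical.

```r
# Toy stand-in for vsl, using the timing values from the first four printed rows
toy <- data.frame(
  exp     = c(7, 3, 4, 15),
  ra1time = c(41, 45, 70, 90),
  ra5time = c(35, 55, 45, 50),
  rv1time = c(40, 45, 50, 70)
)

# Select every column whose name ends in "time" (assumed naming convention)
time_cols <- grep("time$", names(toy), value = TRUE)

# Stack the selected columns into long format: one row per recorded time
vsltime_toy <- stack(toy[time_cols])
names(vsltime_toy) <- c("time", "task")  # hypothetical column names

nrow(vsltime_toy)
```

Calling hist(vsltime_toy$time) on the result would then mirror the plot above. Since tidyr is already loaded, pivot_longer() could do the same reshaping. Base R stack() is used here only to keep the sketch dependency-free.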

  > shapiro.test(vsltime$time)
 
  Shapiro-Wilk normality test
 
  data:  vsltime$time
  W = 0.91157, p-value = 1.388e-14

Shapiro-Wilk test for normality: p < 0.001, so the anastomosis times are not normally distributed.
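The same decision can be read from the test object instead of the printed output. The following is a minimal sketch on simulated right-skewed data. The vector x and the 0.05 threshold are assumptions for illustration, not values from the dataset.

```r
set.seed(1)

# Simulated right-skewed "times" (log-normal), standing in for vsltime$time
x <- rlnorm(200, meanlog = 4, sdlog = 0.8)

res <- shapiro.test(x)

w_stat  <- unname(res$statistic)  # the W statistic
p_value <- res$p.value

# Reject normality at the (assumed) 5% level?
not_normal <- p_value < 0.05
not_normal
```

A Q-Q plot (qqnorm(x) followed by qqline(x)) is a useful visual companion, because with several hundred observations the test will flag even modest departures from normality.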

  > hist(vsl$success)

  > shapiro.test(vsl$success)
        Shapiro-Wilk normality test
  data:  vsl$success
  W = 0.9117, p-value = 0.0002277

The success variable likewise departs from normality (p = 0.0002).

Some other analyses:

  > plot(vsl$exp,vsl$rate,main="Anastomosis success rate vs experience (in years)")
  > plot(vsl$exp,vsl$totnum)
  > plot(vsl$exp,((vsl$totnum/8)+(vsl$rate))/2)

The plots suggest that the first few years of experience make little difference to performance, whereas many years of experience do (possibly reflecting self-selection or prior training). A correlation analysis might be helpful here.
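One way to follow this up is a rank correlation. It does not assume normality, and it can pick up a monotonic but non-linear relationship. The following is a minimal sketch using Spearman's rho. The toy values are hypothetical, not taken from vsl1314.csv, and the choice of Spearman over Pearson is an assumption based on the skewed distributions seen above.

```r
# Hypothetical data: experience in years and success rate as a proportion
toy <- data.frame(
  exp  = c(1, 2, 3, 4, 5, 7, 10, 15, 20, 25),
  rate = c(0.50, 0.75, 0.625, 0.50, 0.75, 0.625, 0.875, 0.875, 1.00, 1.00)
)

# Spearman rank correlation; exact = FALSE avoids the warning about ties
res <- cor.test(toy$exp, toy$rate, method = "spearman", exact = FALSE)

rho     <- unname(res$estimate)
p_value <- res$p.value
rho
```

On the real data the equivalent call would be cor.test(vsl$exp, vsl$rate, method = "spearman", exact = FALSE).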

Spread

  > stem(((vsl$totnum/8)+(vsl$rate))/2)
 
  The decimal point is 1 digit(s) to the left of the |
   2 | 5
   3 | 
   4 | 
   5 | 
   6 | 1333
   7 | 1111111111112559999999999
   8 | 1111177777888888888888888
   9 | 44444444
  10 | 0
 
  > stem(vsl$rate)
 
  The decimal point is 1 digit(s) to the left of the |
   0 | 0
   2 | 
   4 | 00007
   6 | 07777771111155555
   8 | 0000003333366666888
  10 | 0000000000000000000000
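The stem-and-leaf plots can be complemented with numeric measures of spread. The following is a minimal sketch, assuming the rate is a proportion between 0 and 1, as the stem output suggests. The toy vector is hypothetical.

```r
# Hypothetical success rates, standing in for vsl$rate
rate <- c(0, 0.4, 0.6, 0.7, 0.8, 0.8, 0.9, 1, 1, 1)

# Robust summaries, since the distribution is skewed and not normal
spread <- c(
  median = median(rate),
  iqr    = IQR(rate),
  min    = min(rate),
  max    = max(rate)
)
spread
```

The two stem plots above use different bin widths. The scale argument of stem() stretches or compresses the display if a finer or coarser view is wanted.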

See also Combined 13-14

pe/r-lang.txt · Last modified: 2020/03/24 02:34 (external edit)