library(tidyr)
getwd()
setwd()
vsl <- read.csv("vsl1314.csv")

> vsl
  ï..year sex dom  exp ra1pat ra1time ra5pat ra5time rv1pat rv1time rv5pat
1    2014   M   R  7.0      Y      41      Y      35      Y      40      Y
2    2014   M   R  3.0      Y      45      Y      55      N      45      Y
3    2014   M   R  4.0      Y      70      Y      45      Y      50      Y
4    2014   M   R 15.0      N      90      Y      50      Y      70      Y

vsl2 = subset(vsl,select=-c(totnum, success,rate)) # to drop the summary stats

hist(vsltime$time, main="Anastomosis time (all) 2013-2014 n=512")
Histogram to check shape of distribution –> looks skewed to the right
> shapiro.test(vsltime$time)

	Shapiro-Wilk normality test

data:  vsltime$time
W = 0.91157, p-value = 1.388e-14
Shapiro-Wilk test for normality –> p < 0.001, so normality is rejected (time is not normally distributed)
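The decision rule behind that reading can be sketched as below. This is only an illustration on simulated right-skewed data (not the anastomosis times); the 0.05 cut-off and the names `x`, `sw` and `is_normal` are assumptions made for the example.

```r
# Simulated right-skewed sample, NOT the vsl data
set.seed(1)
x <- rexp(200)

# shapiro.test() returns an "htest" object; W and the p-value can be pulled out
sw <- shapiro.test(x)
sw$statistic   # W
sw$p.value

# Assumed convention: reject normality when p < 0.05
is_normal <- sw$p.value >= 0.05
is_normal
```

With the real data the same check would be run on vsltime$time.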
> hist(vsl$success)
> shapiro.test(vsl$success)

	Shapiro-Wilk normality test

data:  vsl$success
W = 0.9117, p-value = 0.0002277
Some other analyses:
> plot(vsl$exp,vsl$rate,main="Anastomosis success rate vs experience (in years)")
> plot(vsl$exp,vsl$totnum)
> plot(vsl$exp,((vsl$totnum/8)+(vsl$rate))/2)
This suggests that the first few years of experience do not seem to make a difference to performance, but many years of experience do (possibly self-selection, or some prior training?). Some form of correlation analysis might be helpful here.
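One possible form of that correlation analysis is Spearman's rank correlation, which does not assume normality or a straight-line relationship. The sketch below is an assumption about how it could be done, not something that was run for these notes: `exp_cor` is a hypothetical helper and `demo` is made-up data with the same column names as `vsl`.

```r
# Hypothetical helper (not from the original notes): Spearman rank correlation
# between years of experience and an outcome column of a data frame like vsl.
exp_cor <- function(df, outcome = "rate") {
  # exact = FALSE uses the asymptotic p-value, which avoids warnings about ties
  cor.test(df$exp, df[[outcome]], method = "spearman", exact = FALSE)
}

# Made-up illustrative data, NOT the vsl1314.csv values
demo <- data.frame(exp  = c(1, 2, 3, 5, 8, 15),
                   rate = c(0.5, 0.625, 0.5, 0.75, 0.875, 1))

res <- exp_cor(demo)
res$estimate   # Spearman's rho
res$p.value
```

With the real data this would be called as exp_cor(vsl, "rate") or exp_cor(vsl, "totnum"). Spearman is suggested here only because the Shapiro-Wilk results above argue against assuming normality.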
> stem(((vsl$totnum/8)+(vsl$rate))/2)

  The decimal point is 1 digit(s) to the left of the |

   2 | 5
   3 | 
   4 | 
   5 | 
   6 | 1333
   7 | 1111111111112559999999999
   8 | 1111177777888888888888888
   9 | 44444444
  10 | 0

> stem(vsl$rate)

  The decimal point is 1 digit(s) to the left of the |

   0 | 0
   2 | 
   4 | 00007
   6 | 07777771111155555
   8 | 0000003333366666888
  10 | 0000000000000000000000
See also Combined 13-14