R for Statistics

Data Wrangling

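library(tidyr)
getwd()                  # show the current working directory
setwd("path/to/data")    # placeholder: the folder that contains vsl1314.csv
# fileEncoding strips the byte-order mark that otherwise turns "year" into "ï..year"
vsl <- read.csv("vsl1314.csv", fileEncoding = "UTF-8-BOM")
> vsl
  year sex dom  exp ra1pat ra1time ra5pat ra5time rv1pat rv1time rv5pat
1 2014   M   R  7.0      Y      41      Y      35      Y      40      Y
2 2014   M   R  3.0      Y      45      Y      55      N      45      Y
3 2014   M   R  4.0      Y      70      Y      45      Y      50      Y
4 2014   M   R 15.0      N      90      Y      50      Y      70      Y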

vsl2 <- subset(vsl, select = -c(totnum, success, rate))  # drop the summary statistics
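
The vsltime data frame used for the histogram is not defined anywhere on this page. One plausible way to build such a table, assuming it is simply the long-format stack of the *time columns, is sketched here with tidyr. The toy data frame copies three of the timing columns from the first three rows printed above; the names toy, toy_long and task are illustrative, not taken from the original analysis.

```r
library(tidyr)

# Toy subset: timing columns from the first three printed rows of vsl
toy <- data.frame(
  year    = c(2014, 2014, 2014),
  exp     = c(7, 3, 4),
  ra1time = c(41, 45, 70),
  ra5time = c(35, 55, 45),
  rv1time = c(40, 45, 50)
)

# Stack every column whose name ends in "time" into one "time" column,
# keeping the source column name in "task" (a hypothetical name)
toy_long <- pivot_longer(
  toy,
  cols      = ends_with("time"),
  names_to  = "task",
  values_to = "time"
)

toy_long
```

Each original row contributes one row per timing column, so all times can then be passed to hist() or shapiro.test() as a single vector.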

hist(vsltime$time, main="Anastomosis time (all) 2013-2014 n=512")

Histogram to check the shape of the distribution: it looks skewed to the right.

> shapiro.test(vsltime$time)
Shapiro-Wilk normality test
data:  vsltime$time
W = 0.91157, p-value = 1.388e-14

Shapiro-Wilk test for normality: p < 0.001, so the times are not normally distributed.
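
To make the reading of the test output concrete, here is a minimal sketch on simulated data (not the real anastomosis times). It assumes a log-normal sample as a stand-in for right-skewed times and a normal sample for contrast; the parameters are arbitrary.

```r
set.seed(1)

# Simulated stand-ins, n = 512 to match the histogram title above
sim_skew <- rlnorm(512, meanlog = log(45), sdlog = 0.5)  # right-skewed
sim_norm <- rnorm(512, mean = 45, sd = 10)               # symmetric

# shapiro.test() returns an "htest" object; the W statistic and
# p-value can be extracted by name
res_skew <- shapiro.test(sim_skew)
res_norm <- shapiro.test(sim_norm)

res_skew$statistic   # W well below 1 for the skewed sample
res_skew$p.value     # very small: normality is rejected
res_norm$p.value     # compare with the symmetric sample
```

A small p-value means the sample is unlikely to come from a normal distribution, which is the pattern seen for vsltime$time.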

> hist(vsl$success)

> shapiro.test(vsl$success)
      Shapiro-Wilk normality test
data:  vsl$success
W = 0.9117, p-value = 0.0002277

Shapiro-Wilk test for success: p < 0.001, so this variable is not normally distributed either.

Some other analyses:

> plot(vsl$exp,vsl$rate,main="Anastomosis success rate vs experience (in years)")
> plot(vsl$exp,vsl$totnum)
> plot(vsl$exp,((vsl$totnum/8)+(vsl$rate))/2)

The plots suggest that the first few years of experience make little difference to performance, whereas many years of experience do (possibly because of self-selection or prior training).
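
Because the variables above are not normally distributed, one way to put a number on the visual impression is a rank-based (Spearman) correlation. This is only a sketch under that assumption: the values below are invented for illustration and are not the vsl data.

```r
# Invented example values, NOT taken from vsl1314.csv
exp_years <- c(1, 2, 3, 4, 10, 15, 20)
rate      <- c(0.50, 0.52, 0.51, 0.50, 0.70, 0.80, 0.90)

# Spearman's rho works on ranks, so it does not assume normality
rho <- cor(exp_years, rate, method = "spearman")
rho

# Same composite score as in the third plot above, applied to the
# invented data (totnum is assumed to be out of 8, as in that plot)
totnum    <- c(4, 4, 5, 4, 6, 7, 8)
composite <- ((totnum / 8) + rate) / 2
cor(exp_years, composite, method = "spearman")
```

With the real data, the same calls would be run on vsl$exp and vsl$rate.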

pe/r-lang.1555667908.txt.gz · Last modified: 2020/03/24 02:34 (external edit)