So which investment fund should we buy for our ISA? That's a question discussed in the media between Christmas and New Year. I want to show you that you can select consistently well-performing funds by knowing just a bit of stats and a bit of R. It is also a good way of getting back into stats after the Christmas festivities.

Bar plots are for categorical variables, also known as factors in ANOVA.

Instances where we may use bar plots: descriptive stats with categorical variables; regression with categorical variables; crosstabs.

#compare with hist() command

windows() #opens a new graphics window (Windows only; dev.new() works on any platform)

hist(num$cells)

#2. stacked bar plot 1: plot of frequencies of 2 categorical variables

table(num$smoker, num$weight)

barplot(table(num$smoker, num$weight))

#2. plot of means of DV across categorical variables in regression

tab = with(num, tapply(cells,list(smoker,weight),mean))

tab

barplot(tab)

#3 bar plot with bars arranged by group

barplot(tab,beside=T)

#color and labels

barplot(tab,beside=T,col=c(1,2))

#shading and legend

barplot(tab,beside=T,col="black",density=c(10,20)) #col sets the colour of the shading lines; col=NA would draw none

legend(1,3.5, c("non-smoker","smoker"), fill="black", density=c(10,20))
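
The `num` data frame used above is not included in the post. As a minimal, self-contained sketch, the same sequence of plots can be reproduced with R's built-in `ToothGrowth` data, treating `supp` and `dose` as the two factors. That choice of stand-in data, and the object names `tg`, `freq` and `means`, are assumptions made purely for illustration.

```r
# Stand-in data: ToothGrowth ships with R
# (len = tooth length, supp = OJ/VC, dose = 0.5/1/2)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Frequencies of two categorical variables -> stacked bar plot
freq <- table(tg$supp, tg$dose)
barplot(freq, main = "Counts by dose, stacked by supplement")

# Means of the dependent variable across both factors
means <- with(tg, tapply(len, list(supp, dose), mean))

# Bars side by side, shaded, with a matching legend
barplot(means, beside = TRUE, col = "black", density = c(10, 20),
        xlab = "dose", ylab = "mean len")
legend("topleft", legend = rownames(means),
       fill = "black", density = c(10, 20))
```

Using a keyword position such as `"topleft"` for the legend avoids having to guess x/y coordinates that depend on the data.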

One of the first plots we learn about is the histogram, which is easy to interpret. Not so the q-q plot, whose purpose is to shed light on whether a variable (the data) comes from a specified distribution. Here I want to simulate data to see what the normal q-q plot looks like for symmetric distributions, symmetric distributions with fat tails, and skewed distributions. The command to plot the normal q-q plot is **qqnorm( )**

### R code for simulating data from a number of distributions and getting the q-q plots

#simulate from various distributions

simn=rnorm(10000,0,2) # simulate 10k observations from a normal with mean 0 and sd 2

simchi=rchisq(10000,6) #simulate from chi-square(6)

simchi2= - simchi # create negative skew distribution from chi-squared distribution

simt= rt(10000,10) # simulate from t-distribution with 10 degrees of freedom

#Plots in 2 graphics windows

par(mfrow=c(2,2)) #set up graphics page, 2x2 table

hist(simn, main="Symmetric distribution", xlab="")

qqnorm(simn)

qqline(simn)

hist(simt, main="Symmetric with fat tails", xlab="")

qqnorm(simt)

qqline(simt)

windows() #second graphics window pops up (Windows only; use dev.new() on other platforms)

par(mfrow=c(2,2))

hist(simchi, main="Positive skew", xlab="")

qqnorm(simchi)

qqline(simchi)

hist(simchi2, main="Negative skew", xlab="")

qqnorm(simchi2)

qqline(simchi2)
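
To see what `qqnorm()` actually draws, the sketch below rebuilds the q-q coordinates by hand: the sorted sample on the vertical axis against standard normal quantiles at the probabilities given by `ppoints()`. The sample size, the seed and the variable names are illustrative choices, not part of the code above.

```r
set.seed(1)                    # arbitrary seed, for reproducibility only
y <- rchisq(500, 6)            # positively skewed sample

n <- length(y)
theo <- qnorm(ppoints(n))      # theoretical N(0,1) quantiles
samp <- sort(y)                # ordered sample values

plot(theo, samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Q-Q plot built by hand")

# By default qqline() draws the line through the first and third quartiles
qqline(y)

# The hand-built coordinates agree with those returned by qqnorm()
qq <- qqnorm(y, plot.it = FALSE)
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), samp)
```

With a positively skewed sample the points bend upward and sit above the line at both ends; a negatively skewed sample bends the other way. Fat tails show up as an S-shape, with points below the line on the left and above it on the right.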

Once your data is ready for analysis, you need to obtain the descriptive statistics.

Video with examples showing how to obtain the following in R:

#summary stats (mean, median, min, max, sd, quantile, range, skewness, kurtosis; not the mode) for one variable: vector and dataframe

#individual stats for observations in a vector/dataframe

#individual stats for subset of variables in a dataframe

#summary stats for a continuous variable over a factor/group

#frequency table applicable for factors

#summary stats for one variable: vector and dataframe

x=c(10,15,18,25,30)

summary(x)

ToothGrowth # len = continuous, supp = nominal, dose = ordinal/cts

summary(ToothGrowth) # summary of every column in the dataframe

#individual stats for observations in a vector

mean(x)

sd(x) # other commands are: median, min, max, quantile, range

# to extend to skewness and kurtosis install moments package

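install.packages("moments") # package name must be quoted; run once

library(moments) # load the package so skewness() and kurtosis() are available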

skewness(x)

kurtosis(x)
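
For readers who prefer not to install a package, skewness and kurtosis can also be computed directly from the central moments. The sketch below uses the plain moment ratios m3 / m2^1.5 and m4 / m2^2, which, as far as I know, is the definition the `moments` package follows. The helper names `central_moment`, `skew_by_hand` and `kurt_by_hand` are made up for this example. Note that this kurtosis is not "excess" kurtosis, so a normal distribution scores about 3.

```r
# Hypothetical helpers, base R only
central_moment <- function(x, k) mean((x - mean(x))^k)

skew_by_hand <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt_by_hand <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

x <- c(10, 15, 18, 25, 30)
skew_by_hand(x)
kurt_by_hand(x)

# A symmetric sample has zero skewness
skew_by_hand(c(1, 2, 3))
```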

#individual stats for variables in a dataframe

# we show this for the mean. In place of the mean, you can use

# median, min, max, quantile, range, skewness, kurtosis

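#mean(ToothGrowth) no longer works on a whole dataframe in current R (it returns NA with a warning); apply mean() to a column or use sapply() as below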

mean(ToothGrowth$len)

skewness(ToothGrowth$len)

kurtosis(ToothGrowth$len)

#format for sapply is sapply(X, FUN), where X is the dataframe (or list) and FUN can be

# mean, sd, min, max, median, range, quantile, skewness, kurtosis

sapply(ToothGrowth[,c(1,3)], mean) # mean for vars in col 1 and 3

#summary stats by factor/group

# use split() in sapply()

# sapply(split(x, f), FUN), where x is the variable and f the grouping factor

sapply(split(ToothGrowth$len, ToothGrowth$supp), mean)

with(ToothGrowth, sapply(split(len,supp), mean)) #same but using with() instead of df$var
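
The same group means can also be obtained with `tapply()` or `aggregate()`, both in base R. This is only an alternative sketch, and the object names `by_split`, `by_tapply` and `by_agg` are chosen for illustration; which form reads best is a matter of taste.

```r
# Mean of len within each level of supp, three equivalent ways
by_split  <- sapply(split(ToothGrowth$len, ToothGrowth$supp), mean)
by_tapply <- with(ToothGrowth, tapply(len, supp, mean))
by_agg    <- aggregate(len ~ supp, data = ToothGrowth, FUN = mean)

by_split
by_tapply
by_agg      # a data frame with one row per group

# Several statistics at once for each group
sapply(split(ToothGrowth$len, ToothGrowth$supp),
       function(v) c(mean = mean(v), sd = sd(v), median = median(v)))
```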

#frequency table applicable for factors

with(ToothGrowth, table(dose,supp)) # with() is needed because dose and supp live inside the dataframe
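
A table of counts is often easier to read with totals or proportions added. A short sketch using `addmargins()` and `prop.table()` from base R follows; the name `counts` is just an illustrative choice, and `ToothGrowth` is accessed through `with()` so its columns do not need to be attached.

```r
counts <- with(ToothGrowth, table(dose, supp))
counts

addmargins(counts)               # add row and column totals
prop.table(counts, margin = 1)   # proportions within each dose (rows sum to 1)
```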

Testing for an association/relationship/independence between two (qualitative) factors.