Thought of the Day — George Pólya

Examine your guess. Your guess may be right, but it is foolish to accept a vivid guess as a proven truth — as primitive people often do. Your guess may be wrong. But it is also foolish to disregard a vivid guess altogether — as pedantic people sometimes do. Guesses of a certain kind deserve to be examined and taken seriously: those which occur to us after we have attentively considered and really understood a problem in which we are genuinely interested. Such guesses usually contain at least a fragment of the truth although, of course, they very seldom show the whole truth. Yet there is a chance to extract the whole truth if we examine such a guess appropriately.

Many a guess has turned out to be wrong but nevertheless useful in leading to a better one.

No idea is really bad, unless we are uncritical. What is really bad is to have no idea at all.

From: Observational Epidemiology

5 Reasons to Run Sample Size Calculations Before Collecting Data

Most of us run sample size calculations when a granting agency or committee requires it.  That’s reason 1.

That is a very good reason.  But there are others, and it can be helpful to keep these in mind when you’re tempted to skip this step or are grumbling through the calculations you’re required to do.

It’s easy to base your sample size on what is customary in your field (“I’ll use 20 subjects per condition”) or to just use the number of subjects in a similar study (“They used 150, so I will too”).

Sometimes you can get away with doing that.

However, there really are some good reasons beyond funding to do some sample size estimates. And since they’re not especially time-consuming, it’s worth doing them.

Often the most time-consuming part is figuring out and writing the data analysis plan to base the calculations on, but that’s a step you should do anyway.

Reason 2:  Many, many published studies have very low power, and are bad sources for basing your sample size on.

As reported in Keppel, Cohen calculated the power of every study in a psychology journal for a year. The average power was just under 50%.

If power is 50% for a study, it basically means that the study had a 50% chance of finding significant results given the sample size, the effect size, and the statistical test.  Because these were published studies, they must have had significant results.  But there were probably just as many other studies that never got published because they didn’t have adequate power.

If you now attempt to build on that study and you use the same sample size, you only have a 50% chance of replicating it with significant results. Do your own power calculation and raise the sample size, if needed.
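As a rough illustration of such a power calculation, here is a minimal Python sketch. It assumes a two-sided comparison of two group means and uses the normal approximation, so results will differ slightly from an exact t-test calculation in dedicated software. The function names, and the effect size of d = 0.5 in the example, are illustrative assumptions and not taken from the post.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for a two-sided, two-group comparison of means.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction. `d` is the standardized mean difference.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate number of subjects per group needed for the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: a medium effect (d = 0.5) and the customary
    # "20 subjects per condition".
    print(f"Power with 20 per group: {approx_power(0.5, 20):.2f}")
    print(f"Subjects per group for 80% power: {approx_n_per_group(0.5)}")
```

Under these assumptions, 20 subjects per condition gives only about a one-in-three chance of detecting a medium effect, while roughly 63 per group would be needed for 80% power.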

Reason 3: A power calculation estimates not only how many participants you need, but how many you don’t need.

You don’t want to spend resources (time, money, and energy) collecting more data than you need.  Save those resources for a follow-up study.

Especially if your study creates any risk, or even inconvenience, for your human or animal participants, you don’t want to oversize your study either. You don’t want to expose more participants than necessary to the risk.

Reason 4: When sample size calculations tell you that you’re close but don’t have quite enough subjects, you can make adjustments to the study that will increase the power in other ways.

Maybe you can adjust the way you’re measuring some of your variables to add precision, or switch your design to something that will give you a little more power. Or make sure you include control variables that account for some of the random error.  All of these increase power without increasing sample size.

Reason 5: The biggest benefit of doing these calculations is to not waste years and thousands of dollars in grants or tuition pursuing an impossible analysis.

If sample size calculations indicate you need a thousand subjects to find significant results but time, money, or ethical constraints limit you to 50, don’t do that study.
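One way to run this feasibility check is to turn the calculation around and ask what effect a fixed sample could detect. The sketch below is a minimal Python example under the same kind of assumptions as a standard two-group power calculation: a two-sided test, the normal approximation, and equal group sizes. The function name and the choice to treat the sample limit as a per-group count are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, power=0.80, alpha=0.05):
    """Smallest standardized mean difference detectable with the given power.

    Normal approximation for a two-sided comparison of two equal-sized groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_crit + z_power) * sqrt(2 / n_per_group)


if __name__ == "__main__":
    # Compare what is detectable at the feasible and at the required sample size.
    for n in (50, 1000):
        print(f"n per group = {n}: minimum detectable d = {min_detectable_d(n):.2f}")
```

If the effect you realistically expect is much smaller than the minimum detectable effect at the sample size you can afford, that is a strong signal to redesign the study before collecting data.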

I know it’s painful to go back to square 1, but it’s much better to do it now than after 3 years of work.

From: The Analysis Factor

How to Calculate Effect Size Statistics

There are many effect size statistics for ANOVA and regression, and as you may have noticed, journal editors are now requiring that you include one.

Unfortunately, the one your editor wants, or the one most appropriate to your research, may not be the one your software makes available (SPSS, for example, reports only Partial Eta Squared, although early versions label it Eta Squared).

Luckily, all the effect size measures are relatively easy to calculate from information in the ANOVA table on your output.  Here are a few common ones:

[Figure: Eta Squared, Partial Eta Squared, and Omega Squared formulas]

[Figure: Cohen’s d formula]

You have to be careful, if you’re using SPSS, to use the correct values, as SPSS labels aren’t always what we think.  For example, for SSTotal, use what SPSS labels SS Corrected Total.

What SPSS labels SS Total actually also includes SS for the Intercept, which is redundant to other information in the model.
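The calculations from an ANOVA table can be sketched in a few lines of Python. The functions below use the standard textbook definitions of eta squared, partial eta squared, and omega squared, which may differ in notation from the formulas originally pictured in the post. The function names and the numbers in the example are hypothetical. Note that `ss_total` here means the corrected total, as discussed above.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total (corrected) variation attributable to the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Effect variation relative to effect-plus-error variation only."""
    return ss_effect / (ss_effect + ss_error)


def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Less biased estimate of the population proportion of variance explained."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


if __name__ == "__main__":
    # Hypothetical one-way ANOVA table values (not from any real output).
    ss_effect, ss_error, df_effect, df_error = 40.0, 160.0, 2, 40
    ss_total = ss_effect + ss_error  # corrected total, intercept excluded
    ms_error = ss_error / df_error

    print(f"eta squared:         {eta_squared(ss_effect, ss_total):.3f}")
    print(f"partial eta squared: {partial_eta_squared(ss_effect, ss_error):.3f}")
    print(f"omega squared:       {omega_squared(ss_effect, df_effect, ms_error, ss_total):.3f}")
```

In a one-way design, eta squared and partial eta squared coincide; they diverge once the model contains other effects, because partial eta squared leaves their sums of squares out of the denominator.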

This is a nice page that walks you through some of these calculations using SPSS output:

Measures of Effect Size in SPSS

The denominator for Cohen’s d is always some measure of standard deviation.  I’ve shown s pooled here, but you often see different options, including just using one sample’s s.  The pooled version is the one I see used most commonly.
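For reference, here is a minimal Python sketch of Cohen’s d with the pooled standard deviation in the denominator. It assumes two independent samples and the usual pooled-variance formula weighted by degrees of freedom. The function name and the sample data are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_1, sample_2):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)
    ) / (n1 + n2 - 2)
    return (mean(sample_1) - mean(sample_2)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical scores for two small groups.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Swapping the denominator for a single sample’s standard deviation, as mentioned above, only requires replacing `sqrt(pooled_var)` with the standard deviation of the chosen sample.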

From: The Analysis Factor

Benford’s Law

The probability of any first significant digit n (from 1 through 9) occurring in a dataset is: P(n) = log10(1 + 1/n)
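A minimal Python sketch that tabulates these probabilities, assuming the standard statement of Benford’s law, P(n) = log10(1 + 1/n). The function name is illustrative.

```python
from math import log10


def benford_probability(n):
    """Probability that the first significant digit equals n under Benford's law."""
    if n not in range(1, 10):
        raise ValueError("first significant digit must be an integer from 1 to 9")
    return log10(1 + 1 / n)


if __name__ == "__main__":
    for digit in range(1, 10):
        print(f"{digit}: {benford_probability(digit):.3f}")
```

The nine probabilities sum to 1, and a leading 1 is expected about 30% of the time.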

From: Amazing Applications of Probability and Statistics

Free Video Courses on R, Structural Equation Modelling, Causal Inference, and Regression from Uni Jena

 

The Department of Methodology and Evaluation Research at Universität Jena has made available a set of free online video courses on data analysis. They cover topics that are particularly relevant to psychology and social science researchers, including SEM, causal inference, regression, R, and psychometrics. Some courses are in German, but many are in English, and the language of the course is clearly marked. Some require that you register, but registration is free. Their website allows you to filter for English-language courses only. Below are some courses that Jeromy Anglim found particularly appealing.

From: R Bloggers

Statistics Resources for Social and Behavioural Sciences: SPSS, R, Maths, and Writing

Here is a newly discovered blog with lots of useful articles about practical statistics.

GENERAL STATISTICS
OTHER

From: Jeromy Anglim’s Blog: Psychology and Statistics

R Tutorial Series

Below is a categorized list of the articles currently offered in the R Tutorial Series.

Introduction to R

Descriptive Statistics

Data Visualization

Correlation

Regression

HLM

ANOVA

From: R bloggers

Chart Suggestions

How to Find the Right Chart Type for your Numeric Data

If you are finding it hard to pick the right chart type for your data, this easy flow chart (available as PDF and JPG), courtesy of Andrew Abela, should help you make the decision quickly. Start from the center and take the route that best matches your data.

Online electronic journals index

eprintweb – Home

eprintweb.org is an e-print service in the fields of physics, mathematics, non-linear science, computer science, and quantitative biology, and consists of e-print records which can be browsed and searched.

The contents of eprintweb.org are provided by arXiv, which is operated and funded by Cornell University Library, a private not-for-profit educational institution, and is also partially funded by the National Science Foundation.

Introduction to Probability

Introduction to Probability

This introductory probability book, published by the American Mathematical Society, is available from AMS bookshop. It has, since publication, also been available for download here in pdf format.

Introduction to Statistical Thought

Introduction to Statistical Thought

This eBook is intended as an upper level undergraduate or introductory graduate textbook in statistical thinking with a likelihood emphasis for students with a good knowledge of calculus and the ability to think abstractly. By “statistical thinking” is meant a focus on ideas that statisticians care about as opposed to technical details of how to put those ideas into practice. The book does contain technical details, but they are not the focus. By “likelihood emphasis” is meant that the likelihood function and likelihood principle are unifying ideas throughout the text. Another unusual aspect is the use of statistical software as a pedagogical tool. That is, instead of viewing the computer merely as a convenient and accurate calculating device, the book uses computer calculation and simulation as another way of explaining and helping readers understand the underlying concepts. The book is written with the statistical language R embedded throughout. R and accompanying manuals are available for free download from http://www.r-project.org.

SPSS Tutorials

Lots of really good tutorials and references for learning SPSS

Free General and Specialized Online SPSS Tutorials

Introduction to Statistical Thought

Here is a good free statistics textbook available in PDF format.

Introduction to Statistical Thought

Introduction to Statistical Thought is not finished yet, but is sufficiently complete to be used as a course text by knowledgeable instructors. Material will be added. Corrections will be made.

Introduction to Probability

Textbook Revolution: Introduction to Probability

A PDF of an in-print probability textbook by authors from Dartmouth and Swarthmore. The website also has links to many other useful probability resources online, including a complete course on the subject. The book has everything you’d expect a print textbook to have, including a professional layout, plenty of illustrations, and practice problems at the end of each chapter. Why use anything else?

http://www.dartmouth.edu/%7Echance/teaching_aids/books_articles/probability_book/book.html

Engineering Statistics Handbook

NIST/SEMATECH e-Handbook of Statistical Methods

The NIST/SEMATECH e-Handbook of Statistical Methods is a Web-based book whose goal is to help scientists and engineers incorporate statistical methods into their work as efficiently as possible. Ideally it will serve as a reference that will help scientists and engineers design their own experiments and carry out the appropriate analyses when a statistician is not available to help. It is also hoped that it will serve as a useful educational tool that will help users of statistical methods and consumers of statistical information better understand statistical procedures and their underlying assumptions and more clearly interpret scientific and engineering results stated in statistical terms.

Statistical Thinking for Managerial Decisions

Dr. Arsham’s Statistics Site

via:

Textbook Revolution: Statistical Thinking for Managerial Decisions

Here’s an entire statistics book on one long page of simple HTML text, with hyperlinks taking you up and down the page. If you print it out, you’ll get a book 172 pages long. The setup makes it easy to search the site using CTRL-F in your browser and minimizes clicking around.

Continuously updated since 1994 and with mirrors around the world, the site is well broken-in and remarkably polished. Although its simple text-based setup seems simplistic at first, the site is packed with useful features. There is a Spanish language version, for starters. There are also a couple of dozen Javascript tools to help you along, and an equal number of companion websites on a huge variety of topics ranging from using Excel to designing questionnaires to linear programming. Perhaps the most useful is the list of hundreds of links to other online statistics resources, each link with a capsule description. This site might be the last statistics resource you ever need.

Free textbook: Statistical Analysis with the General Linear Model

Textbook Revolution: Statistical Analysis with the General Linear Model

Statistical Analysis with the General Linear Model is an introductory textbook describing statistical analysis with ANOVA, regression, and analysis of covariance. It is intended for social sciences students with a minimum of mathematical background, who have had one previous statistics class (covering descriptive statistics and basic hypothesis testing). It includes detailed explanations and many examples, and can serve as the basis for an advanced statistics course.

The R Project for Statistical Computing

Textbook Revolution: The R Project for Statistical Computing

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. If you have more than a passing interest in statistics and data presentation, R provides you with a great set of tools to work with. The ‘environment’ is easily extended with a wide selection of packages for addressing particular disciplines, such as bioinformatics, ecology, finance and more. Custom functions are easily built, and can be used as an accessible introduction to computer programming.

Benford’s Law

Benford’s Law

Benford’s law predicts a decreasing frequency of first digits, from 1 through 9. Every entry in data sets developed by Benford for numbers appearing on the front pages of newspapers, by Mark Nigrini of 3,141 county populations in the 1990 U.S. Census and by Eduardo Ley of the Dow Jones Industrial Average from 1990-93 follows Benford’s law within 2 percent.
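To see how such a comparison works in practice, here is a minimal Python sketch that tallies first digits and sets them against the Benford prediction, log10(1 + 1/d). It uses the first 1,000 powers of 2 as a stand-in dataset, a sequence commonly used to demonstrate the law. This is an assumption made for illustration and not one of the data sets named above; the function name is also illustrative.

```python
from collections import Counter
from math import log10


def first_digit_frequencies(values):
    """Observed share of each leading digit 1-9 among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Stand-in dataset: 2**0 through 2**999.
powers_of_two = [2 ** k for k in range(1000)]
observed = first_digit_frequencies(powers_of_two)

if __name__ == "__main__":
    print("digit  observed  benford")
    for d in range(1, 10):
        print(f"{d:>5}  {observed[d]:>8.3f}  {log10(1 + 1 / d):>7.3f}")
```

The same function can be pointed at any list of positive integers, such as population counts, to check how closely the observed first-digit shares track the predicted ones.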

SPSS Web books

SPSS Web Books: Regression with SPSS

The aim of these materials is to help you increase your skills in using regression analysis with SPSS. This web book does not teach regression, per se, but focuses on how to perform regression analyses using SPSS. It is assumed that you have had at least a one quarter/semester course in regression (linear models) or a general statistical methods course that covers simple and multiple regression and have access to a regression textbook that explains the theoretical background of the materials covered in these chapters. These materials also assume you are familiar with using SPSS.