[ AnalyticsWeek BYTES]
>> Predicting UX Metrics with the PURE Method by analyticsweek
>> Looking Beyond OAS 3 (Part 1) by analyticsweekpick
>> Big universe, big data, astronomical opportunity by analyticsweekpick
[ NEWS BYTES]
Marketing & Social Media Manager – BevNET.com Under Social Analytics
The Finance Leader’s Guide to Balancing Risk and Performance – FEI Daily Under Risk Analytics
The tools keep getting better: McLeod’s applied data science initiative – FreightWaves Under Data Science
[ FEATURED COURSE]
Process Mining: Data science in Action
[ FEATURED READ]
Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners
[ TIPS & TRICKS OF THE WEEK]
Data aids judgement but does not replace it
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information, and information placed in context leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start: context and intuition also play a role.
[ DATA SCIENCE Q&A]
Q: How do you know if one algorithm is better than another?
A: It depends on what "better" means:
* In terms of performance on a given data set
* In terms of performance on several data sets
* In terms of efficiency
In terms of performance on several data sets:
- "Does learning algorithm A have a higher chance of producing a better predictor than learning algorithm B in the given context?"
- "Bayesian Comparison of Machine Learning Algorithms on Single and Multiple Datasets", A. Lacoste and F. Laviolette
- "Statistical Comparisons of Classifiers over Multiple Data Sets", Janez Demsar
In terms of performance on a given data set:
- You want to choose between two learning algorithms
- You need to compare their performance and assess whether the difference is statistically significant
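One approach (not preferred in the literature):
- Repeated k-fold cross-validation: run CV several times with different random splits and record the mean and standard deviation of the metric for each algorithm
- You then have algorithm A (mean and sd) and algorithm B (mean and sd)
- Is the difference meaningful? Check with a paired t-test on the per-fold scores
- Why it is not preferred: the training sets overlap across folds and repetitions, so the scores are not independent and the t-test reports significant differences too often (inflated Type I error)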
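A minimal sketch of the paired t statistic on per-fold scores, assuming both algorithms were evaluated on exactly the same folds (the scores below are hypothetical). It uses only the Python standard library and returns the statistic with its degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with a p-value. Bear in mind that the literature does not recommend this test on repeated CV folds, because the fold scores are not independent.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two algorithms scored on the same CV folds.

    Returns (t, degrees_of_freedom). A positive t means A scored higher on average.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all per-fold differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical accuracies of A and B on the same 3 folds.
t, df = paired_t_statistic([0.80, 0.90, 0.85], [0.70, 0.80, 0.80])
print(f"t = {t:.2f}, df = {df}")  # prints: t = 5.00, df = 2
```

The t value is then compared against a t distribution with `df` degrees of freedom to obtain the p-value.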
Sign test (classification context):
Count the number of times A achieves a better metric than B and assume this count comes from a binomial distribution (with success probability 0.5 under the null hypothesis). This gives a p-value for the null hypothesis H0: A and B are equal in terms of performance.
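A sketch of an exact two-sided sign test in standard-library Python, assuming ties are simply discarded (one common convention; others split ties between A and B). Under H0 the number of wins follows Binomial(n, 0.5), so the p-value is twice the smaller tail, capped at 1. The data are hypothetical.

```python
import math


def sign_test(scores_a, scores_b):
    """Exact two-sided sign test on paired scores. Returns (wins_a, losses_a, p_value)."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses  # ties are discarded
    if n == 0:
        return wins, losses, 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    lower_tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * lower_tail)


# Hypothetical metric values of A and B on 10 data sets: A wins 9, loses 1.
a = [0.81, 0.77, 0.92, 0.66, 0.88, 0.71, 0.95, 0.83, 0.79, 0.60]
b = [0.78, 0.75, 0.90, 0.61, 0.85, 0.70, 0.93, 0.80, 0.74, 0.62]
print(sign_test(a, b))  # prints: (9, 1, 0.021484375)
```

The test ignores how large each win is, which is what the Wilcoxon signed-rank test adds.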
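Wilcoxon signed-rank test (classification context):
Like the sign test, but instead of only counting the wins (A better than B), each difference is weighted by the rank of its absolute size, and the differences are assumed to come from a distribution symmetric around a common median. This gives a p-value for the same null hypothesis H0: A and B are equal in terms of performance.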
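A standard-library sketch of the Wilcoxon signed-rank test with an exact p-value, assuming zero differences are dropped and tied absolute differences receive average ranks. The null distribution is built by counting, with a subset-sum table, every assignment of +/- signs to the ranks, so it is only practical for a modest number of pairs; for larger samples a normal approximation or `scipy.stats.wilcoxon` is the usual route. The same function applies whether the pairs are folds on one data set or results on several data sets. Exact float equality is used to detect tied differences, so rounding metrics beforehand may be wise.

```python
def wilcoxon_signed_rank(scores_a, scores_b):
    """Exact two-sided Wilcoxon signed-rank test on paired scores.

    Returns (w_plus, p_value), where w_plus is the rank sum of pairs where A beats B.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank |d| in ascending order, giving tied values the average of their ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 each rank is + or - with probability 1/2. Doubling the ranks keeps
    # them integers (average ranks can end in .5), so sign patterns can be counted
    # per rank sum with a subset-sum table.
    doubled = [round(2 * r) for r in ranks]
    total = sum(doubled)
    counts = [1] + [0] * total
    for r in doubled:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    w2 = round(2 * w_plus)
    lower = sum(counts[: w2 + 1])  # sign patterns with W+ <= observed
    upper = sum(counts[w2:])       # sign patterns with W+ >= observed
    p_value = min(1.0, 2 * min(lower, upper) / 2 ** n)
    return w_plus, p_value


# Hypothetical accuracies (in percent) of A and B on 10 data sets.
a = [81, 77, 92, 66, 88, 71, 95, 83, 79, 60]
b = [78, 75, 90, 61, 85, 70, 93, 80, 74, 62]
print(wilcoxon_signed_rank(a, b))
```

Enumerating sign patterns over the actual (possibly averaged) ranks gives the exact permutation distribution, which avoids relying on a large-sample approximation when only a handful of folds or data sets are available.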
Other (without hypothesis testing):
[ WORK WITH TAO]