I work as a consultant data scientist, mostly in credit scoring (predictive modelling of retail customer behaviour in consumer finance). Since 1989 I have worked in this area with a variety of vendor and client organisations, on both project work and R&D of products and services. During that time I have worked on a wide range of topics, including application and behaviour scoring, collection scoring, profitability modelling, Basel II modelling, fraud detection, and entity resolution (approximate identity matching). I tend to use a range of modelling techniques that are non-standard for credit scoring, but I always hold pragmatism as more important than technical virtuosity when the models are applied in systems that make millions of automated decisions. In credit scoring and related areas the most important modelling question is “What could possibly go wrong?”, applied broadly to cover the technical, operational, and ethical aspects.
I am also an independent researcher in cognitive science, and have held adjunct positions at the University of Melbourne and La Trobe University. This research revolves around Vector Symbolic Architectures: computational systems based on very high-dimensional dynamic systems, which can be implemented as neural networks and can be thought of as analog computers for manipulating discrete data structures such as trees and graphs. The aim of the research is to develop a practical, implementable, connectionist architecture for compositional memory. Such a memory system would be able to recognise novel situations and objects in terms of the novel pattern of structural relationships between their familiar component parts. This work effectively treats analogy as a primitive capability of memory. Current standard machine-learning techniques have limited capacity to deal with patterns of relationships, and consequently have difficulty recognising novel configurations of familiar components, or recognising familiar patterns of relationships when the components have been changed. If successful, this work will have fundamental implications for cognitive science.
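To make “analog computers for manipulating discrete data structures” a little more concrete, here is a minimal sketch of one common VSA flavour: random bipolar (+1/-1) hypervectors, binding by elementwise multiplication, and bundling (superposition) by elementwise majority. The dimension, the choice of operations, and the role/filler names are illustrative assumptions; this is not the compositional memory architecture itself.

```python
import random

DIM = 10_000  # very high dimensionality makes independent random vectors nearly orthogonal


def random_vector(rng):
    """Random bipolar (+1/-1) hypervector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two vectors by elementwise multiplication (self-inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superpose vectors by elementwise majority; ties (even counts only) resolve to -1."""
    return [1 if sum(column) > 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Cosine similarity, which for bipolar vectors is the mean elementwise product."""
    return sum(x * y for x, y in zip(a, b)) / DIM


# Usage: encode the structure agent=dog, action=chase, object=cat as a single vector.
rng = random.Random(42)
roles = {name: random_vector(rng) for name in ("agent", "action", "object")}
fillers = {name: random_vector(rng) for name in ("dog", "chase", "cat", "bird")}
record = bundle(
    bind(roles["agent"], fillers["dog"]),
    bind(roles["action"], fillers["chase"]),
    bind(roles["object"], fillers["cat"]),
)

# Unbinding with a role vector yields a noisy copy of the filler bound to that role.
query = bind(roles["object"], record)
scores = {name: similarity(query, vec) for name, vec in fillers.items()}
best = max(scores, key=scores.get)
```

With three bundled pairs, the unbound query agrees with the correct filler on about three quarters of its elements (expected similarity 0.5), while unrelated fillers stay near zero, so a simple nearest-neighbour lookup recovers “cat”.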
If you want a more CV-like listing of where I have worked, look at my LinkedIn profile.
I am progressively transferring content from my old website to here. If you can’t find something here, it may be there. My old website will eventually be retired.
PhD in Psychology, 1988
University of Queensland
BSc (Hons) 1 in Psychology, 1978
University of Queensland
BSc in Psychology & Computer Science, 1977
University of Queensland
Examples of what I do.
Score calibration is the process of empirically determining the relationship between a score and an outcome on some population of interest, and scaling is the process of expressing that relationship in agreed units. Calibration is often treated as a simple matter and attacked with simple tools: typically, either assuming that the relationship between score and log-odds is linear and fitting a logistic regression with the score as the only covariate, or dividing the score range into bands and plotting the empirical log-odds as a function of score band.
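A minimal sketch of these two simple approaches in plain Python, assuming outcomes are coded 1 for good and 0 for bad and log-odds are taken as good:bad. The equal-count banding, the 0.5 added to each count, the Newton-Raphson fitting, and the function names are illustrative assumptions; this is not the method or API of the scorecal package.

```python
import math
import random


def banded_log_odds(scores, goods, n_bands=5):
    """Empirical log-odds (goods:bads) in equal-count score bands.

    Adds 0.5 to each count (an assumed small-sample correction) so that bands
    with no goods or no bads still give finite log-odds.
    Returns a list of (mean score in band, empirical log-odds) pairs.
    """
    pairs = sorted(zip(scores, goods))
    size = math.ceil(len(pairs) / n_bands)
    result = []
    for start in range(0, len(pairs), size):
        band = pairs[start:start + size]
        n_good = sum(g for _, g in band)
        n_bad = len(band) - n_good
        mean_score = sum(s for s, _ in band) / len(band)
        result.append((mean_score, math.log((n_good + 0.5) / (n_bad + 0.5))))
    return result


def fit_linear_log_odds(scores, goods, iterations=25):
    """Logistic regression with the score as the only covariate (Newton-Raphson).

    Returns (a, b) for the assumed linear relationship log-odds = a + b * score.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient of the log-likelihood and the information matrix (negative Hessian).
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, goods):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * s
            h_aa += w
            h_ab += w * s
            h_bb += w * s * s
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) @ gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Usage on simulated data with a known linear log-odds of 0.5 + 1.0 * score.
rng = random.Random(1)
sim_scores = [rng.uniform(-3, 3) for _ in range(5000)]
sim_goods = [1 if rng.random() < 1 / (1 + math.exp(-(0.5 + s))) else 0 for s in sim_scores]
a_hat, b_hat = fit_linear_log_odds(sim_scores, sim_goods)
bands = banded_log_odds(sim_scores, sim_goods)
```

Comparing each band's empirical log-odds with `a_hat + b_hat * mean_score` gives a crude residual; systematic residuals are exactly the kind of deviation from linearity that a purely linear calibration hides, while the coarse bands throw away the within-band detail.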
Both approaches ignore some of the information in the data. The assumption of a linear score-to-log-odds relationship is too restrictive, and score banding ignores the continuity of the scores. While a linear score-to-log-odds relationship is often an adequate approximation, the reality can be much more interesting, with noticeable deviations from the linear trend. These deviations include large-scale non-linearity, small-scale non-monotonicity, discrete discontinuities, and complete breakdown of the linear trend at extreme scores.
Detecting these effects requires a more sophisticated approach to empirically determining the score-to-outcome relationship, and taking such an approach can be surprisingly tricky: the typically strong linear trend can obscure smaller deviations from linearity; detecting subtle trends requires exploiting the continuity of the scores, which can obscure discrete deviations; trends at extreme scores (out in the data-sparse tails of the score distribution) can be obscured by trends at less extreme scores (where there is more data); score distributions in which some specific values are relatively common can disrupt methods that rely on continuity; and any modelling technique can introduce its own biases.
Over the years I have developed a personal approach to these issues in score calibration and implemented it as an open source, publicly accessible R package for score calibration. I discuss these technical issues in empirical score calibration and show how they are addressed in the scorecal package.
The ROC curve is useful for assessing the predictive power of risk models and is relatively well known for this purpose in the credit scoring community. The ROC curve is a component of the Theory of Signal Detection (TSD), a theory which has pervasive links to many issues in model building. However, these conceptual links and their associated insights and techniques are less well known than they deserve to be among credit scoring practitioners.
The purpose of this paper is to alert credit risk modellers to the relationships between TSD and common scorecard development concepts, and to provide a toolbox of simple techniques and interpretations.
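As a small illustration of one such link, the sketch below traces an empirical ROC curve for a score and computes the area under it. With tied scores grouped into a single step, the trapezoidal area equals the probability that a randomly chosen good scores higher than a randomly chosen bad, with ties counting half (the Mann-Whitney statistic). Treating goods as the “positive” class and the function names are illustrative assumptions.

```python
def roc_points(scores, goods):
    """Empirical ROC points (false positive rate, true positive rate).

    'Positive' means good (goods == 1). The cut-off sweeps from the highest
    score to the lowest; tied scores are processed together, so each distinct
    score contributes one point.
    """
    pairs = sorted(zip(scores, goods), key=lambda p: -p[0])
    n_good = sum(goods)
    n_bad = len(goods) - n_good
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Accept every case at this score before recording the next point.
        while i < len(pairs) and pairs[i][0] == cutoff:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_bad, tp / n_good))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Usage: goods score 2 and 3, bads score 1 and 2 (one tie).
print(auc(roc_points([1, 2, 2, 3], [0, 1, 0, 1])))  # 0.875
```

The Gini coefficient familiar to credit scoring practitioners is a rescaling of the same quantity, Gini = 2 * AUC - 1.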
An overview of my approach to compositional memory.
Place-holder for any work related to credit scoring that is not allocated to a more specific project.
Development of an R package for score calibration.
There are no face-to-face meetings in the foreseeable future because of the COVID-19 pandemic. If you would like to chat with me, drop me a note and we’ll set up an online meeting.
VSA workshop 2020 [SPEAKER]
First Workshop on Developments in Hyperdimensional Computing and Vector Symbolic Architectures
16 March 2020
Heidelberg, Germany
The workshop was cancelled because of the COVID-19 pandemic,
but my short presentation “VSA, Analogy, and Dynamic Similarity”
is online at https://doi.org/10.5281/zenodo.3700836
and the source code of the presentation is at https://github.com/rgayler/VSA_2020_presentation
WOMBAT2019 workshop
Workshop Organised by the Monash Business Analytics Team 2019
Statistical methods and tools for effective analysis of high-dimensional data.
28 - 29 November 2019
Melbourne, Australia
AIMOS 2019 conference
Australian Interdisciplinary Meta-Research and Open Science 2019
7 - 8 November 2019
Melbourne, Australia
Ethics in Artificial Intelligence
5:30 - 7:30pm, 15 October 2019
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
Exploring the individuals in longitudinal data with the brolgar package (Nick Tierney)
Deploying R models in AWS (Jeno Yamma)
5:45 - 8:00pm, 7 October 2019
Melbourne, Australia
R-Ladies Melbourne Meetup
Take a Sad Plot & Make It Better (Alison Hill)
6:15 - 8:15pm, 26 September 2019
Melbourne, Australia
R-Ladies Melbourne Meetup
Gold star reproducibility: containerisation with open-source tools (Saras Windecker)
5:30 - 7:30pm, 18 September 2019
Melbourne, Australia
Data Science Melbourne Meetup
Conversational AI (Prashant Natarajan)
Making Money in Data Science (Nic Ryan)
5:30 - 8:30pm, 12 September 2019
Melbourne, Australia
CSCC XVI
[SPEAKER]
Credit Scoring and Credit Control conference 2019
27 - 30 August 2019
Edinburgh, Scotland
Conference Paper Archive
My presentation on credit score calibration is online at http://doi.org/10.5281/zenodo.3381658
and the R notebook that generated it is at http://doi.org/10.5281/zenodo.3381641
RSSDS 2019
Research School on Statistics and Data Science 2019
24 - 26 July 2019
Melbourne, Australia
R-Ladies Melbourne Meetup
Baby one more time - reproducibility in R and when to bring in the big guns (Lavinia Gordon)
5:30 - 7:30pm, 22 May 2019
Melbourne, Australia
Statistical Society of Australia (Vic.) Meetup
Reproducibility and Open Science (Hannah Fraser; Fiona Fidler; Mathew Ling)
5:45 - 7:15pm, 30 April 2019
Melbourne, Australia
MeDaScIn 2018
Melbourne Data Science Initiative conference
26 September 2018
Melbourne, Australia
Melbourne NLP Meetup
Robust NLP (Tim Baldwin)
Biomedical text mining (Antonio Jimeno Yepes)
6:00 - 9:00pm, 20 September 2018
Melbourne, Australia
R-Ladies Melbourne Meetup
R as a tool for complex systems modelling (Caitlin Adams)
5:30 - 8:00pm, 19 September 2018
Melbourne, Australia
Machine Learning & AI Meetup
Causality (Elizabeth Silver)
6:00 - 9:00pm, 18 September 2018
Melbourne, Australia
Melbourne Stan and Bayesian Inference Meetup
Example models in Stan (Martin Ingram)
6:00 - 7:30pm, 30 August 2018
Melbourne, Australia
R-Ladies Melbourne Meetup
Getting down and up with blogging in R (Emi Tanaka)
5:30 - 8:00pm, 28 August 2018
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
Production ready R - getting started with R and Docker (Elizabeth Stark)
5:45 - 7:30pm, 23 August 2018
Melbourne, Australia
Machine Learning & AI Meetup [SPEAKER]
VSA: Analog computing for discrete data structures (Ross Gayler)
6:00 - 9:00pm, 21 August 2018
Melbourne, Australia
Big data, privacy and AI
(David Watts, Mira Stammers, Bridget Bainbridge)
6:00 - 7:30pm, 31 July 2018
Melbourne, Australia
Machine Learning & AI Meetup
Special event with Richard Socher & AirTree Ventures
6:00 - 8:00pm, 18 July 2018
Melbourne, Australia
Machine Learning & AI Meetup
Matt Gardner - Allen Institute for Artificial Intelligence
6:00 - 9:00pm, 17 July 2018
Melbourne, Australia
useR! 2018
The conference for users of R
10 - 13 July 2018
Brisbane, Australia
Statistical Society of Australia (Vic.) Meetup
Credit scoring: should greater predictability come at the cost of model interpretation? (Ed Stokes)
5:45 - 7:15pm, 29 May 2018
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
R and Data Management
6:00 - 8:30pm, 23 May 2018
Melbourne, Australia
Data Science Melbourne Meetup
Agile Data Science 2.0! (Vaenthan Thiru, Eric Wei, Felipe Flores)
5:30 - 8:00pm, 17 May 2018
Melbourne, Australia
Machine Learning & AI Meetup
Quantum machine learning (Chris Watkins)
6:00 - 9:00pm, 15 May 2018
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
greta: simple and scalable statistical modelling in R (Nick Golding)
5:45 - 9:00pm, 19 April 2018
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
rOpenSci ozunconf: Building communities to transform science (Nick Tierney)
5:45 - 8:00pm, 19 March 2018
Melbourne, Australia
Statistical Society of Australia (Vic.) Meetup
Assessing health impacts of environmental mixtures (Roger Peng)
6:15 - 7:15pm, 21 November 2017
Melbourne, Australia
Data Science Melbourne Meetup
Building text based data products in the real world & Smart buildings (Kukas Toma, Cameron Roach)
5:15 - 8:15pm, 9 November 2017
Melbourne, Australia
clj-melb Meetup
Experiences developing a full mobile app in 3 Weeks using ClojureScript and ReactNative (Chad Harris)
6:30 - 9:30pm, 9 November 2017
Melbourne, Australia
rOpenSci OzUnconf 2017
rOpenSci OzUnconference
26 - 27 October 2017
Melbourne, Australia
Machine Learning & AI Meetup
Lightning talks (Angus Russell, Andy Gelme, Alisha Aneja)
6:00 - 9:00pm, 17 October 2017
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
Analysing sub-daily time series data (Rob Hyndman, Earo Wang, Mitchell O’Hara-Wild)
5:45 - 8:45pm, 12 October 2017
Melbourne, Australia
Data Science Melbourne Meetup
Lunchtime tutorial - h2o (James Pearce)
12:00 - 3:00pm, 11 October 2017
Melbourne, Australia
Melbourne Users of R Network (MelbURN) Meetup
Getting started in Bayesian modelling with STAN and RStan (Bill Dixon)
5:45 - 8:15pm, 13 September 2017
Melbourne, Australia
CSCC XV [SPEAKER]
Credit Scoring and Credit Control conference 2017
30 August - 1 September 2017
Edinburgh, Scotland
Conference Paper Archive
2018-11 — Text interview with John Flackett of AiLab. This interview mostly focuses on Artificial Intelligence. https://www.ailab.com.au/interviews/dr-ross-gayler/
2015-09 — Video interview with Kevin Korb of Monash University. The interview was recorded for data science students at Monash and focuses on credit scoring as an application of data science. https://youtu.be/2txQObUzarM
I don’t work as an academic, so I don’t have career incentives for traditional publications. Consequently, my outputs are in whatever format was most convenient for me at the time. Most of my conference presentations are exactly that: presentations with no accompanying paper. My traditional-format publications mostly arise from collaborations with academic colleagues.
I have not yet transferred all the outputs from my old, outdated website. Until I do, the best sources are:
All content, unless explicitly noted otherwise, is licensed under a Creative Commons Attribution 4.0 International License.
All the following points should be read as “to the best of my knowledge”. I am not a website expert, so I can’t vouch for how this website is actually implemented. I can only tell you about my intentions.
Nothing on this website requires you to identify yourself. The only personal information collected while you visit this site is non-identifying information, such as browser type and operating system. This information is collected by Google Analytics for measuring visitor traffic to this site.
I do not collect this information and have no access to it other than as aggregated reports. Here is the Google Analytics privacy page.
This information is collected via cookies. Most web browsers allow you to control handling of cookies. To the best of my knowledge, you can disable all cookies for this website without in any way reducing the functionality for you.
I have set the Hugo GDPR options so that your IP address is anonymised within Google Analytics and the “Do Not Track” request is respected.
I don’t collect your personal information, so there is nothing I can share.
Google Analytics does collect some information about you. See the Google Analytics privacy page.
For each visitor to the site, Google Analytics collects non-personally identifiable information, including but not limited to browser type, version and language, operating system, pages viewed while browsing the site, page access times, and referring website address. This information is presented to me as aggregated reports for the purpose of gauging visitor traffic and trends.