Selected Outputs

Examples of what I do.

This correspondence comments on the findings reported in a recent Science Robotics article by Mitrokhin et al. [1]. The main goal of this commentary is to expand on some of the issues touched on in that article. Our experience is that hyperdimensional computing is very different from other approaches to computation and that it can take considerable exposure to its concepts before a practically useful understanding is attained. Therefore, to provide an overview of the area for the first-time reader of [1], the commentary includes a brief historical overview and connects the findings of the article to the larger body of existing literature in the area.
arXiv:2003.11458 [cs.RO], 2020

It has been argued that analogy is at the core of cognition [7, 1]. My work in VSA is driven by the goal of building a practical, effective analogical memory/reasoning system. Analogy is commonly construed as structure mapping between a source and target [5], which in turn can be construed as representing the source and target as graphs and finding maximal graph isomorphisms between them. This can also be viewed as a kind of dynamic similarity in that the initially dissimilar source and target are effectively very similar after mapping.

Similarity (the angle between vectors) is central to the mechanics of VSA/HDC. Introductory papers (e.g. [8]) necessarily devote space to vector similarity and the effect of the primitive operators (sum, product, permutation) on similarity. Most VSA examples rely on static similarity, where the vector representations are fixed over the time scale of the core computation (which is usually a single-pass, feed-forward computation). This places the emphasis on encoding methods (e.g. [12, 13]) that create vector representations with the similarity structure required by the core computation. Random Indexing [13] is an instance of the vector embedding approach to representation [11] that is widely used in NLP and ML. The important point is that the vector embeddings are developed in advance and then used as static representations (with fixed similarity structure) in the subsequent computation of interest.
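
To make that concrete, here is a minimal sketch in R of how the primitive operators affect similarity. It assumes random bipolar vectors with elementwise multiplication as the product and a cyclic shift as the permutation (one common scheme, not the only one, and not tied to any of the cited papers):

```r
# Minimal sketch: effect of the VSA primitives on similarity, assuming random
# bipolar vectors, elementwise product as binding, and cyclic shift as permutation.
set.seed(1)
d <- 10000
rand_vec <- function() sample(c(-1, 1), d, replace = TRUE)
cos_sim  <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

a <- rand_vec()
b <- rand_vec()

bundled <- sign(a + b + rand_vec())    # sum (bundling); the third random vector breaks ties
bound   <- a * b                       # elementwise product (binding)
permute <- function(x) c(x[d], x[-d])  # cyclic shift by one position (permutation)

cos_sim(a, b)           # ~0: random hyperdimensional vectors are quasi-orthogonal
cos_sim(bundled, a)     # ~0.5: the sum remains similar to each of its components
cos_sim(bound, a)       # ~0: the product is dissimilar to both of its inputs
cos_sim(permute(a), a)  # ~0: permutation maps a vector away from itself
```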

Human similarity judgments are known to be context-dependent (see [3] for a brief review). It has also been argued that similarity and analogy are based on the same processes [6] and that cognition is so thoroughly context-dependent that representations are created on-the-fly in response to task demands [2]. This seems extreme, but it does not necessarily imply that the base representations themselves are context-dependent, as long as the cognitive process that compares them is. That can be achieved by deriving dynamic representations from the static base representations via context-dependent transforms (or any functionally equivalent process).

An obvious candidate for a dynamic transformation function in VSA is substitution by binding, because the substitution can be specified as a vector and dynamically generated (see Representing substitution with a computed mapping in [8]). This implies an internal degree of freedom (a register to hold the substitution vector while it evolves) and a recurrent VSA circuit to provide the dynamics to evolve the substitution vector.
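
A hedged sketch of that idea, using the same bipolar conventions as above (the roles, fillers, and mapping vector are illustrative, not taken from [8]):

```r
# Sketch: a substitution specified as a vector and applied by binding.
# Roles, fillers, and the mapping are illustrative only.
set.seed(2)
d <- 10000
rand_vec <- function() sample(c(-1, 1), d, replace = TRUE)
cos_sim  <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

x  <- rand_vec(); y  <- rand_vec()         # fillers
r1 <- rand_vec(); r2 <- rand_vec()         # roles
s  <- sign(r1 * x + r2 * y + rand_vec())   # composite: role-filler bindings, bundled

m     <- x * y   # mapping vector that substitutes y for x (and x for y)
s_sub <- m * s   # the substitution is applied by a single binding

cos_sim(s_sub, r1 * y)  # ~0.5: after mapping, role r1 is (approximately) bound to y
cos_sim(s,     r1 * y)  # ~0: before mapping it was not
```

Because the mapping m is itself just a vector, it is exactly the kind of quantity that can be held in a register and evolved by a recurrent circuit.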

These essential aspects are present in [4], which finds the maximal subgraph isomorphism between two graphs represented as vectors. This is implemented as a recurrent VSA circuit with a register containing a substitution vector that evolves and settles over the course of the computation. The final state of the substitution vector represents the set of substitutions that transforms the static base representation of each graph into the best subgraph isomorphism to the static base representation of the other graph. This is a useful step along the path to an analogical memory system.

Interestingly, the subgraph isomorphism circuit can be interpreted as related to the recently developed Resonator Circuits for factorisation of VSA representations [9], which have internal degrees of freedom for each of the factors to be calculated and a recurrent VSA dynamics that settles on the factorisation. The graph isomorphism circuit can be interpreted as finding a factor (the substitution vector) such that the product of that factor with each of the graphs is the best possible approximation to the other graph. This links the whole enterprise back to statistical modelling, where there is a long history of approximating matrices/tensors as the product of simpler factors [10].
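
To illustrate the recurrent settle-on-a-factor dynamics, here is a hedged sketch in the style of [9] (it is not a reproduction of the circuit in [4] or [9]; the codebooks, dimensions, and iteration count are arbitrary):

```r
# Sketch: recurrent factorisation of s = a * b, where a and b come from known
# codebooks A and B. Each estimate is an internal degree of freedom that
# evolves until the circuit settles.
set.seed(3)
d <- 10000
k <- 25
rand_codebook <- function() matrix(sample(c(-1, 1), d * k, replace = TRUE), nrow = d)
A <- rand_codebook()
B <- rand_codebook()
a <- A[, 7]; b <- B[, 3]   # the "unknown" true factors
s <- a * b                 # the composite to be factorised

a_hat <- sign(A %*% rep(1, k))   # initialise each estimate as the superposition of its codebook
b_hat <- sign(B %*% rep(1, k))
for (t in 1:10) {
  # Unbind the current estimate of the other factor from s,
  # then clean up against the corresponding codebook.
  a_hat <- sign(A %*% crossprod(A, s * b_hat))
  b_hat <- sign(B %*% crossprod(B, s * a_hat))
}
which.max(crossprod(A, a_hat))   # settles on column 7 of A
which.max(crossprod(B, b_hat))   # settles on column 3 of B
```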

References
1. Blokpoel, M., Wareham, T., Haselager, P., van Rooij, I.: Deep Analogical Inference as the Origin of Hypotheses. The Journal of Problem Solving 11(1), 1–24 (2018)
2. Chalmers, D.J., French, R.M., Hofstadter, D.R.: High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental & Theoretical Artificial Intelligence 4(3), 185–211 (1992)
3. Cheng, Y.: Context-dependent similarity. In: Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI’90), pp. 27–30. Cambridge, MA, USA (1990)
4. Gayler, R.W., Levy, S.D.: A distributed basis for analogical mapping. In: Proceedings of the Second International Conference on Analogy (ANALOGY-2009), pp. 165–174. New Bulgarian University, Sofia, Bulgaria (2009)
5. Gentner, D.: Structure-mapping: A theoretical framework for analogy. Cognitive Science 7(2), 155–170 (1983)
6. Gentner, D., Markman, A.B.: Structure mapping in analogy and similarity. American Psychologist 52(1), 45–56 (1997)
7. Gust, H., Krumnack, U., Kühnberger, K.-U., Schwering, A.: Analogical Reasoning: A core of cognition. KI - Künstliche Intelligenz 1(8), 8–12 (2008)
8. Kanerva, P.: Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation 1, 139–159 (2009)
9. Kent, S.J., Frady, E.P., Sommer, F.T., Olshausen, B.A.: Resonator Circuits for factoring high-dimensional vectors. http://arxiv.org/abs/1906.11684 (2019)
10. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51(3), 455–500 (2009)
11. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Association for Computational Linguistics, Doha, Qatar (2014)
12. Purdy, S.: Encoding data for HTM systems. http://arxiv.org/abs/1602.05925 (2016)
13. Sahlgren, M.: An introduction to random indexing. In: Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE 2005), Copenhagen, Denmark (2005)
Presentation to be given at the Workshop on Developments in Hyperdimensional Computing and Vector Symbolic Architectures in Heidelberg, Germany (cancelled because of the COVID-19 pandemic), 2020

We argue that Vector Symbolic Architectures, as an instance of the connectionist approach, provide Wittgenstein-friendly and Wittgenstein-illuminating models of mind and language for cognitive science.
Philosophia doi:10.1007/s11406-019-00154-9, 2020

Score calibration is the process of empirically determining the relationship between a score and an outcome on some population of interest, and scaling is the process of expressing that relationship in agreed units. Calibration is often treated as a simple matter and attacked with simple tools – typically, either assuming the relationship between score and log-odds is linear and fitting a logistic regression with the score as the only covariate, or dividing the score range into bands and plotting the empirical log-odds as a function of score band.
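
For concreteness, here is a sketch of those two simple approaches in R. The data frame dat, with a numeric score and a binary outcome bad, is simulated purely so the sketch runs; it is not from any real portfolio, and it is not the scorecal approach:

```r
# Sketch of the two usual simple calibration approaches, on simulated data.
set.seed(5)
n     <- 10000
score <- rnorm(n)
bad   <- rbinom(n, 1, plogis(-2 + score))   # here the true log-odds really are linear in score
dat   <- data.frame(score = score, bad = bad)

# 1. Assume the score-to-log-odds relationship is linear:
#    logistic regression with the score as the only covariate.
fit <- glm(bad ~ score, family = binomial, data = dat)
coef(fit)   # the intercept and slope define the assumed linear calibration

# 2. Divide the score range into bands and plot the empirical log-odds per band.
dat$band <- cut(dat$score, breaks = quantile(dat$score, 0:10 / 10),
                include.lowest = TRUE)
band_logodds <- with(dat, tapply(bad, band, function(y) qlogis(mean(y))))
plot(band_logodds, type = "b", xlab = "score band (decile)",
     ylab = "empirical log-odds")   # bands that are all good or all bad give infinite values
```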

Both approaches ignore some of the information in the data: the assumption of a linear score-to-log-odds relationship is too restrictive, and score banding ignores the continuity of the scores. While a linear score-to-log-odds relationship is often an adequate approximation, the reality can be much more interesting, with noticeable deviations from the linear trend. These deviations include large-scale non-linearity, small-scale non-monotonicity, discrete discontinuities, and complete breakdown of the linear trend at extreme scores.

Detecting these effects requires a more sophisticated approach to empirically determining the score to outcome relationship. Taking a more sophisticated approach can be surprisingly tricky: the typically strong linear trend can obscure smaller deviations from linearity; detecting subtle trends requires exploiting the continuity of the scores, which can obscure discrete deviations; trends at extreme scores (out in the data-sparse tails of the distribution of scores) can be obscured by trends at less extreme scores (where there is more data); score distributions with some specific values that are relatively common can disrupt methods relying on continuity; and any modelling technique can introduce its own biases.
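
As one small example of a more flexible tool, addressing only the first of those issues, a penalised spline lets the data exhibit large-scale non-linearity rather than assuming it away. This is a generic illustration on simulated data, not the approach implemented in scorecal:

```r
# Sketch: letting the data show large-scale non-linearity by fitting a
# penalised spline (mgcv package) rather than assuming a straight line.
library(mgcv)
set.seed(6)
n     <- 10000
score <- rnorm(n)
bad   <- rbinom(n, 1, plogis(-2 + score + 0.3 * pmax(score, 0)^2))  # deliberately non-linear
dat   <- data.frame(score = score, bad = bad)

fit_smooth <- gam(bad ~ s(score), family = binomial, data = dat)
summary(fit_smooth)             # effective degrees of freedom well above 1 indicate non-linearity
plot(fit_smooth, shade = TRUE)  # the centred smooth of score on the log-odds scale
```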

Over the years I have developed a personal approach to these issues and implemented it as an open-source, publicly accessible R package for score calibration. I discuss these technical issues in empirical score calibration and show how they are addressed in the scorecal package.

CSCC 2019, 2019

In this talk I describe Vector Symbolic Architectures, a family of mathematical techniques for analog computation in hyperdimensional vector spaces that map naturally onto neural network implementations. VSAs support computation on discrete compositional data structures and a form of virtualisation that breaks the nexus between the items to be represented and the hardware that supports the representation. This means that computations on evolving data structures do not require physical rewiring of the implementing hardware. I illustrate this approach with a VSA system that finds isomorphisms between graphs, in which different problems to be solved are represented by different initial states of the fixed hardware rather than by rewiring it.
Redwood Center, UC Berkeley, 2013

Jackendoff (2002) posed four challenges that linguistic combinatoriality and rules of language present to theories of brain function. The essence of these problems is the question of how to neurally instantiate the rapid construction and transformation of the compositional structures that are typically taken to be the domain of symbolic processing. He contended that typical connectionist approaches fail to meet these challenges and that the dialogue between linguistic theory and cognitive neuroscience will be relatively unproductive until the importance of these problems is widely recognised and the challenges answered by some technical innovation in connectionist modelling. This paper claims that a little-known family of connectionist models (Vector Symbolic Architectures) is able to meet Jackendoff’s challenges.
Proceedings of the ICCS/ASCS Joint International Conference on Cognitive Science (ICCS/ASCS 2003), 2003

The ROC curve is useful for assessing the predictive power of risk models and is relatively well known for this purpose in the credit scoring community. The ROC curve is a component of the Theory of Signal Detection (TSD), a theory which has pervasive links to many issues in model building. However, these conceptual links and their associated insights and techniques are less well known than they deserve to be among credit scoring practitioners.

The purpose of this paper is to alert credit risk modelers to the relationships between TSD and common scorecard development concepts and to provide a toolbox of simple techniques and interpretations.
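
As a minimal illustration of the starting point, the empirical ROC curve and its area can be computed directly in a few lines of R. The score and outcome here are simulated, with higher scores taken to indicate higher risk; the paper itself goes well beyond this:

```r
# Sketch: empirical ROC curve and AUC from a numeric score and a binary
# outcome `bad`, on simulated data.
roc_points <- function(score, bad) {
  o   <- order(score, decreasing = TRUE)       # sweep the threshold from high to low
  tpr <- cumsum(bad[o] == 1) / sum(bad == 1)   # true positive rate at each threshold
  fpr <- cumsum(bad[o] == 0) / sum(bad == 0)   # false positive rate at each threshold
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

auc <- function(score, bad) {
  r <- roc_points(score, bad)
  sum(diff(r$fpr) * (head(r$tpr, -1) + tail(r$tpr, -1)) / 2)   # trapezoidal rule
}

set.seed(4)
bad   <- rbinom(1000, 1, 0.2)
score <- rnorm(1000) + bad         # crude simulation of a score with some predictive power
r <- roc_points(score, bad)
plot(r$fpr, r$tpr, type = "l",
     xlab = "false positive rate", ylab = "true positive rate")
abline(0, 1, lty = 2)              # chance diagonal
auc(score, bad)                    # area under the curve
```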

Presentation at Credit Scoring and Credit Control VI, Edinburgh, Scotland, 1999

Projects

Compositional Memory

An overview of my approach to compositional memory.

Credit Scoring

Place-holder for any work related to credit scoring that is not allocated to a more specific project.

Score Calibration

Development of an R package for score calibration.

Recent Posts

One of the side-effects of working in credit scoring, and especially of working for a credit bureau, is that you tend to become very concerned about privacy issues. This website is implemented in blogdown using the Hugo static website generator and the Academic theme. When I launched the website, just before the EU General Data Protection Regulation (GDPR) came into force, the implementation tools provided some basic GDPR support. Since then, some more privacy-related features have been added, so I feel obliged to add them to my site.


I created my first personal website in 2006 using Google Sites. This served me well enough for relatively static content but didn’t really suit me for more involved content like blog posts of technical analyses. Eventually I got rather behind in updating it, so now that site is quite out of date. Web technology has moved on since 2006. In particular, the R blogdown package has been released to simplify construction of websites, especially technical blogging based on R analyses.


Events

Future Events

There are no face-to-face meetings in the foreseeable future because of the COVID-19 pandemic. If you would like to chat with me, drop me a note and we’ll set up an online meeting.

Past Events

Recent & Upcoming Talks

VSA, Analogy, and Dynamic Similarity
2020-03-16
scorecal - Empirical score calibration under the microscope
2019-08-30

Interviews

Outputs

I don’t work as an academic, so I don’t have career incentives for traditional publications. Consequently, my outputs are in whatever format was most convenient for me at the time. Most of my conference presentations are exactly that: presentations with no accompanying paper. My traditional-format publications mostly arise from collaborations with academic colleagues.

I have not yet transferred all the outputs from my old, outdated website. Until I do, the best sources are:

(2020). VSA, Analogy, and Dynamic Similarity. Presentation to be given at the Workshop on Developments in Hyperdimensional Computing and Vector Symbolic Architectures in Heidelberg, Germany (cancelled because of the COVID-19 pandemic).


(2020). Connecting Twenty-First Century Connectionism and Wittgenstein. Philosophia doi:10.1007/s11406-019-00154-9.


(2003). Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. Proceedings of the ICCS/ASCS Joint International Conference on Cognitive Science (ICCS/ASCS 2003).


(1999). Signal Detection for Credit Scoring Practitioners. Presentation at Credit Scoring and Credit Control VI, Edinburgh, Scotland.


Contact

Creative Commons Licence
All content, unless explicitly noted otherwise, is licensed under a Creative Commons Attribution 4.0 International License.

Privacy Policy

All the following points should be read as “to the best of my knowledge”. I am not a website expert, so I can’t vouch for how this website is actually implemented. I can only tell you about my intentions.

Collection of your personal information

Nothing on this website requires you to identify yourself. The only personal information collected while you visit this site is non-identifying information, such as browser type and operating system. This information is collected by Google Analytics for measuring visitor traffic to this site.

I do not collect this information and have no access to it other than as aggregated reports. Here is the Google Analytics privacy page.

This information is collected via cookies. Most web browsers allow you to control handling of cookies. To the best of my knowledge, you can disable all cookies for this website without in any way reducing the functionality for you.

I have set the Hugo GDPR options so that your IP address is anonymised within Google Analytics and the “Do Not Track” request is respected.

Sharing of your personal information

I don’t collect your personal information, so there is nothing I can share.

Google Analytics does collect some information about you. See the Google Analytics privacy page.

Use of your personal information

For each visitor to the site, Google Analytics collects non-personally identifiable information, including but not limited to: browser type, version, and language; operating system; pages viewed while browsing the site; page access times; and referring website address. This information is presented to me as aggregated reports for the purpose of gauging visitor traffic and trends.