In addition to directly solving optimization problems, quadratic unconstrained binary optimization (QUBO) solvers like ours have machine learning applications. The specific application we demonstrate here is known as boosting; in particular, we demonstrate a variant adapted to quadratic solvers known as QBoost. The underlying idea of boosting is to use many sources of imperfect information to build a strong prediction. In machine learning language, we find a collection of weak classifiers that together form a strong classifier. A weighted combination of these sources of partial information can be a powerful tool if combined in the right way, and the task of the Dirac device is to find this combination. An obvious criterion is to include the classifiers that give the most accurate information, but there is another concern: we want classifiers that give complementary information. Statistically speaking, we want classifiers that correlate strongly with the quantity we want to predict, but only weakly with each other. In the extreme case, two classifiers could give exactly the same information, in which case including both is wasteful. Avoiding waste isn't the only concern, however: including too many classifiers can also lead to overfitting if they capture spurious information specific to the training data rather than information that generalizes to unseen cases. In this tutorial, we show an implementation of QBoost and test it on a simple binary classification problem using the IRIS dataset.
Importance
An advantage of boosting is that once the strong classifier is built, it can be applied without re-solving the QUBO. As a result, the classifier can be used in settings where access to Dirac is not available: since Dirac is only used in the training phase, the trained classifier can be reused many times in the future. This simple application is one example of the many potential machine learning applications of our hardware.
Applications
Classification, the task that QBoost performs, appears in a number of settings. A simple example of a classification problem that probably affects you every day is email spam filtering. Here, the goal is to categorize email as "spam" or "legitimate", and it is relatively straightforward to see how the boosting approach can be applied. A variety of weak rules can be derived (for example, a spam email is probably slightly more likely to contain the word "money"). These are of little use individually, but can be combined through boosting into a powerful filter. Disease diagnosis is also fundamentally a classification problem; a concrete example is the use of boosting to predict chronic kidney disease. The weak classifiers would come from patient medical history, such as whether the patient has other conditions, as well as other factors such as age. Boosting approaches can also be applied to image recognition. This is done by using simple features (for example, a nose between two eyes, represented by a lighter rectangle between two darker ones) as weak classifiers and checking for combinations of them, as was done here for facial recognition.
Methodology
The idea is based on the concept of boosting. Let us assume that we have a collection of N "weak" classifiers h_i, where i = 1, 2, ..., N. The goal is to construct a "strong" classifier as a linear superposition of these weak classifiers, that is,

y = ∑_{i=1}^{N} w_i h_i(x)

where x is a vector of input features and y ∈ {−1, 1}. The goal is to find the weights w_i associated with the weak classifiers.
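As a concrete illustration of the weighted vote, consider the following sketch. The predictions and weights here are made-up example values, not output from Dirac:

```python
import numpy as np

# Hypothetical predictions h_i(x_s) in {-1, +1} for 4 samples (rows)
# from N = 3 weak classifiers (columns).
H = np.array([
    [+1, +1, -1],
    [-1, +1, -1],
    [+1, -1, +1],
    [-1, -1, -1],
])

# Illustrative binary weights w_i (in practice, found by the QUBO solver).
w = np.array([1, 1, 1])

# Strong classifier: sign of the weighted sum of the weak votes.
y = np.sign(H @ w)
print(y.tolist())  # [1, -1, 1, -1]
```

Each sample's label is simply the majority vote of the selected weak classifiers.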
Let us have a training set {(x_s, y_s) | s = 1, 2, ..., S} of size S. We can determine optimal weights w_i by minimizing

∑_{s=1}^{S} ( (1/N) ∑_{i=1}^{N} w_i h_i(x_s) − y_s )² + λ ‖w‖₀
where we assume the weights w_i are non-negative integers. Each weight can be constructed using D qubits as

w_i = ∑_{d=0}^{D−1} 2^d x_{i,d}

where x_{i,d} are binary variables. Neven et al. (https://arxiv.org/abs/0811.0416) reported that using D = 1 yields similar or better generalization error than D > 1. The regularization term λ ‖w‖₀ = λ ∑_{i=1}^{N} (w_i)^0 is compatible with the QUBO form only when D = 1, that is, when the weights are binary and ‖w‖₀ = ∑_{i=1}^{N} w_i. The corresponding QUBO is then,
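A quick sanity check of the binary encoding of a weight; the bit values below are arbitrary examples:

```python
# Binary expansion of one integer weight w_i from its D qubit values x_{i,d}.
D = 3
bits = [1, 0, 1]  # example values of x_{i,0}, x_{i,1}, x_{i,2}

w_i = sum(2**d * bits[d] for d in range(D))
print(w_i)  # 1*1 + 2*0 + 4*1 = 5

# With D = 1 the sum collapses to the single bit, so w_i itself is binary.
```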
min_x x^T (Q + P) x

where

Q_ij = (1/N²) ∑_{s=1}^{S} h_i(x_s) h_j(x_s)

and

P_ij = δ_ij ( λ − (2/N) ∑_{s=1}^{S} h_i(x_s) y_s )
Note that the regularization term is designed to push many weights to zero, so that only a subset of the weak classifiers is chosen.
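The construction of Q and P can be sketched on a toy problem. The predictions, labels, and λ below are made up for illustration, and we brute-force the QUBO minimum in place of the solver (feasible only for tiny N):

```python
import itertools
import numpy as np

# Toy data: S = 4 samples, N = 2 weak classifiers; h[s, i] = h_i(x_s).
h = np.array([
    [+1, -1],
    [+1, +1],
    [-1, +1],
    [-1, -1],
])
y = np.array([+1, +1, -1, -1])
S, N = h.shape
lam = 0.1  # illustrative regularization strength

# Q_ij = (1/N^2) * sum_s h_i(x_s) h_j(x_s)
Q = (h.T @ h) / N**2

# P_ij = delta_ij * (lam - (2/N) * sum_s h_i(x_s) y_s)
P = np.diag(lam - (2.0 / N) * (h.T @ y))

# Brute-force the QUBO minimum over binary x (stand-in for the solver).
best_x, best_val = None, float("inf")
for b in itertools.product([0, 1], repeat=N):
    x = np.array(b)
    val = x @ (Q + P) @ x
    if val < best_val:
        best_x, best_val = x, val

print(best_x)  # selects classifier 0, which matches y perfectly
```

Here classifier 0 agrees with every label while classifier 1 is uninformative, so the minimizer keeps only the first weight.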
In the implementation that follows, we have used decision tree classifiers based on one, two, or three of the features as the weak classifiers.
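As a minimal stand-in for those decision-tree weak classifiers (the tutorial's actual implementation is not reproduced here), a single-feature decision stump can be fit by exhaustive threshold search:

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-feature decision stump predicting s if x > t else -s.
    Returns the (threshold, polarity, error) minimizing training error;
    y takes values in {-1, +1}. A toy stand-in for a depth-1 decision tree."""
    best_t, best_s, best_err = None, +1, np.inf
    for t in np.unique(x):
        for s in (+1, -1):
            pred = np.where(x > t, s, -s)
            err = np.mean(pred != y)
            if err < best_err:
                best_t, best_s, best_err = t, s, err
    return best_t, best_s, best_err

# Hypothetical one-feature data: positives sit above the value 2.0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([-1, -1, +1, +1])
t, s, err = fit_stump(x, y)
print(t, s, err)  # threshold 2.0, polarity +1, zero training error
```

Stumps over different features (or feature pairs and triples, as in the tutorial) give the pool of weak classifiers the QUBO then selects from.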
Data
We used a two-class subset of the IRIS dataset to build a binary classifier using QBoost.
The QBoost class can then be used to build a classifier on the IRIS dataset. We use 80% of the data for training and the remaining 20% for testing.
In [3]:
import sys
from collections import Counter
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
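A sketch of the data preparation consistent with the imports above; the exact tutorial code is not shown, and the two-class reduction, label mapping, and random seed here are assumptions:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Keep two of the three IRIS species to obtain a binary problem,
# and map the labels {0, 1} to {-1, +1}.
X, y = datasets.load_iris(return_X_y=True)
mask = y < 2
X, y = X[mask], 2 * y[mask] - 1

# 80% of the data for training, the rest for testing, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```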
Test precision: 1.0
Test recall: 1.0
Test accuracy: 1.0
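The reported metrics can be computed directly for {−1, +1} labels. The predictions below are made-up values chosen only to show the formulas, not the tutorial's actual test run:

```python
import numpy as np

# Made-up true labels and predictions in {-1, +1}.
y_true = np.array([+1, +1, -1, -1, +1])
y_pred = np.array([+1, -1, -1, -1, +1])

tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # true positives
fp = int(np.sum((y_pred == 1) & (y_true == -1)))  # false positives
fn = int(np.sum((y_pred == -1) & (y_true == 1)))  # false negatives

precision = tp / (tp + fp)                   # 2 / 2 = 1.0
recall = tp / (tp + fn)                      # 2 / 3
accuracy = float(np.mean(y_pred == y_true))  # 4 / 5 = 0.8
print(precision, round(recall, 3), accuracy)
```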
Conclusion
In this tutorial, we have demonstrated how to use our Dirac-1 device for a simple machine learning task with many useful applications. The methods presented here are based on the quadratic unconstrained binary optimization formulation. You may also be interested in other machine learning applications, such as dimensionality reduction, which uses the same unconstrained formulation. Alternatively, you may want to read about feature selection, which uses a quadratic linearly constrained binary optimization formulation. Another option is image classification or audio classification, which use a reservoir-based hardware offering, known as EmuCore. Of course, you may also prefer to get started trying to run a problem of your own.