Statistical Analysis of Distance Estimators with Density Differences and Density Ratios
Abstract: Estimating a discrepancy between two probability distributions from samples is an important task in statistics and machine learning. There are mainly two classes of discrepancy measures: distance measures based on the density difference, such as the Lp-distances, and divergence measures based on the density ratio, such as the ϕ-divergences. The intersection of these two classes is the L1-distance measure, and thus, it can be estimated either based on the density difference or the density ratio. In this paper, we first show that the Bregman scores, which are widely employed for the estimation of probability densities in statistical data analysis, allow us to estimate the density difference and the density ratio directly without separately estimating each probability distribution. We then theoretically elucidate the robustness of these estimators and present numerical experiments.
1. Introduction
In statistics and machine learning, estimating a discrepancy between two probability distributions from samples has been extensively studied [1], because discrepancy estimation is useful in solving various real-world data analysis tasks, including covariate shift adaptation [2,3], conditional probability estimation [4], outlier detection [5] and divergence-based two-sample testing [6].
There are mainly two classes of discrepancy measures for probability densities. One is genuine distances on function spaces, such as the Ls-distance for s ≥ 1, and the other is divergence measures, such as the Kullback–Leibler divergence and the Pearson divergence. Typically, distance measures in the former class can be represented using the difference of two probability densities, while those in the latter class are represented using the ratio of two probability densities. Therefore, it is important to establish statistical methods to estimate the density difference and the density ratio.
A naive way to estimate the density difference and the density ratio consists of two steps: two probability densities are separately estimated in the first step, and then, their difference or ratio is computed in the second step. However, such a two-step approach is not favorable in practice, because the density estimation in the first step is carried out without regard to the second step of taking the difference or ratio. To overcome this problem, the authors in [7–11] studied the estimation of the density difference and the density ratio in a semi-parametric manner without separately modeling each probability distribution.
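The two-step procedure described above can be sketched as follows. This is only an illustration under stated assumptions: both densities are modeled as one-dimensional Gaussians fitted by maximum likelihood, the helper names (`normal_pdf`, `two_step_l1`) are hypothetical, and the choice p = N(0, 1), q = N(1, 1) is made here so that the L1-distance has a closed form to compare against.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def two_step_l1(x, y, grid):
    """Naive two-step L1-distance estimate on an equally spaced grid."""
    # Step 1: estimate each density separately (Gaussian maximum likelihood).
    p_hat = normal_pdf(grid, x.mean(), x.std())
    q_hat = normal_pdf(grid, y.mean(), y.std())
    # Step 2: take the difference and integrate its absolute value (Riemann sum).
    dx = grid[1] - grid[0]
    return float(np.sum(np.abs(p_hat - q_hat)) * dx)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)   # samples from p = N(0, 1)
y = rng.normal(1.0, 1.0, size=5000)   # samples from q = N(1, 1)
grid = np.linspace(-12.0, 13.0, 50001)

l1_two_step = two_step_l1(x, y, grid)
# For two unit-variance normals whose means differ by 1, the densities cross
# at x = 0.5, which gives the closed form 2 * erf(0.5 / sqrt(2)).
l1_exact = 2.0 * math.erf(0.5 / math.sqrt(2.0))
print(l1_two_step, l1_exact)
```

Note that step 1 never uses the fact that only the difference of the two fits matters in step 2, which is the weakness the direct approaches are meant to address.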
The intersection of the density difference-based distances and the density ratio-based divergences is the L1-distance, and thus, it can be estimated either based on the density difference or the density ratio. In this paper, we first propose a novel direct method to estimate the density difference and the density ratio based on the Bregman scores [12]. We then show that the density-difference approach to L1-distance estimation is more robust than the density-ratio approach. This fact has already been pointed out in [10] based on a somewhat intuitive argument: the density difference is always bounded, while the density ratio can be unbounded. In this paper, we theoretically support this claim by providing detailed theoretical analysis of the robustness properties.
There are some works related to our study. Density-ratio estimation has been intensively investigated in the machine learning community [4,7,8]. As shown in [6], the density ratio can be used to estimate the ϕ-divergence [13,14]. However, estimation of the L1-distance, which is a member of the class of ϕ-divergences, was not studied there, since it does not satisfy the regularity condition required to investigate the statistical asymptotic properties. On the other hand, the least mean squares estimator of the density difference was proposed in [10], and its robustness was numerically investigated. In the present paper, we consider not only the least squares estimator, but also general score estimators for density differences, and we theoretically investigate their robustness properties.
The rest of the paper is structured as follows. In Section 2, we describe two approaches to L1-distance estimation based on the density difference and the density ratio. In Section 3, we introduce the Bregman scores, which are widely employed for the estimation of probability densities in statistical data analysis. In Section 4, we apply the Bregman score to the estimation of the density difference and the density ratio. In Section 5, we introduce a robustness measure in terms of which the proposed estimators are analyzed in the following sections. In Section 6, we consider statistical models without the scale parameter (called the non-scale models) and investigate the robustness of the density difference and density ratio estimators. In Section 7, we consider statistical models with the scale parameter (called the scale models) and show that the estimation using the scale models is reduced to the estimation using the non-scale models. Then, we apply the theoretical results on the non-scale models to the scale models and elucidate the robustness of the scale models. In Section 8, numerical examples on L1-distance estimation are presented. Finally, we conclude in Section 9.
2. Estimation of L1-Distance
Let p(x) and q(x) be two probability densities. In this section, we introduce two approaches to estimating discrepancy measures: an approach based on the density difference, p – q, and an approach based on the density ratio, p/q.
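As a numerical illustration of the two approaches for the L1-distance, the sketch below evaluates the same quantity once from the density difference, ∫|p − q| dx, and once from the density ratio, ∫|p/q − 1| q dx. The densities p = N(0, 1) and q = N(1, 1), the grid, and the function names are assumptions made for this example only.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def l1_by_difference(p_vals, q_vals, dx):
    """L1-distance written with the density difference p - q."""
    return float(np.sum(np.abs(p_vals - q_vals)) * dx)

def l1_by_ratio(p_vals, q_vals, dx):
    """L1-distance written with the density ratio r = p / q, weighted by q."""
    r = p_vals / q_vals
    return float(np.sum(np.abs(r - 1.0) * q_vals) * dx)

grid = np.linspace(-12.0, 13.0, 200001)
dx = grid[1] - grid[0]
p_vals = normal_pdf(grid, 0.0, 1.0)
q_vals = normal_pdf(grid, 1.0, 1.0)

d_diff = l1_by_difference(p_vals, q_vals, dx)
d_ratio = l1_by_ratio(p_vals, q_vals, dx)
# Closed form for two unit-variance normals whose means differ by 1.
d_exact = 2.0 * math.erf(0.5 / math.sqrt(2.0))
print(d_diff, d_ratio, d_exact)
```

With known densities the two expressions coincide; the question studied in this paper is how they behave when p − q or p/q has to be estimated from possibly contaminated samples.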
2.1. L1-Distance As the Density Difference and Density Ratio
The density difference, p – q, is directly used to compute the Ls-distance between two probability densities:
$$d_s(p, q) = \left( \int |p(x) - q(x)|^s \, dx \right)^{1/s}, \quad s \ge 1.$$
The purpose of our work is to compare the statistical properties of the density-difference approach and the density-ratio approach to the estimation of the L1-distance between probability densities p and q defined on ℝd. For the estimation of the L1-distance, we use two sets of independently and identically distributed (i.i.d.) samples:
$$x_1, \ldots, x_n \overset{\text{i.i.d.}}{\sim} p, \qquad y_1, \ldots, y_m \overset{\text{i.i.d.}}{\sim} q. \tag{2}$$
2.2. Density-Difference Approach
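The difference of two probability densities, f(x) = p(x) – q(x), is widely applied to statistical inference [10]. A parametric statistical model for the density difference f(x) is denoted as:
$$\mathcal{M}_{\mathrm{diff}} = \{ f(x; \theta) = f_\theta(x) \mid \theta \in \Theta_k \}. \tag{3}$$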
Recently, a density-difference estimator from Samples (2) that does not involve separate estimation of two probability densities has been proposed [10]. Once a density-difference estimator, f̂ ∈ ℳdiff, is obtained, the L1-distance can be immediately estimated as:
$$\hat{d}_1(p, q) = \int |\hat{f}(x)| \, dx.$$
The L1-distance is invariant under a change of variables. More specifically, let x = ψ(z) be a one-to-one mapping on ℝd and fψ(z) be f(ψ(z))|Jψ(z)|, where Jψ is the Jacobian determinant of ψ. For f(x) = p(x) – q(x), the function, fψ(z), is the density difference between p and q in the z-coordinate. Then, we have:
$$\int |f(x)| \, dx = \int |f_\psi(z)| \, dz.$$
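Note that this invariance property does not hold for general distance measures. Indeed, for s > 1, we have:
$$\int |f_\psi(z)|^s \, dz = \int |f(x)|^s \, |J_\psi(\psi^{-1}(x))|^{s-1} \, dx,$$
which differs from ∫|f(x)|^s dx in general.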
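The contrast between the L1-distance and other Ls-distances under a change of variables can be checked numerically. The sketch below assumes a hypothetical affine map x = az + b with a = 2 and b = 0.5 and the illustrative density difference f = N(0, 1) − N(1, 1); for this map the Jacobian determinant is the constant a, so the L1-distance is unchanged while the L2-distance is multiplied by √a.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def density_difference(x):
    """f = p - q with p = N(0, 1) and q = N(1, 1) (illustrative choice)."""
    return normal_pdf(x, 0.0, 1.0) - normal_pdf(x, 1.0, 1.0)

def ls_distance(values, step, s):
    """Riemann-sum approximation of the Ls-norm of a function given on a grid."""
    return float((np.sum(np.abs(values) ** s) * step) ** (1.0 / s))

a, b = 2.0, 0.5                           # hypothetical affine map x = a * z + b
z = np.linspace(-8.0, 8.0, 320001)        # grid in the z-coordinate
x = a * z + b                             # the same grid in the x-coordinate
dz = z[1] - z[0]
dx = x[1] - x[0]

f_x = density_difference(x)                   # f(x)
f_z = abs(a) * density_difference(a * z + b)  # f_psi(z) = f(psi(z)) |J_psi(z)|

l1_x, l1_z = ls_distance(f_x, dx, 1), ls_distance(f_z, dz, 1)
l2_x, l2_z = ls_distance(f_x, dx, 2), ls_distance(f_z, dz, 2)
print(l1_x, l1_z)   # equal up to rounding
print(l2_x, l2_z)   # differ by the factor sqrt(a)
```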
2.3. Density-Ratio Approach
The density ratio of two probability densities, p(x) and q(x), is defined as r(x) = p(x)/q(x). Like the density difference, it is widely applied in statistical inference [4]. Let:
3. Bregman Scores
The Bregman score is an extension of the log-likelihood function, and it is widely applied in statistical inference [12,15–18]. In this section, we briefly review the Bregman score. See [12] for details.
For functions f and g on ℝd, the Bregman score, S(f, g), is a class of real-valued functions that satisfy the inequality:
$$S(f, g) \ge S(f, f).$$
Let us introduce the definition of Bregman scores. For a function, f, defined on the Euclidean space, ℝd, let G(f) be a real-valued convex functional. The functional, G(f), is called the potential below. The functional derivative of G(f) is denoted as G′(x; f), which is defined as the function satisfying the equality:
When G(f) is expressed as:
If f is a probability density, the Bregman score is expressed as:
Below, let us introduce exemplary Bregman scores:
Example 1 (Kullback–Leibler (KL) score). The Kullback–Leibler (KL) score for the probability densities, p(x) and q(x), is defined as:
Example 2 (Density-power score). Let α be a positive number and f and g be functions that can take both positive and negative values. Then, the density-power score with the base measure λ(·) is defined as:
Example 3 (Pseudo-spherical score; γ-score). For α > 0 and g ≠ 0, the pseudo-spherical score [16] is defined as:
When the model, f(x; θ), includes the scale parameter, c, i.e., f(x; θ) = cg(x; θ̄) with the parameter θ = (c, θ̄) ∈ Θk for c ∈ ℝ and θ̄ ∈ Θk−1, the pseudo-spherical score does not work. This is because the potential is not strictly convex on the statistical model with the scale parameter, and hence, the scale parameter, c, is not estimable when the pseudo-spherical score is used.
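The scale invariance behind this remark can be checked numerically. The sketch below assumes that the pseudo-spherical score has the standard form S(f, g) = −∫ f g^α dλ / (∫ |g|^(1+α) dλ)^(α/(1+α)) for an odd positive integer α with the Lebesgue base measure; this formula, the function name `pseudo_spherical_score`, and the test functions f and g are assumptions of this example.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pseudo_spherical_score(f_vals, g_vals, dx, alpha):
    """Assumed form of the pseudo-spherical score for odd positive integer alpha."""
    # For odd alpha, g ** alpha keeps the sign of g, so no absolute value is needed.
    cross = np.sum(f_vals * g_vals ** alpha) * dx
    norm = (np.sum(np.abs(g_vals) ** (alpha + 1)) * dx) ** (alpha / (alpha + 1.0))
    return float(-cross / norm)

grid = np.linspace(-12.0, 13.0, 50001)
dx = grid[1] - grid[0]
f = normal_pdf(grid, 0.0, 1.0) - normal_pdf(grid, 1.0, 1.0)   # "true" difference
g = normal_pdf(grid, 0.0, 1.5) - normal_pdf(grid, 1.2, 1.5)   # a candidate model
alpha = 3

s_fg = pseudo_spherical_score(f, g, dx, alpha)
s_f_cg = pseudo_spherical_score(f, 5.0 * g, dx, alpha)   # rescaled candidate, c = 5
s_ff = pseudo_spherical_score(f, f, dx, alpha)
print(s_fg, s_f_cg, s_ff)
```

Under this assumed form, S(f, g) and S(f, cg) agree for any c > 0, so minimizing the score cannot distinguish g from cg, while S(f, f) is still the smallest of the three values.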
The density-power score and the pseudo-spherical score in the above examples include the non-negative parameter, α. When α is an odd integer, the absolute-value operator in the scores can be removed, which is computationally advantageous. For this reason, we set the parameter, α, to a positive odd integer when the Bregman score is used for the estimation of the density difference.
4. Direct Estimation of Density Differences and Density Ratios Using Bregman Scores
The Bregman scores are applicable not only to the estimation of probability densities, but also to the estimation of density differences and density ratios. In this section, we propose estimators for density differences and density ratios, and show their theoretical properties.
4.1. Estimators for Density Differences and Density Ratios
First of all, let us introduce a way to directly estimate the density difference based on the Bregman scores. Let ℳdiff be the statistical Model (3) to estimate the true density-difference f(x) = p(x) – q(x) defined on the Euclidean space, ℝd. Let the base measure, λ(·), be the Lebesgue measure. Then, for the density-difference model, fθ ∈ ℳdiff, the Bregman score Equation (8) is given as:
Next, we use the Bregman scores to estimate the density ratio r(x) = p(x)/q(x). Let us define q(x) as the base measure of the Bregman score. Given the density-ratio model, ℳratio, of Equation (5), the Bregman score of the model, rθ ∈ ℳratio, is given as:
4.2. Invariance of Estimators
We show that the estimators obtained by the density-power score and the pseudo-spherical score have the affine invariance property. Suppose that the Samples (2) are distributed on the d-dimensional Euclidean space, and let us consider the affine transformation of samples, such that xi = Ax′i + b and yj = Ay′j + b, where A is an invertible matrix and b is a vector. Let fA,b(x) be the transformed density-difference |det A|f(Ax′ + b) and f̃A,b be the difference of the empirical distributions defined from the samples, x′i, y′j. Let Sdiff (f, g) be the density-power score or the pseudo-spherical score with a positive odd integer, α, for the estimation of the density difference. Then, we have:
5. Robustness Measure
The robustness of the estimator is an important feature in practice, since typically real-world data includes outliers that may undermine the reliability of the estimator. In this section, we introduce robustness measures of estimators against outliers.
In order to define robustness measures, let us briefly introduce the influence function in the setup of the density-difference estimation. Let p(x) and q(x) be the true probability densities of each dataset in Samples (2). Suppose that these probabilities are shifted to:
The influence function provides several robustness measures of estimators. An example is the gross error sensitivity defined as supzp,zq ‖IFdiff (θ*; zp, zq)‖, where ‖ · ‖ is the Euclidean norm. The estimator that uniformly minimizes the gross error sensitivity over the parameter, θ, is called the most B(bias)-robust estimator. The most B-robust estimator minimizes the worst-case influence of outliers. For the one-dimensional normal distribution, the median estimator is the most B-robust for the estimation of the mean parameter [24].
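The difference between a bounded and an unbounded influence of a single outlier can be illustrated with a finite-sample analogue of the influence function. The sketch below is not the density-difference setting of this paper: it simply compares the sample mean and the sample median of a one-dimensional normal sample, and the helper `sensitivity` is a hypothetical name introduced here.

```python
import numpy as np

def sensitivity(estimator, sample, z):
    """Finite-sample analogue of the influence function:
    (n + 1) * (T(sample with z added) - T(sample))."""
    n = len(sample)
    return float((n + 1) * (estimator(np.append(sample, z)) - estimator(sample)))

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=1001)   # clean sample from N(0, 1)
outliers = [10.0, 100.0, 1000.0]           # increasingly extreme contamination points

mean_curve = [sensitivity(np.mean, sample, z) for z in outliers]
median_curve = [sensitivity(np.median, sample, z) for z in outliers]
print(mean_curve)     # grows linearly with z
print(median_curve)   # does not depend on z once z exceeds the sample range
```

For the mean, the sensitivity equals z minus the sample mean, so its supremum over z is infinite; for the median, it stays at the same finite value however far the outlier is moved.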
In this paper, we consider another robustness measure, called the redescending property. The estimator satisfying the following redescending property,
In the next sections, we apply the density-power score and the pseudo-spherical score to estimate the density difference or the density ratio and investigate their robustness.
6. Robustness under Non-Scale Models
In this section, we consider statistical models without the scaling parameter, and investigate the robustness of the density-difference and density-ratio estimators based on the density-power score and the pseudo-spherical score.
6.1. Non-Scale Models
The model satisfying the following assumption is called the non-scale model:
Assumption 1. Let ℳ be the model of density differences or density ratios. For c ∈ ℝ and f ∈ ℳ, such that c ≠ 0 and f ≠ 0, cf ∈ ℳ holds only when c = 1.
The density-power score and the pseudo-spherical score are strict Bregman scores on the non-scale models. Indeed, the density-power score is a strict Bregman score, as pointed out in Example 2. For the pseudo-spherical score, suppose that the equality S(f, g) = S(f, f) holds for the non-zero functions, f and g. Then, g is proportional to f. When f and g are both included in a non-scale model, we have f = g. Thus, the pseudo-spherical score on the non-scale model is also a strict Bregman score.
6.2. Density-Difference Approach
Here, we consider the robustness of density-difference estimation using the non-scale models. Assumption 1 implies that the model, fθ(x), does not include the scale parameter. An example is the model consisting of two probability models,
The following theorem shows the robustness of the density-difference estimator. The proof is found in Appendix A.
Theorem 1. Suppose that Assumption 1 holds for the density-difference model, ℳdiff. We assume that the true density-difference, f, is included in ℳdiff and that holds. For the Bregman score, Sdiff (f, g), of the density difference, let J be the matrix, each element of which is given as:
Theorem 1 implies that the density-difference estimation with the pseudo-spherical score under non-scale models has the redescending property. For the density difference f = p – q, the limiting condition,
Let us consider the L1-distance estimation using the density-difference estimator. The L1-distance estimator under the Contamination (11) is distributed around:
6.3. Density-Ratio Approach
The following theorem provides the influence function of the density-ratio estimators. Since the proof is almost the same as that of Theorem 1, we omit the detailed calculation.
Theorem 2. Suppose that Assumption 1 holds for the density-ratio model, ℳratio. We assume that the true density-ratio r(x) = p(x)/q(x) is included in:
The density ratio is a non-negative function. Hence, we do not need to care about the absolute value in the density-power score and the pseudo-spherical score. As a result, the parameter, α, in these scores is allowed to take any positive real number in the above theorem.
For the density ratio r(x) = p(x)/q(x), a typical limiting condition is:
Let us consider the L1-distance estimation using the density ratio. The L1-distance estimator under the Contamination (11) is distributed around:
7. Robustness under Scale Models
In this section, we consider the estimation of density differences using the model with the scale parameter. For such a model, the pseudo-spherical score does not work as shown in Example 3. Furthermore, in the previous section, we presented the instability of the density-ratio estimation against the gross outliers. Hence, in this section, we focus on the density-difference estimation using the density-power score with the scale models.
7.1. Decomposition of Density-Difference Estimation Procedure
We show that the estimation procedure of the density difference using the density-power score is decomposed into two steps: estimation using the pseudo-spherical score with the non-scale model and estimation of the scale parameter. Note that the estimation in the first step has already been investigated in the last section.
Let us consider the statistical model satisfying the following assumption:
Assumption 2. Let ℳdiff be the model for the density difference. For all f ∈ ℳdiff and all c ∈ ℝ, cf ∈ ℳdiff holds.
The model satisfying the above assumption is referred to as the scale model. A typical example of the scale model is the linear model:
Suppose that the k-dimensional scale model, ℳdiff, is parametrized as:
For the pseudo-spherical score, the equality S(f, g) = S(f, cg) holds for c > 0. Thus, the scale parameter is not estimable. Let us study the statistical property of the estimator based on the density-power score with the scale models:
Theorem 3. Let us consider the density-difference estimation. Let the density-power score and the pseudo-spherical score be defined with a positive odd number, α, and with the Lebesgue measure as the base measure. Let f0 be a function and c̄gθ̄ ∈ ℳdiff be the optimal solution of the problem,
The empirical density-difference, f̃, is allowed as the function, f0, in the above theorem. The proof is found in Appendix B. The same theorem for the non-negative functions is shown in [25].
Theorem 3 indicates that the minimization of the density-power score on the scale model is decomposed into two stages. Suppose that the true density-difference is f0 = p – q = c*gθ̄* ∈ ℳdiff. At the first stage of the estimation, the minimization problem (15) is solved on the non-scale model ℳdiff,±c*. Then, at the second stage, the scale parameter is estimated. Though c* is unknown, the estimation procedure can be virtually interpreted as the two-stage procedure using the non-scale model, ℳdiff,±c*.
7.2. Statistical Properties of Density-Difference Estimation
Based on the two-stage procedure for the minimization of the density-power score, we investigate the statistical properties of the density-difference estimator.
As shown in Section 6.2, the estimator using the pseudo-spherical score over the non-scale model, ℳdiff,c*, has the redescending property. Hence, the extreme outliers have little impact on the estimation of the shape parameter, θ̄. Under the Contamination (11), let us define θ̄ε as the optimal solution of the problem,
The scale parameter is given as:
The above analysis shows that the extreme outliers with the small contamination ratio, ε, will make the intensity of the estimated density-difference smaller by the factor, 1 − ε, and the estimated density-difference is distributed around (1 − ε)(p – q). Hence, the contamination with extreme outliers has little impact on the shape parameter in the density-difference estimator, when the density-power score is used.
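The shrinkage by the factor 1 − ε can be reproduced in a small simulation. The sketch below rests on several assumptions that are not taken from the text: the density-power score with α = 1 and the Lebesgue base measure is taken to be the least-squares criterion ∫g² dx − 2∫fg dx, the scale model is a linear combination of Gaussian kernels with fixed centers, a small ridge term is added for numerical stability, and the outliers are placed at ±100. All names and constants are hypothetical.

```python
import numpy as np

SIGMA = 0.7                              # kernel width (assumed)
CENTERS = np.arange(-4.0, 6.0, 1.0)      # fixed kernel centers (assumed)
RIDGE = 1e-3                             # small ridge term (assumed)

def kernel_matrix(x):
    """Gaussian kernel values k(x_i, c_l) for all samples and centers."""
    return np.exp(-(x[:, None] - CENTERS[None, :]) ** 2 / (2.0 * SIGMA ** 2))

def fit_density_difference(x, y):
    """Minimize theta' H theta - 2 h' theta (+ ridge) over the linear model."""
    gap = CENTERS[:, None] - CENTERS[None, :]
    # H[l, l'] is the exact integral of k(., c_l) * k(., c_l') over the real line.
    H = np.sqrt(np.pi) * SIGMA * np.exp(-gap ** 2 / (4.0 * SIGMA ** 2))
    h = kernel_matrix(x).mean(axis=0) - kernel_matrix(y).mean(axis=0)
    return np.linalg.solve(H + RIDGE * np.eye(len(CENTERS)), h)

def l1_estimate(theta):
    """Riemann-sum approximation of the integral of |g_theta(x)|."""
    grid = np.linspace(-12.0, 13.0, 25001)
    g = kernel_matrix(grid) @ theta
    return float(np.sum(np.abs(g)) * (grid[1] - grid[0]))

rng = np.random.default_rng(0)
n, n_out = 1800, 200
x_clean = rng.normal(0.0, 1.0, n)        # inliers from p = N(0, 1)
y_clean = rng.normal(1.0, 1.0, n)        # inliers from q = N(1, 1)
x_cont = np.concatenate([x_clean, np.full(n_out, 100.0)])    # extreme outliers
y_cont = np.concatenate([y_clean, np.full(n_out, -100.0)])
eps = n_out / (n + n_out)                # contamination ratio

l1_clean = l1_estimate(fit_density_difference(x_clean, y_clean))
l1_contaminated = l1_estimate(fit_density_difference(x_cont, y_cont))
l1_corrected = l1_contaminated / (1.0 - eps)   # bias correction with known eps
print(l1_clean, l1_contaminated, l1_corrected)
```

In this construction the kernel values at the outliers underflow to zero, so the fitted coefficients are scaled by exactly 1 − ε and dividing by 1 − ε recovers the estimate obtained from the clean samples.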
Let us consider the L1-distance estimation using the density-difference estimator. Suppose that the true density-difference, p – q, is estimated by the density-power score with the scale model. Then, the L1-distance estimator under the Contamination (11) is distributed around:
8. Numerical Experiments
We conducted numerical experiments to evaluate the statistical properties of L1-distance estimators. We used synthetic datasets. Let N(μ, σ2) be the one-dimensional normal distribution with mean μ and variance σ2. In the standard setup, let us assume that the samples are drawn from the normal distributions,
Below, we show the models and estimators used in the L1-distance estimation. The non-scale model for the density-difference is defined as:
In numerical experiments, the error of the L1-distance estimator, d̂1(p, q), was measured by the relative error, |1 − d̂1(p, q)/d1(p, q)|. The number of training samples varied from 1,000 to 10,000, and that of the outliers varied from zero (no outlier) to 100. The parameter, α, in the score function was set to α = 1 or 3 for the density-difference (DF)-based estimators and α = 0.1 for the density-ratio (DR)-based estimators. For the density-ratio estimation, the score with large α easily yields numerical errors, since the power of the exponential model tends to become extremely large. For each setup, the averaged relative error of each estimator was computed over 100 iterations.
The numerical results are depicted in Figure 1, and details are shown in Table 1. In the figure, estimators with extremely large relative errors are omitted. As shown in Table 1, the estimation accuracy of the DR-based estimator was severely degraded by the contaminated samples. On the other hand, DF-based estimators were robust against outliers. The DF-based estimator with the pseudo-spherical score is less accurate than that with the density-power score. In statistical inference, there is a tradeoff between efficiency and robustness. Though the pseudo-spherical score provides a redescending estimator, the efficiency of the estimator is not high in practice. In the estimation of the probability density, the pseudo-spherical score with the parameter, α, ranging from 0.1 to 1 provides a robust and efficient estimator, and the estimator with large α becomes inefficient [23]. This is because the estimator with large α tends to ignore most of the samples. In the density-difference estimation, the parameter, α, should be a positive odd number. Hence, in our setup, the estimator using the pseudo-spherical score became inefficient. As for the density-power score, the corresponding DF-based estimator has a bounded influence function. As a result, the estimator is efficient and rather robust against the outliers. Furthermore, we found that the bias correction by multiplying the constant factor, 1/(1 − ε), improves the estimation accuracy.
When there is no outlier, the two-step approach using the separately estimated probability densities has larger relative errors than the DF-based estimators using the density-power score. For the contaminated samples, the two-step approach is superior to the other methods, especially when the sample size is less than 2,000. In this case, the separate density estimation with the density-power score efficiently reduces the influence of the outliers. For the larger sample size, however, the DF-based estimators using the density-power score are comparable with the two-step approach. When the rate of the outliers is moderate, the DF-based approach works well, even though the statistical model is based on the semiparametric modeling, which has less information than the parametric modeling used in the two-step approach.
9. Conclusions
In this paper, we first proposed to use the Bregman score to estimate density differences and density ratios, and then, we studied the robustness property of the L1-distance estimator. We showed that the pseudo-spherical score provides a redescending estimator of the density difference under non-scale models. However, the estimator based on the density-power score does not have the redescending property against extreme outliers. In the scale models, the pseudo-spherical score does not work, since the corresponding potential is not strictly convex on the function space. We proved that the density-power score provides a redescending estimator for the shape parameter in the scale models. We also calculated the shift in the L1-distance estimator using the scale model under extreme outliers. Moreover, we proved that the L1-distance estimator is not significantly affected by extreme outliers. In addition, we showed that prior knowledge on the contamination ratio, ε, can be used to correct the bias of the L1-distance estimator. In numerical experiments, the density-power score provided an efficient and robust estimator in comparison to the pseudo-spherical score. This is because the pseudo-spherical score with large α tends to ignore most of the samples and, thus, becomes inefficient. In a practical setup, the density-power score will provide a satisfactory result. Furthermore, we illustrated that the bias correction by using the prior knowledge on the contamination ratio improves L1-distance estimators using scale models.
Besides the Bregman scores, there are other useful classes of estimators, such as local scoring rules [12,18,30]. It is therefore an interesting direction to pursue the possibility of applying another class of scoring rules to the estimation of density differences and density ratios.
A. Proof of Theorem 1
For the density-difference f(x) = fθ* (x) = p(x) – q(x), we define fε(x) as the contaminated density-difference,
The computation of the above derivative for each score yields the results.
B. Proof of Theorem 3
Let us consider the minimization of the density-power score subject to f ∈ ℳdiff. For cg ∈ ℳdiff, we have:
Acknowledgments
TK was partially supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant number 24500340, and MS was partially supported by JSPS KAKENHI Grant number 25700022 and Asian Office of Aerospace Research and Development (AOARD).
Conflict of Interest
The authors declare no conflict of interest.
References
1. Sugiyama, M.; Liu, S.; du Plessis, M.C.; Yamanaka, M.; Yamada, M.; Suzuki, T.; Kanamori, T. Direct divergence approximation between probability distributions and its applications in machine learning. J. Comput. Sci. Eng. 2013, 7, 99–111.
2. Shimodaira, H. Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plan. Infer. 2000, 90, 227–244.
3. Sugiyama, M.; Kawanabe, M. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation (Adaptive Computation and Machine Learning); MIT Press: Cambridge, MA, USA, 2012.
4. Sugiyama, M.; Suzuki, T.; Kanamori, T. Density Ratio Estimation in Machine Learning; Cambridge University Press: Cambridge, UK, 2012.
5. Hido, S.; Tsuboi, Y.; Kashima, H.; Sugiyama, M.; Kanamori, T. Inlier-based Outlier Detection via Direct Density Ratio Estimation. Proceedings of IEEE International Conference on Data Mining (ICDM2008), Pisa, Italy, 15–19 December 2008.
6. Kanamori, T.; Suzuki, T.; Sugiyama, M. f-Divergence estimation and two-sample homogeneity test under semiparametric density-ratio models. IEEE Trans. Inform. Theor. 2012, 58, 708–720.
7. Kanamori, T.; Hido, S.; Sugiyama, M. Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection. In Advances in Neural Information Processing Systems 21; MIT Press: Cambridge, MA, USA, 2009.
8. Nguyen, X.; Wainwright, M.J.; Jordan, M.I. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans. Inform. Theor. 2010, 56, 5847–5861.
9. Qin, J. Inferences for case-control and semiparametric two-sample density ratio models. Biometrika 1998, 85, 619–639.
10. Sugiyama, M.; Kanamori, T.; Suzuki, T.; du Plessis, M.C.; Liu, S.; Takeuchi, I. Density-difference estimation. Neural Comput. 2013, 25, 2734–2775.
11. Sugiyama, M.; Suzuki, T.; Nakajima, S.; Kashima, H.; von Bünau, P.; Kawanabe, M. Direct importance estimation for covariate shift adaptation. Ann. Inst. Stat. Math. 2008, 60, 699–746.
12. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 2007, 102, 359–378.
13. Ali, S.M.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. Roy. Stat. Soc. Series B 1966, 28, 131–142.
14. Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Stud. Sci. Math. Hung. 1967, 2, 229–318.
15. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
16. Good, I.J. Comment on “Measuring Information Uncertainty” by R. J. Buehler. In Foundations of Statistical Inference; Godambe, V.P., Sprott, D.A., Eds.; Dove: Mineola, NY, USA, 1971; pp. 337–339.
17. Murata, N.; Takenouchi, T.; Kanamori, T.; Eguchi, S. Information geometry of U-Boost and Bregman divergence. Neural Comput. 2004, 16, 1437–1481.
18. Parry, M.; Dawid, A.P.; Lauritzen, S. Proper local scoring rules. Ann. Stat. 2012, 40, 561–592.
19. Hendrickson, A.D.; Buehler, R.J. Proper scores for probability forecasters. Ann. Math. Stat. 42.
20. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: London, UK, 2006.
21. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559.
22. Basu, A.; Shioya, H.; Park, C. Monographs on statistics and applied probability. In Statistical Inference: The Minimum Distance Approach; Taylor & Francis: London, UK, 2010.
23. Fujisawa, H.; Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008, 99, 2053–2081.
24. Hampel, F.R.; Rousseeuw, P.J.; Ronchetti, E.M.; Stahel, W.A. Robust Statistics. The Approach Based on Influence Functions; John Wiley and Sons, Inc.: London, UK, 1986.
25. Eguchi, S.; Kato, S. Entropy and divergence associated with power function and the statistical application. Entropy 2010, 12, 262–274.
26. Maronna, R.; Martin, R.; Yohai, V. Robust Statistics: Theory and Methods; Wiley: London, UK, 2006.
27. Wu, Y.; Liu, Y. Robust truncated hinge loss support vector machines. J. Am. Stat. Assoc. 2007, 102, 974–983.
28. Xu, H.; Caramanis, C.; Mannor, S.; Yun, S. Risk Sensitive Robust Support Vector Machines. Proceedings of the 48th IEEE Conference on Decision Control, Shanghai, China, 15–18 December 2009; pp. 4655–4661.
29. Xu, L.; Crammer, K.; Schuurmans, D. Robust Support Vector Machine Training via Convex Outlier Ablation; AAAI: Boston, MA, USA, 2006; pp. 536–542.
30. Kanamori, T.; Fujisawa, H. Affine invariant divergences associated with composite scores and its applications. Bernoulli 2014, submitted.
Table 1. Averaged relative errors of the L1-distance estimators.
Outliers: n′ = m′ = 0 (no outlier)
DF/DR estimator:model | α | n = m = 1,000 | n = m = 2,000 | n = m = 5,000 | n = m = 10,000 |
DF density-power:nonscale | 1 | 0.033 (0.028) | 0.026 (0.024) | 0.016 (0.014) | 0.013 (0.010) |
DF density-power:nonscale | 3 | 0.048 (0.032) | 0.035 (0.029) | 0.017 (0.014) | 0.016 (0.010) |
DF density-power:scale | 1 | 0.037 (0.030) | 0.027 (0.025) | 0.016 (0.013) | 0.013 (0.010) |
DF density-power:scale | 3 | 0.075 (0.069) | 0.053 (0.058) | 0.028 (0.027) | 0.019 (0.017) |
DF pseudo-sphere:nonscale | 1 | 0.610 (0.450) | 0.604 (0.396) | 0.451 (0.320) | 0.452 (0.294) |
DF pseudo-sphere:nonscale | 3 | 0.782 (0.532) | 0.739 (0.491) | 0.604 (0.440) | 0.500 (0.379) |
DR density-power:scale | 0.1 | 0.035 (0.026) | 0.025 (0.022) | 0.015 (0.013) | 0.013 (0.009) |
Separate:density-power | 1 | 0.047 (0.038) | 0.032 (0.024) | 0.022 (0.017) | 0.014 (0.010) |
Outliers: n′ = m′ = 10, τ = 100
DF/DR estimator:model | α | n = m = 1,000 | n = m = 2,000 | n = m = 5,000 | n = m = 10,000 |
DF density-power:nonscale | 1 | 0.033 (0.026) | 0.028 (0.022) | 0.017 (0.013) | 0.013 (0.010) |
DF density-power:nonscale | 3 | 0.042 (0.033) | 0.036 (0.029) | 0.021 (0.016) | 0.015 (0.012) |
DF density-power:scale | 1 | 0.040 (0.030) | 0.031 (0.025) | 0.019 (0.014) | 0.014 (0.011) |
DF density-power:scale (bias-correct) | 1 | 0.036 (0.030) | 0.030 (0.025) | 0.018 (0.014) | 0.014 (0.011) |
DF density-power:scale | 3 | 0.089 (0.077) | 0.052 (0.047) | 0.031 (0.024) | 0.019 (0.016) |
DF density-power:scale (bias-correct) | 3 | 0.083 (0.075) | 0.049 (0.046) | 0.030 (0.023) | 0.019 (0.016) |
DF pseudo-sphere:nonscale | 1 | 0.658 (0.474) | 0.632 (0.424) | 0.515 (0.370) | 0.417 (0.297) |
DF pseudo-sphere:nonscale | 3 | 0.969 (0.494) | 0.743 (0.487) | 0.677 (0.483) | 0.506 (0.421) |
DR density-power:scale | 0.1 | – | – | – | – |
Separate:density-power | 1 | 0.032 (0.023) | 0.026 (0.019) | 0.015 (0.011) | 0.011 (0.008) |
Outliers: n′ = m′ = 100, τ = 100
DF/DR estimator:model | α | n = m = 1,000 | n = m = 2,000 | n = m = 5,000 | n = m = 10,000 |
DF density-power:nonscale | 1 | 0.090 (0.042) | 0.047 (0.028) | 0.023 (0.014) | 0.013 (0.010) |
DF density-power:nonscale | 3 | 0.093 (0.053) | 0.049 (0.032) | 0.025 (0.020) | 0.015 (0.012) |
DF density-power:scale | 1 | 0.099 (0.043) | 0.053 (0.029) | 0.028 (0.017) | 0.016 (0.011) |
DF density-power:scale (bias-correct) | 1 | 0.040 (0.031) | 0.028 (0.022) | 0.017 (0.013) | 0.011 (0.009) |
DF density-power:scale | 3 | 0.144 (0.100) | 0.083 (0.047) | 0.041 (0.030) | 0.025 (0.016) |
DF density-power:scale (bias-correct) | 3 | 0.076 (0.094) | 0.046 (0.041) | 0.031 (0.025) | 0.018 (0.014) |
DF pseudo-sphere:nonscale | 1 | 0.557 (0.461) | 0.511 (0.399) | 0.501 (0.372) | 0.465 (0.305) |
DF pseudo-sphere:nonscale | 3 | 0.807 (0.507) | 0.739 (0.508) | 0.581 (0.458) | 0.534 (0.396) |
DR density-power:scale | 0.1 | – | – | – | – |
Separate:density-power | 1 | 0.052 (0.036) | 0.036 (0.031) | 0.024 (0.017) | 0.014 (0.009) |
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license.
Kanamori, T.; Sugiyama, M. Statistical Analysis of Distance Estimators with Density Differences and Density Ratios. Entropy 2014, 16, 921-942.