E-statistics (energy statistics)

Research and software related to E-statistics

E-statistics (energy statistics) are a class of statistics and tests based on Euclidean distances between sample elements. Applications include tests of multivariate normality, multivariate distance components (DISCO) and the k-sample test for equal distributions, hierarchical clustering by e-distances, multivariate tests of independence, distance correlation, and goodness-of-fit tests.
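Several of these applications rest on the energy distance 2E|X - Y| - E|X - X'| - E|Y - Y'|, where X' and Y' are independent copies of X and Y. The Python sketch below is a minimal illustration under stated assumptions, not code from the energy package (which is written for R). Points are tuples of floats, distances use exponent 1, expectations are replaced by averages over all pairs, and the function names (energy_distance, two_sample_stat, perm_test, disco) are hypothetical. two_sample_stat scales the sample energy distance by nm/(n+m), perm_test approximates a p-value by randomly relabeling the pooled sample, and disco splits the total dispersion into between-sample and within-sample components following the decomposition described in Rizzo and Szekely (2010).

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]  # one observation, e.g. (x1, x2) for bivariate data


def euclid(p: Point, q: Point) -> float:
    """Euclidean distance |p - q| between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_dist(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Average |x - y| over all len(xs) * len(ys) pairs, including x == y pairs."""
    return sum(euclid(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Sample 2E|X - Y| - E|X - X'| - E|Y - Y'| (nonnegative up to rounding)."""
    return 2 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)


def two_sample_stat(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Two-sample energy statistic: n*m/(n+m) times the sample energy distance."""
    n, m = len(xs), len(ys)
    return n * m / (n + m) * energy_distance(xs, ys)


def perm_test(xs, ys, reps: int = 199, seed: int = 0) -> Tuple[float, float]:
    """Permutation test of equal distributions; returns (statistic, p-value).

    Large values of the statistic are evidence against equal distributions.
    """
    rng = random.Random(seed)
    observed = two_sample_stat(xs, ys)
    pooled = list(xs) + list(ys)
    n = len(xs)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random relabeling of the pooled sample
        if two_sample_stat(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (reps + 1)


def disco(groups: List[Sequence[Point]]) -> Tuple[float, float, float, float]:
    """Distance components (T, S, W, F) for k >= 2 samples.

    T: total dispersion, S: between-sample, W: within-sample, T == S + W
    up to rounding; F = (S / (k - 1)) / (W / (N - k)).
    Raises ZeroDivisionError if W == 0 (e.g. one point per group).
    """
    sizes = [len(g) for g in groups]
    big_n, k = sum(sizes), len(groups)
    pooled = [p for g in groups for p in g]
    total = big_n / 2 * mean_dist(pooled, pooled)
    within = sum(n_j / 2 * mean_dist(g, g) for n_j, g in zip(sizes, groups))
    # between: weighted sum of pairwise two-sample energy statistics
    between = sum(
        (sizes[i] + sizes[j]) / (2 * big_n) * two_sample_stat(groups[i], groups[j])
        for i, j in combinations(range(k), 2)
    )
    f_ratio = (between / (k - 1)) / (within / (big_n - k))
    return total, between, within, f_ratio
```

Because the between component is a weighted sum of pairwise two-sample statistics, the returned values satisfy T = S + W up to rounding error. The F ratio is meant to resemble the "discoF" option listed in the NEWS section, though the package's exact scaling and defaults may differ from this sketch.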

Gabor J. Szekely, National Science Foundation
Maria L. Rizzo, Bowling Green State University, email: email

R software: Energy statistics are implemented in the contributed package energy for R.

References

  1. Szekely, G. J. and Rizzo, M. L. (2013). Energy statistics: statistics based on distances. Journal of Statistical Planning and Inference, 143(8), 1249-1272.
  2. Szekely, G. J. and Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117, 193-213.
  3. Szekely, G. J. and Rizzo, M. L. (2012). On the uniqueness of distance covariance. Statistics & Probability Letters, 82(12), 2278-2282.
  4. Rizzo, M. L. and Szekely, G. J. (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance. Annals of Applied Statistics, 4(2), 1034-1055.
  5. Szekely, G. J. and Rizzo, M. L. (2009). Brownian Distance Covariance. Annals of Applied Statistics, 3(4), 1236-1265. doi:10.1214/09-AOAS312
  6. Szekely, G. J. and Rizzo, M. L. (2009). Rejoinder: Brownian Distance Covariance. Annals of Applied Statistics, 3(4), 1303-1308. doi:10.1214/09-AOAS312REJ
  7. Rizzo, M. L. (2009). New Goodness-of-Fit Tests for Pareto Distributions. ASTIN Bulletin: Journal of the International Association of Actuaries, 39(2), 691-715.
  8. Szekely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and Testing Independence by Correlation of Distances. Annals of Statistics, 35(6), 2769-2794. http://dx.doi.org/10.1214/009053607000000505
  9. Bakirov, N. K., Rizzo, M. L., and Szekely, G. J. (2006). A Multivariate Nonparametric Test of Independence. Journal of Multivariate Analysis, 97(8), 1742-1756. http://dx.doi.org/10.1016/j.jmva.2005.10.005
  10. Szekely, G. J. and Rizzo, M. L. (2005). Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method. Journal of Classification, 22(2), 151-183. http://dx.doi.org/10.1007/s00357-005-0012-9
  11. Szekely, G. J. and Rizzo, M. L. (2005). A New Test for Multivariate Normality. Journal of Multivariate Analysis, 93(1), 58-80. http://dx.doi.org/10.1016/j.jmva.2003.12.002
  12. Szekely, G. J. and Rizzo, M. L. (2004b). Mean Distance Test of Poisson Distribution. Statistics & Probability Letters, 67(3), 241-247. http://dx.doi.org/10.1016/j.spl.2004.01.005
  13. Rizzo, M. L. (2003). Hierarchical Clustering Based on a Generalized Measure of Homogeneity. 2003 Proceedings of the Joint Statistical Meetings, American Statistical Association, Section for Physical and Engineering Sciences [CD-ROM], Alexandria, VA: American Statistical Association.
  14. Szekely, G. J. and Rizzo, M. L. (2004). Testing for Equal Distributions in High Dimension. InterStat, Nov. (5).
  15. Rizzo, M. L. (2005). Minimum Energy Clustering. Proceedings of the Interface/Classification Society of North America Joint Annual Meeting, 2005.
  16. Rizzo, M. L. (2002a). A Test of Homogeneity for Two Multivariate Populations. 2002 Proceedings of the American Statistical Association, Physical and Engineering Sciences Section [CD-ROM], Alexandria, VA: American Statistical Association.
  17. Rizzo, M. L. (2002b). A New Rotation Invariant Goodness-of-Fit Test. Ph.D. dissertation, Bowling Green State University.
  18. Szekely, G. J. (2000). E-statistics: Energy of Statistical Samples. Bowling Green State University, Department of Mathematics and Statistics, Technical Report No. 03-05.
  19. Szekely, G. J. (1989). Potential and Kinetic Energy in Statistics. Lecture Notes, Budapest Institute of Technology (Technical University).

Software for R: energy


R is a free software environment for statistical computing and graphics, available at the Comprehensive R Archive Network (CRAN).
This software is distributed under GNU General Public License Version 2, or later. See COPYING for the license.

Questions or comments on software: Maria Rizzo, email address above



Current version energy_1.6.0 released 2013-05-12.

Summary of recent changes in the energy package

NEWS

  • Distance correlation t-test of independence in high dimension implemented (introduced in Szekely and Rizzo 2013, JMVA).
  • In eqdist.e and eqdist.etest, method="disco" was replaced by two options: "discoB" (between-sample components) and "discoF" (disco F ratio).
  • Distance components: added disco.between and internal functions that compute the disco between-sample component and the corresponding test.
  • disco (DIStance COmponents) function and test added in energy (version 1.2-0, 27-Sept-2010).
    disco provides a nonparametric approach to the analysis of structured data, using distance components rather than variance components. The statistic is related to, but not equivalent to, the ksample statistic. A disco method has been added to the eqdist.etest function and the corresponding eqdist.e statistic.
  • Distance correlation and distance covariance:
    The dcov package has been merged into energy version 1.1-0, available on CRAN 07-Apr-2008.
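The distance correlation mentioned in the NEWS items can be sketched in a few lines. This Python version follows the V-statistic definition in Szekely, Rizzo, and Bakirov (2007): double-center each pairwise distance matrix, average the elementwise products to get the squared distance covariance, then normalize by the distance variances. It is a hypothetical illustration (function names double_centered, dcov2, and dcor are assumptions), not the energy package's dcor or dcov functions, and it does not implement the bias-corrected statistic used by the high-dimensional t-test.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # one observation; 1-D data as 1-tuples, e.g. (2.5,)


def euclid(p: Point, q: Point) -> float:
    """Euclidean distance |p - q| between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def double_centered(points: Sequence[Point]) -> List[List[float]]:
    """A_kl = a_kl - (row k mean) - (column l mean) + (grand mean), a_kl = |x_k - x_l|."""
    n = len(points)
    a = [[euclid(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in a]  # a is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[a[k][l] - row[k] - row[l] + grand for l in range(n)] for k in range(n)]


def dcov2(A: List[List[float]], B: List[List[float]]) -> float:
    """Squared sample distance covariance: (1/n^2) * sum over k, l of A_kl * B_kl."""
    n = len(A)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2


def dcor(xs: Sequence[Point], ys: Sequence[Point]) -> float:
    """Sample distance correlation in [0, 1]; xs[i] and ys[i] form the i-th pair."""
    if len(xs) != len(ys):
        raise ValueError("samples must be paired (same length)")
    A, B = double_centered(xs), double_centered(ys)
    vx, vy = dcov2(A, A), dcov2(B, B)  # squared distance variances
    if vx * vy == 0:
        return 0.0  # defined as 0 when either sample is constant
    # clamp tiny negative rounding errors before taking the square root
    return math.sqrt(max(dcov2(A, B), 0.0) / math.sqrt(vx * vy))
```

For example, with paired 1-D samples where y = 2x + 1, every pairwise distance in y is exactly twice the corresponding distance in x, so the double-centered matrix B equals 2A and the sketch returns 1 up to rounding.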

MATLAB: some functions in energy have been translated to MATLAB.
