Note on zero and missing values in compositional data

  • Arai Hiroyoshi
    Department of Earth Sciences, Faculty of Education and Integrated Arts and Sciences, Waseda University
  • Ohta Tohru
    Department of Earth Sciences, Faculty of Education and Integrated Arts and Sciences, Waseda University

Bibliographic Information

Other Title
  • 組成データ解析における0値および欠損値の扱いについて (On the treatment of zero and missing values in compositional data analysis)
  • ソセイ データ カイセキ ニ オケル 0チ オヨビ ケッソンチ ノ アツカイ ニ ツイテ

Abstract

Compositional data, such as petrochemical compositions, faunal compositions and modal compositions of sandstones, are common in geology. Such data carry an awkward mathematical problem known as the constant-sum constraint. To resolve this problem, logratio and simplicial analyses have been developed over the last two decades. However, zero and missing values are common in practical compositional data, and they are troublesome for logratio and simplicial analysis because neither the logarithm nor the geometric mean can accommodate zeros. Many authors have therefore proposed nonparametric methods for replacing zero and missing values. We first describe the types and nature of zeros: rounded zeros, which stem from the detection limit of the apparatus, and essential (or true) zeros, which denote genuine absence. We then review two nonparametric methods, additive replacement and multiplicative replacement, together with their merits and limitations. Zero replacement, however, may turn data vectors into outliers and lead to erroneous conclusions. For this reason, we also review how to assess outliers: by atypicality indices of data vectors and by confidence regions of a population. To disseminate statistically rigorous replacement and outlier detection, we developed computer programs for the open-source statistical environment R that replace zeros in a given data set and calculate atypicality indices.
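
The two replacement schemes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R code. It assumes the forms usually given in the compositional-data literature: additive replacement in the style of Aitchison, for a composition that sums to 1, and multiplicative replacement that rescales only the non-zero parts. The function names, the example composition and the replacement value `delta` are hypothetical choices made for illustration.

```python
def multiplicative_replacement(x, delta, total=1.0):
    """Replace each zero by delta and shrink the non-zero parts
    by a common factor so the composition still sums to `total`."""
    zero_mass = sum(delta for v in x if v == 0)
    scale = 1.0 - zero_mass / total
    return [delta if v == 0 else v * scale for v in x]


def additive_replacement(x, delta):
    """Additive replacement (assumed Aitchison-style form) for a
    composition that sums to 1: zeros are raised by a fixed amount
    and every non-zero part is lowered by a fixed amount."""
    D = len(x)                           # number of parts
    Z = sum(1 for v in x if v == 0)      # number of zero parts
    up = delta * (Z + 1) * (D - Z) / D**2
    down = delta * (Z + 1) * Z / D**2
    return [up if v == 0 else v - down for v in x]


# Hypothetical three-part composition with one rounded zero.
x = [0.6, 0.4, 0.0]
delta = 0.01  # assumed small value, e.g. a fraction of the detection limit

mult = multiplicative_replacement(x, delta)
add = additive_replacement(x, delta)

# Ratio between the two non-zero parts before and after replacement.
print("original ratio:      ", x[0] / x[1])
print("multiplicative ratio:", mult[0] / mult[1])
print("additive ratio:      ", add[0] / add[1])
```

Both schemes keep the constant sum, but only the multiplicative form leaves the ratios among the non-zero parts unchanged. The additive form subtracts the same amount from every non-zero part, which shifts those ratios slightly. That difference is one commonly cited reason to prefer multiplicative replacement before logratio analysis.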
