Data from: A cost-effective blood DNA methylation-based age estimation method in domestic cats, Tsushima leopard cats (Prionailurus bengalensis euptilurus), and Panthera species, using targeted bisulfite sequencing and machine learning models
- Qi, Huiyuan
- Creator
- Lim, Qi Luan
- Creator
Metadata
- Publication date
- 2024-01-02
- DOI
- 10.5061/dryad.3r2280gn4
- Publisher
- Dryad
- Data creator (e-Rad)
- Qi, Huiyuan
- Lim, Qi Luan
- Kinoshita, Kodzue
- Nakajima, Nobuyoshi
- Inoue-Murayama, Miho
Description
# Datasets

---

### Appendix S1–S4

The files include the methylation data, sample information, and predicted age for each target species/species group. The data in the files were used to build the age estimation models. 'domestic cat' in the filename means the file is for the domestic cat; 'leopard cat' means the Tsushima leopard cat; 'panthera' means the Panthera species (i.e., jaguar, leopard, lion, snow leopard, and tiger); and 'all' means all samples from all species.

### Appendix S5

The file contains the CpG selection results for the best age estimation model of each species/species group, the frequency with which each CpG site was selected in elastic net feature selection, the correlation coefficient between the methylation rate and chronological age for each CpG site, and the NCBI sequence ID with position.

### CpG No renamed fulllist\_all felidae.csv

The file lists the CpGs that are present in at least one species.

### M%+sampleinfo\*.csv

These files are the versions of Appendix S1–S4 before the predicted age was added.

### indextable\_skf\_cor\*.csv

Raw results of feature selection (correlation-based).

### indextable\_skf\_loio\_ela\*.csv

Raw results of feature selection (elastic net-based, leave-one-individual-out cross-validation).

### indextable\_skf\_loso(\_raw)\_ela\*.csv

Raw results of feature selection (elastic net-based, leave-one-species-out cross-validation).

*P.S. Appendix S1–S5 are referred to in our paper. The other files were only used in the analysis.*

# Description of the data sets and file structures

### Appendix S1–S4, M%+sampleinfo\*.csv

* Column headers beginning with amp3_, amp4_, amp8_, amp9_, and bs38\_ are the names of CpG sites. These columns give the methylation rates. The proximal genes and genomic positions are listed in Appendix S5 and CpG No renamed fulllist_all felidae.csv.
* Health_condition_ed: health condition at the time of sampling (good, diseased).
* Health_condition (Appendix S2–S4, species other than domestic cats): raw health condition data
* Health condition information in Appendix S1 (domestic cats):
  * Health_condition_Healthy (column K): healthy sample
  * Health_condition_CKD (column L): sample with chronic kidney disease
  * Health_condition_Diabetes (column M): sample with diabetes
  * Health_condition_Cancer (column N): sample with cancer
  * Health_condition_DigestiveDisease (column O): sample with digestive diseases
  * Health_condition_Others (column P): sample with other diseases
* Fold: the data were split into five folds (0–4) with similar age and species distributions using stratified k-fold.
* Age_class: age class of each sample.
* Predictedage_*: age predicted through the methods below.

| Feature selection methods | Regression methods | Column name (after 'Predictedage\_') |
| --------------------------- | ------------------------ | ------------------------------------ |
| elastic net (applied only once; spans both columns) | | ela |
| elastic net | elastic net | ela\_ela |
| elastic net | SVMr | ela\_svmr |
| cor ≥ 0.5 | elastic net | cor0\_5\_ela |
| cor ≥ 0.7 | elastic net | cor0\_7\_ela |
| cor ≥ 0.5 | SVMr | cor0\_5\_svmr |
| cor ≥ 0.7 | SVMr | cos0\_7\_svmr |

* For Appendix S2 and M%+sampleinfo_leopardcat_paper_final_fold+ageclass.csv
  * 'Age_stage_at_time_of_protection' shows the age stage estimated by morphological methods when the individual was taken into protection.
  * 'Death_date' shows the date of death. No data here means the individual was still alive in 2023. These data were not used in the analysis.
  * Empty cells mean no data. Captive-born individuals have no data in 'Age_stage_at_time_of_protection'. Wild-born individuals have no data in 'Age', 'Health_condition_ed', 'Fold', and 'Age_class', which are only available for captive-born individuals of known age. The predicted epigenetic age was calculated only with the best model and is given in 'Predictedage_ela_svmr'.
* For Appendix S3 and M%+sampleinfo_panthera_paper_final_fold+ageclass.csv, Appendix S4 and M%+sampleinfo_all_paper_final_fold+relative_ageclass.csv
  * 'Predictedage_*_loso(_raw)' is the age predicted under leave-one-species-out cross-validation.
* For Appendix S4
  * 'Predictedage_* ' is the predicted relative age of each sample. 'Predictedage_*_chronoloical age' is the predicted chronological age under the best models.
  * Empty cells mean no data. Health condition was summarized differently for domestic cats and for the other species; therefore, some health condition-related columns are empty.

### Appendix S5, CpG No renamed fulllist\_all felidae.csv

* Columns E to M show whether each CpG site exists in each species group. 0 means the CpG does not exist in the species; 1 means the CpG exists in the species. Panthera_spp. (column L) includes the species in columns G–K (i.e., jaguar, leopard, lion, snow leopard, and tiger). All_spp. (column M) includes all species.

### Appendix S5

* Green, yellow, orange, ...
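The column layout described for Appendix S1–S4 can be handled programmatically along the following lines. This is only a minimal sketch using the Python standard library, not code from the dataset authors. The prefixes (`amp3_`, `amp4_`, `amp8_`, `amp9_`, `bs38_`, `Predictedage_`) and the `Age` and `Fold` column names come from the description above; the CpG site suffixes, the numeric values, and the helper names `group_columns` and `to_float_or_none` are invented for illustration, and the in-memory CSV is a synthetic stand-in for a real file.

```python
import csv
import io

# Prefixes named in the README for CpG methylation-rate columns.
CPG_PREFIXES = ("amp3_", "amp4_", "amp8_", "amp9_", "bs38_")
PREDICTION_PREFIX = "Predictedage_"


def group_columns(header):
    """Split a header row into CpG, predicted-age, and other columns."""
    groups = {"cpg": [], "predicted": [], "other": []}
    for name in header:
        if name.startswith(CPG_PREFIXES):
            groups["cpg"].append(name)
        elif name.startswith(PREDICTION_PREFIX):
            groups["predicted"].append(name)
        else:
            groups["other"].append(name)
    return groups


def to_float_or_none(cell):
    """Empty cells mean 'no data' in these files; map them to None."""
    cell = cell.strip()
    return float(cell) if cell else None


# Synthetic stand-in for one CSV file (site names and values are invented).
SAMPLE_CSV = """Age,Fold,amp3_1,bs38_2,Predictedage_ela_svmr
4.5,0,0.62,0.31,5.1
,,0.55,0.40,7.8
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
columns = group_columns(reader.fieldnames)

# Convert the numeric columns, keeping missing values as None.
numeric = columns["cpg"] + columns["predicted"] + ["Age"]
rows = [{name: to_float_or_none(row[name]) for name in numeric} for row in reader]
```

The second synthetic row mimics a wild-born individual, for which `Age` and `Fold` are empty but a predicted age is still present; treating empty cells as `None` rather than zero keeps such samples out of any later error calculation.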
Knowledge of individual age can help both in-situ and ex-situ conservation programs design more efficient and suitable management plans for targeted wildlife species. DNA methylation is an epigenetic aging marker that has emerged as a promising tool for estimating age with high accuracy from only a tiny amount of biological material, which can be collected in a minimally invasive way. Here, we sequenced five targeted genetic regions and used 8–23 selected CpG sites to build age estimation models with machine learning methods at a cost of only about $3–7 per sample. We used blood samples from seven Felidae species, ranging from small to big and from domestic to endangered: domestic cats (Felis catus, 139 samples), Tsushima leopard cats (Prionailurus bengalensis euptilurus, 84 samples), and five Panthera species (96 samples). The models achieved satisfactory accuracy: the mean absolute error of the best models was 1.966, 1.348, and 1.552 years in domestic cats, Tsushima leopard cats, and Panthera spp., respectively. Our models for domestic cats and Tsushima leopard cats were applicable to individuals regardless of health condition, indicating that they can be applied to samples collected in diverse situations, e.g., rescued individuals in the context of conservation. We also showed the possibility of developing universal age estimation models for the five Panthera spp. using two of the five genetic regions, suggesting an even lower cost for future applications.
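The accuracy figures quoted here are mean absolute errors, i.e. the average of |predicted age − chronological age| over samples. The sketch below shows how such a value could be computed from an `Age` column and one `Predictedage_*` column, skipping samples where either value is missing. It is an illustration under stated assumptions, not the authors' evaluation code: the function name and all numbers are invented, and how errors were aggregated across cross-validation folds is not specified in this record.

```python
def mean_absolute_error(ages, predictions):
    """Mean of |age - prediction| over samples where both values are present."""
    pairs = [
        (age, pred)
        for age, pred in zip(ages, predictions)
        if age is not None and pred is not None
    ]
    if not pairs:
        raise ValueError("no samples with both a known and a predicted age")
    return sum(abs(age - pred) for age, pred in pairs) / len(pairs)


# Invented example values in years; None marks an empty cell (unknown age).
known_age = [2.0, 5.0, None, 10.0]
predicted_age = [3.0, 4.5, 6.2, 12.0]

# Absolute errors of the three usable samples: 1.0, 0.5, 2.0
mae = mean_absolute_error(known_age, predicted_age)
```

Samples without a known age (for example, wild-born individuals) contribute nothing to the error, which matches the fact that a prediction can exist for them while no chronological age is available for comparison.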