P. Sabeti, S. Schaffner, B. Fry, J. Lohmueller, P. Varilly et al., Positive Natural Selection in the Human Lineage, Science, vol.312, issue.5780, pp.1614-1620, 2006.
DOI : 10.1126/science.1124309

M. Wang, X. Huang, R. Li, H. Xu, L. Jin et al., Detecting Recent Positive Selection with High Accuracy and Reliability by Conditional Coalescent Tree, Molecular Biology and Evolution, vol.31, issue.11, pp.31-3068, 2014.
DOI : 10.1093/molbev/msu244

A. Srivastava and M. Sahami, Text Mining: Classification, Clustering, and Applications, 2009.
DOI : 10.1201/9781420059458

T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference and prediction, 2009.

C. Aggarwal and C. Zhai, A Survey of Text Clustering Algorithms, Mining Text Data, pp.77-128, 2012.
DOI : 10.1007/978-1-4614-3223-4_4

T. Landauer, P. Foltz, and D. Laham, An introduction to latent semantic analysis, Discourse processes, pp.25-259, 1998.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, NIPS, pp.556-562, 2001.

D. M. Witten and R. Tibshirani, A Framework for Feature Selection in Clustering, Journal of the American Statistical Association, vol.105, issue.490, pp.713-726, 2010.
DOI : 10.1198/jasa.2010.tm09415

A. Singhal, Modern Information Retrieval: A Brief Overview, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol.24, pp.35-42, 2001.

J. Gower and G. J. Ross, Minimum Spanning Trees and Single Linkage Cluster Analysis, Applied Statistics, vol.18, issue.1, pp.54-64, 1969.
DOI : 10.2307/2346439

A. Lequarré, L. Andersson, C. André, M. Fredholm, C. Hitte et al., LUPA: A European initiative taking advantage of the canine genome architecture for unravelling complex disorders in both human and dogs, The Veterinary Journal, vol.189, issue.2, pp.155-159, 2011.
DOI : 10.1016/j.tvjl.2011.06.013

T. Calinski and L. C. Corsten, Clustering Means in ANOVA by Simultaneous Testing, Biometrics, vol.41, issue.1, pp.39-48, 1985.
DOI : 10.2307/2530641

S. C. Goslee and D. L. Urban, The ecodist package for dissimilarity-based analysis of ecological data, Journal of Statistical Software, vol.22, pp.1-19, 2007.

F. Wild, lsa: Latent Semantic Analysis, 2014.

R. Gaujoux and C. Seoighe, A flexible R package for nonnegative matrix factorization, BMC Bioinformatics, vol.11, issue.1, 2010.
DOI : 10.1186/1471-2105-11-367

A. Vaysse, A. Ratnakumar, T. Derrien, E. Axelsson, G. R. Pielberg et al., Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping, PLoS Genetics, vol.33, issue.10, p.1002316, 2011.
DOI : 10.1371/journal.pgen.1002316.s019

URL : https://hal.archives-ouvertes.fr/inserm-00638834

P. Scheet and M. Stephens, A Fast and Flexible Statistical Model for Large-Scale Population Genotype Data: Applications to Inferring Missing Genotypes and Haplotypic Phase, The American Journal of Human Genetics, vol.78, issue.4, pp.629-644, 2006.
DOI : 10.1086/502802

J. M. Akey, A. L. Ruhe, D. T. Akey, A. K. Wong, C. F. Connelly et al., Tracking footprints of artificial selection in the dog genome, Proceedings of the National Academy of Sciences, vol.107, issue.3, pp.1160-1165, 2010.
DOI : 10.1073/pnas.0909918107

R. K. Bustamante, E. A. Wayne, and . Ostrander, Coat variation in the domestic dog is governed by variants in three genes, Science, vol.326, pp.150-153, 2009.
URL : https://hal.archives-ouvertes.fr/inserm-00412221

J. W. Kijas, J. A. Lenstra, B. Hayes, S. Boitard, L. R. Neto et al., Genome-Wide Analysis of the World's Sheep Breeds Reveals High Levels of Historic Mixture and Strong Recent Selection, PLoS Biology, vol.19, issue.2, p.1001258, 2012.
DOI : 10.1371/journal.pbio.1001258.s017