, 2009 [34].
³The number of reads per dataset after removal of sequences that could be from the same source as those in the contamination control dataset.
⁴OTUs: Operational Taxonomic Units at 3% or 6% nucleotide difference.
⁵Number of phyla
and genera are based on taxonomic classification by MEGAN V3.4 [36, 37], with the total number of phyla and genera detected in parentheses.
⁶Chao1 is an estimator of the minimum richness and is based on the number of rare OTUs (singletons and doublets) within a sample.
⁷The Shannon index combines estimates of richness (total number of OTUs) and evenness (relative abundance).
⁸The Shannon index after normalization of the number of sequences (as described in Methods).

The 454 pyrosequencing method has a characteristic error pattern of insertion/deletion errors at homopolymer runs. To correct for this, the raw reads were processed with PyroNoise [34] with a minimum length cutoff of 218 and 235 nt for the V1V2 and V6 regions, respectively. The PyroNoise program clusters all reads whose flowgrams indicate that they could stem from the same sequence, while also considering read abundance. After denoising, one sequence per cluster
together with the number of reads mapping to that cluster is reported. Next, the sequences (at this stage one sequence per denoised cluster) that did not have
an exact match to the primer were removed, and the forward primer sequence itself was also trimmed. Finally, the urine sample sequence sets were stripped of sequences that could be from the same source as those in the contamination control dataset. This was done by using the program ESPRIT (http://www.biotech.ufl.edu/people/sun/esprit.html) [35] to do a complete linkage clustering at 1% genetic difference of each sample together with its respective control. Before clustering, the control sequences were weighted so that the same number of reads from the sample and from the control went into the process. Within each cluster the frequency of sample vs. control sequences was calculated, and any sample sequences found in clusters where 50% or more of the sequences belonged to the control were removed.

For taxonomic grouping we used MEGAN V3.4 (http://www-ab.informatik.uni-tuebingen.de/software/megan/welcome.html) [36, 37], which uses BLAST hits to place reads onto a taxonomy by assigning each read to a taxonomic group at a level in the NCBI taxonomy. The sequence reads (one read per denoised cluster from the PyroNoise step) that passed the filtering steps were compared to a curated version of the SSUrdp database [38] using blastn with a maximum expectation value (E) of 10⁻⁵. The 25 best hits were kept.
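The contamination-control filter described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the clustering (done with ESPRIT in the original work) has already been run, and that each sequence is available as a hypothetical record `(seq_id, cluster_id, source, n_reads)`. It further assumes that the within-cluster frequency is computed on read counts, with control reads rescaled so that sample and control contribute equal totals.

```python
from collections import defaultdict


def contaminated_sample_ids(records, threshold=0.5):
    """Return ids of sample sequences that fall in control-dominated clusters.

    records: iterable of (seq_id, cluster_id, source, n_reads), where source
    is "sample" or "control" (hypothetical layout, not an ESPRIT format).
    """
    records = list(records)
    sample_total = sum(n for _, _, src, n in records if src == "sample")
    control_total = sum(n for _, _, src, n in records if src == "control")
    if control_total == 0:
        return set()  # no control reads, nothing can be flagged

    # Rescale control reads so both sources contribute the same total.
    weight = sample_total / control_total

    sample_w = defaultdict(float)
    control_w = defaultdict(float)
    for _, cluster, src, n in records:
        if src == "sample":
            sample_w[cluster] += n
        else:
            control_w[cluster] += n * weight

    removed = set()
    for seq_id, cluster, src, _ in records:
        if src != "sample":
            continue
        total = sample_w[cluster] + control_w[cluster]
        # Flag the sequence when the control share reaches the threshold.
        if total > 0 and control_w[cluster] / total >= threshold:
            removed.add(seq_id)
    return removed


example = [
    ("s1", "c1", "sample", 90),
    ("k1", "c1", "control", 1),
    ("s2", "c2", "sample", 10),
    ("k2", "c2", "control", 9),
]
flagged = contaminated_sample_ids(example)
```

In this toy input the control is ten times smaller than the sample, so each control read counts as ten. Cluster `c2` is then dominated by the control and its sample sequence `s2` is flagged, while `s1` is kept.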
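The hit selection that feeds the taxonomic assignment (E-value cutoff of 10⁻⁵, 25 best hits per read) can be sketched as follows. This is an illustration under assumptions, not the pipeline used in the study: it assumes 12-column tab-separated BLAST output (E-value in column 11, bit score in column 12), and it assumes "best" means highest bit score. The function name `top_hits` is hypothetical.

```python
from collections import defaultdict


def top_hits(lines, max_evalue=1e-5, keep=25):
    """Keep the `keep` highest-scoring hits per query with E <= max_evalue.

    lines: iterable of tab-separated BLAST tabular rows (12 columns assumed).
    Returns {query_id: [(subject_id, evalue, bitscore), ...]}.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue, bitscore))
    # Sort each query's hits by descending bit score and truncate.
    return {
        q: sorted(h, key=lambda hit: hit[2], reverse=True)[:keep]
        for q, h in hits.items()
    }


rows = [
    "r1\tsA\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-50\t370",
    "r1\tsB\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-3\t60",
    "r1\tsC\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-60\t400",
]
selected = top_hits(rows)
```

Here the hit to `sB` is discarded because its E-value exceeds the cutoff, and the remaining hits are ordered `sC`, `sA`. Reads with no hit passing the cutoff do not appear in the result.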
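For readers unfamiliar with the diversity estimators named in the table footnotes (Chao1 and the Shannon index), a small sketch may help. The source does not state which Chao1 variant or which logarithm base was used, so the bias-corrected Chao1 form and the natural logarithm below are assumptions, and the function names are illustrative only.

```python
import math


def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once (singletons)
    and exactly twice (doublets).
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(otu_counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTU relative abundances."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


# Seven observed OTUs: three singletons and two doublets.
richness = chao1([10, 5, 2, 2, 1, 1, 1])
# Four equally abundant OTUs give the maximum H for four OTUs, ln(4).
evenness_case = shannon([5, 5, 5, 5])
```

With three singletons and two doublets among seven observed OTUs, the estimator adds 3·2/(2·3) = 1 unseen OTU, giving 8. The second call shows how evenness enters the Shannon index: four equally abundant OTUs yield ln 4.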