biom.table.Table.subsample
- Table.subsample(n, axis='sample', by_id=False, with_replacement=False)
Randomly subsample the table, without replacement by default.
Parameters: n : int
Number of items to subsample from counts.
axis : {‘sample’, ‘observation’}, optional
The axis to sample over. Default is ‘sample’.
by_id : boolean, optional
If False, the subsampling is based on the counts contained in the matrix (e.g., rarefaction). If True, the subsampling is based on the IDs (e.g., fetch a random subset of samples). Default is False.
with_replacement : boolean, optional
If False (default), subsample without replacement. If True, resample with replacement via the multinomial distribution. Should not be True if by_id is True.
Returns: biom.Table
A subsampled version of self
Raises: ValueError
- If n is less than zero.
- If by_id and with_replacement are both True.
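The difference between the two values of with_replacement can be illustrated on a single vector of counts. The following is a minimal sketch using only the Python standard library, not the biom implementation; the helper name subsample_counts and its rng argument are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def subsample_counts(counts, n, with_replacement=False, rng=None):
    """Hypothetical helper: subsample n items from one vector of integer counts."""
    if n < 0:
        raise ValueError("n cannot be negative")
    if rng is None:
        rng = random.Random()
    categories = range(len(counts))
    if with_replacement:
        # Multinomial draw: each of the n picks selects category i with
        # probability counts[i] / sum(counts), independently of earlier picks.
        picks = rng.choices(categories, weights=counts, k=n)
    else:
        # Expand the counts into one entry per item, then draw n distinct
        # items. random.sample raises ValueError if n exceeds the pool size.
        pool = [i for i, c in enumerate(counts) for _ in range(c)]
        picks = rng.sample(pool, n)
    tally = Counter(picks)
    return [tally.get(i, 0) for i in categories]


rng = random.Random(0)
without = subsample_counts([3, 2], 4, rng=rng)
with_repl = subsample_counts([3, 2], 10, with_replacement=True, rng=rng)
```

Without replacement, no category in the result can exceed its original count, so n is bounded by the vector's total. With replacement, n may be larger than the total, as in the second call above.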
Notes
By default, subsampling is performed without replacement. In that case, if n is greater than the sum of a given vector, that vector is omitted from the result.
Adapted from skbio.math.subsample; see biom-format/licenses for more information about scikit-bio.
This code assumes absolute abundance if by_id is False.
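The omission rule described in these notes can be sketched as follows. This is an illustrative stand-in written with the standard library, assuming integer (absolute) counts stored per sample in a plain dict; the helper name rarefy and its seed argument are hypothetical and are not part of biom.

```python
import random


def rarefy(samples, n, seed=None):
    """Hypothetical helper: keep n items per sample, omitting samples with fewer than n."""
    rng = random.Random(seed)
    result = {}
    for sample_id, counts in samples.items():
        if sum(counts) < n:
            # Not enough items to draw n without replacement: omit the sample.
            continue
        # One pool entry per counted item, labelled by observation index.
        pool = [i for i, c in enumerate(counts) for _ in range(c)]
        picked = rng.sample(pool, n)
        result[sample_id] = [picked.count(i) for i in range(len(counts))]
    return result


# Same data as the table in the Examples section, keyed by sample ID.
samples = {"S1": [0, 1], "S2": [2, 0], "S3": [3, 2]}
rarefied = rarefy(samples, 2, seed=0)
print(sorted(rarefied))  # S1 is omitted because its counts sum to 1
```

Every sample that survives has exactly n items, which is why the approach only makes sense for absolute abundances: relative abundances cannot be expanded into a pool of individual items.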
Examples
>>> import numpy as np
>>> from biom.table import Table
>>> table = Table(np.array([[0, 2, 3], [1, 0, 2]]), ['O1', 'O2'],
...               ['S1', 'S2', 'S3'])
Subsample 1 item over the sample axis by value (e.g., rarefaction):
>>> print(table.subsample(1).sum(axis='sample'))
[ 1. 1. 1.]
Subsample 2 items over the sample axis. Note that ‘S1’ is filtered out because its counts sum to 1, which is less than 2:
>>> ss = table.subsample(2)
>>> print(ss.sum(axis='sample'))
[ 2. 2.]
>>> print(ss.ids())
['S2' 'S3']
Subsample by IDs over the sample axis. For this example, we randomly select 2 samples, repeat this 100 times, and then print the set of ID combinations observed:
>>> ids = set([tuple(table.subsample(2, by_id=True).ids())
...            for i in range(100)])
>>> print(sorted(ids))
[('S1', 'S2'), ('S1', 'S3'), ('S2', 'S3')]
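Conceptually, ID-based subsampling ignores the counts and only chooses which IDs to keep. A rough standard-library sketch of that idea is shown below; the helper name subsample_ids is hypothetical, and keeping the chosen IDs in their original order is an assumption made to match the output above rather than documented behavior.

```python
import random


def subsample_ids(ids, n, seed=None):
    """Hypothetical helper: choose n distinct IDs, keeping their original order."""
    rng = random.Random(seed)
    chosen = set(rng.sample(list(ids), n))
    # Filter the original sequence so the relative order of IDs is preserved.
    return tuple(i for i in ids if i in chosen)


# Repeat the draw 100 times and collect the distinct ID combinations seen.
observed = {subsample_ids(["S1", "S2", "S3"], 2) for _ in range(100)}
print(sorted(observed))
```

With three IDs and n=2 there are only three possible combinations, so 100 repetitions will almost always observe all of them.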