API - class/function docs
itermae is intended to be used as a command line utility, but here’s a list of the internal functions and classes to orient y’all for debugging (and contributing to development?).
Essentially, the bin/itermae launcher CLI script reads arguments in, creates a Configuration object, tells it to configure itself with those arguments, then tells it to start reading with the method .reader().
Internally, it creates a SeqHolder object for each input read, which handles all the sequence intermediates and outputting.
There are a few other little utility functions/modules.
The below is automatically generated from the function-level docstrings:
- class itermae.Configuration
This class is for configuring itermae, from YAML or CLI arguments. No arguments for initializing, it will set default values. Then you use the configuration methods.
- check_reserved_name(name, reserved_names=['dummyspacer', 'input', 'id', 'description'])
This checks if the name is one of a reserved list, and raises an error if so. These names are reserved for these reasons:
- dummyspacer is so you can pop an X into your sequence as a separator delimiter for later processing
- input is the input group, the original one
- id is the input ID, here just as id so it’s easy to find
- description is for mapping over the FASTQ description
- Parameters
name (str) – name of group
- Raises
ValueError – raised if you’re using one of the reserved names…
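As a rough illustration of the check described above, here is a minimal standalone sketch. It mirrors the documented signature, but the body and the error message wording are assumptions, not the package's actual code.

```python
# Reserved group names, copied from the documented default argument.
RESERVED_NAMES = ['dummyspacer', 'input', 'id', 'description']


def check_reserved_name(name, reserved_names=RESERVED_NAMES):
    """Raise ValueError if `name` collides with a reserved group name.

    Hypothetical re-implementation for illustration only; the message
    text is made up.
    """
    if name in reserved_names:
        raise ValueError(
            f"'{name}' is a reserved group name, please pick another "
            f"(reserved: {', '.join(reserved_names)})"
        )


check_reserved_name('barcode')  # an ordinary group name passes silently
```

Calling it with `'input'` or any other reserved name would raise the ValueError instead.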
- close_fhs()
This is for cleaning up, and tries to close file handles at input_seqs, ouput_fh, failed_fh, report_fh.
- config_from_args(args_copy)
Makes the configuration object from the arguments provided. The result should be the same as the config_from_yaml output, if supplied the same settings.
- Parameters
args_copy (argparse object, I think) – pass in the argparse args object after collecting the startup command line arguments
- Raises
ValueError – I failed to build the regular expression for a match
ValueError – The output IDs, seqs, descriptions, and filters are of unequal sizes, make them equal or only define one of each
ValueError – Either the supplied filter, id, seq, or description expression for a match group does not look like a python expression
- config_from_file(file_path)
Tries to parse a configuration YAML file to update this configuration object. Pass in the file path as an argument. Recommend you run this config first, then config_from_args, as done in bin/itermae.
- Parameters
file_path (str) – file path to configure from, expecting it to point to an appropriately formatted YAML file
- Raises
ValueError – Failure to parse the supplied YAML
KeyError – You need to define a group called pattern: inside each of the list items inside of matches:
ValueError – Error in yaml config, you’ve repeated a group marking character to match in multiple places
ValueError – Error in yaml config, the pattern and marking you’ve defined are of different lengths
ValueError – Error in yaml config
KeyError – Marked group in marking: field does not have corresponding entry in marked_groups:.
ValueError – Either the supplied filter, id, seq, or description expression for a match group does not look like a python expression
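The pattern/marking errors listed above suggest a few consistency checks on each entry under matches:. The sketch below is one way such checks could look. It assumes that pattern and marking are equal-length strings in which each marking character labels the pattern character at the same position, and that marked_groups is a dict keyed by marking character. The function name `check_marking` and its return value are hypothetical.

```python
from itertools import groupby


def check_marking(pattern, marking, marked_groups):
    """Validate one hypothetical `matches:` entry and split the pattern.

    Returns a list of (marking character, pattern slice) tuples, one per
    contiguous run of the same marking character.
    """
    if len(pattern) != len(marking):
        raise ValueError("pattern and marking are of different lengths")
    # Collapse the marking into runs, e.g. 'aaBBBa' -> ['a', 'B', 'a'].
    runs = [(char, len(list(run))) for char, run in groupby(marking)]
    characters = [char for char, _ in runs]
    if len(characters) != len(set(characters)):
        raise ValueError("a group marking character is repeated in multiple places")
    for char in characters:
        if char not in marked_groups:
            raise KeyError(f"marked group '{char}' has no entry in marked_groups")
    # Slice the pattern according to the run lengths.
    spans = []
    position = 0
    for char, length in runs:
        spans.append((char, pattern[position:position + length]))
        position += length
    return spans


spans = check_marking('NNNNGTCA', 'aaaaBBBB', {'a': {}, 'B': {}})
```

Here `spans` pairs each marking character with the stretch of pattern it covers, which is the information needed to turn the entry into named groups.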
- get_input_seqs()
This calls open_input_fh() to set the input_fh attribute, then calls open_appropriate_input_format to use this and the input_format attribute to save an iterator of SeqRecords into input_seqs.
Note this is inconsistent with design of the output, will pick one or the other … later.
- open_appropriate_input_format()
Uses input_format and input_fh to set iterators of SeqRecords from the appropriate inputs, in input_seqs. Tries to handle all formats known, but will try with SeqIO in case there’s one I didn’t think about.
- open_input_fh()
Opens file-handle based on the configuration. Requires input to be set.
- Raises
ValueError – Can’t handle gzipped inputs on STDIN.
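To make the gzipped-STDIN restriction concrete, here is a minimal sketch of how an input handle might be chosen. The function name `open_input`, the `gzipped` flag, and the use of the literal string STDIN to select standard input are all assumptions for illustration; the real method reads these settings from the Configuration object.

```python
import gzip
import sys


def open_input(path, gzipped=False):
    """Return a text-mode file handle for `path` (hypothetical helper)."""
    if path.upper() == 'STDIN':
        if gzipped:
            # Mirrors the documented error: no gzipped input on STDIN.
            raise ValueError("Can't handle gzipped inputs on STDIN.")
        return sys.stdin
    if gzipped:
        # 'rt' makes gzip decode to text, like a regular open().
        return gzip.open(path, 'rt')
    return open(path, 'rt')
```

Returning text-mode handles in both the plain and gzipped cases keeps the downstream parsing code the same regardless of compression.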
- open_output_fh(file_string)
Opens output file handle, which can then be written to later with a format specification.
Note this is inconsistent with design of the input, will pick one or the other … later.
- Parameters
file_string (str) – file to write to, or STDOUT or STDERR
- Returns
file string for appending output
- Return type
file handle returned by open()
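A minimal sketch of the behaviour described above, assuming the special strings are matched case-insensitively and that regular files are opened in append mode (the "appending output" wording hints at that, but it is an assumption). The name `open_output` is hypothetical.

```python
import sys


def open_output(file_string):
    """Map STDOUT/STDERR to the standard streams, else open a file."""
    if file_string.upper() == 'STDOUT':
        return sys.stdout
    if file_string.upper() == 'STDERR':
        return sys.stderr
    # Append mode, so repeated opens add to the file instead of truncating.
    return open(file_string, 'a')
```

A caller would then write formatted records to whatever handle comes back, without caring whether it is a file or a standard stream.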
- reader()
This reads inputs, calls the chop method on each one, and sorts it off to outputs. So this is called by the main function, and is mostly about handling the I/O and handing it to the chop function. Thus, this depends on the Configuration class being properly configured with all the appropriate values.
- class itermae.SeqHolder(input_record, configuration)
This is the main holder of sequences, and has methods for doing matching, building contexts, filtering, etcetera. Basically there is one of these initialized per input, then each operation is done with this object, then it generates the appropriate outputs and chop actually writes them. Used in chop.
The .seqs attribute holds the sequences accessed by the matching, initialized with the input_record SeqRecord and a dummyspacer for output formatting with a separator.
- Parameters
input_record (Bio.SeqRecord.SeqRecord) – an input SeqRecord object
configuration (itermae.Configuration) – the whole program’s Configuration object, with appropriate file-handles opened up and defaults set
- apply_operation(match_id, input_group, regex)
This applies the given match to the SeqHolder object, and saves how it did internally.
- Parameters
match_id (str) – what name should we call this match? This is useful for debugging reports and filtering only.
input_group (str) – which input group to use, by name of the group
regex (regex compiled regular expression object) – the regular expression to apply, complete with named groups to save for subsequent match operations
- Returns
self, this is just done so it can exit early if no valid input
- Return type
itermae.SeqHolder
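The idea of saving named groups so that later matches can use them as inputs can be sketched as follows. This is a simplified stand-in, not the method itself: it keeps sequences in a plain dict of strings rather than SeqRecord objects, uses the standard library `re` module in place of the compiled regex object the method receives, and returns the match instead of self.

```python
import re


def apply_operation(seqs, match_id, input_group, regex):
    """Apply `regex` to seqs[input_group] and store its named groups.

    `seqs` is a dict of name -> sequence string (an assumption made for
    this sketch). Returns the match object, or None when there is no
    valid input or no match.
    """
    if input_group not in seqs:
        return None  # exit early, an earlier match did not produce this group
    match = regex.search(seqs[input_group])
    if match is None:
        return None
    # Each named group becomes a new sequence that later matches can use.
    for name, value in match.groupdict().items():
        if value is not None:
            seqs[name] = value
    return match


seqs = {'input': 'AAACCGTTTGGG'}
first = re.compile(r'(?P<barcode>[ACGT]{3})CCG(?P<rest>[ACGT]+)')
second = re.compile(r'(?P<tag>T+)(?P<tail>G+)')
apply_operation(seqs, 'match_0', 'input', first)
apply_operation(seqs, 'match_1', 'rest', second)  # runs on a group from match_0
```

After both calls, `seqs` holds the original input plus the groups captured by each operation, so the second operation only ever sees the part of the read that the first one carved out.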
- build_context()
This unpacks group match stats/scores into an environment that the filter can then use to … well … filter.
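As an illustration of what such an environment could contain, the sketch below turns the named groups of a standard library `re` match into small objects with start, end, and length attributes. The attribute names and the use of `SimpleNamespace` are assumptions; the actual method also exposes match scores that plain `re` does not provide.

```python
import re
from types import SimpleNamespace


def build_context(match):
    """Build a name -> stats mapping from a match's named groups."""
    context = {}
    for name, value in match.groupdict().items():
        if value is None:
            continue  # optional group that did not take part in the match
        start, end = match.span(name)
        context[name] = SimpleNamespace(
            seq=value, start=start, end=end, length=end - start
        )
    return context


match = re.search(r'(?P<barcode>[ACGT]{4})GG(?P<umi>[ACGT]+)', 'ACGTGGCATCA')
context = build_context(match)
```

A filter expression could then refer to things like `barcode.length` or `umi.start` when evaluated with `context` as its namespace.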
- build_output(output_dict)
Builds the output from the SeqHolder object according to the outputs in output_dict.
- Parameters
output_dict (dict) – a dictionary of outputs to form, as generated from the configuration initialization
- Returns
the successfully built SeqRecord, or None if it fails
- Return type
Bio.SeqRecord.SeqRecord or None
- chop()
This executes the intended purpose of the SeqHolder object, and is called once. It uses the configured object to apply each match operation as best it can with the sequences it is given or can generate, then writes the outputs in the specified formats to specified places as configured.
- evaluate_filter_of_output(output_dict)
This tests a user-defined filter on the ‘seq_holder’ object. This has already been compile’d, and here we just attempt to evaluate these to True, where True is passing the filter. Exceptions are blocked by using try/except so that it can fail on a single match and move onto the next match/read.
- Parameters
output_dict (dict) – a dictionary of outputs to form, as generated from the configuration initialization
- Returns
True if the filter passed and the output should be generated
- Return type
bool
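The compile-once, evaluate-per-read pattern described above can be sketched with Python's built-in `compile` and `eval`. The helper names and the flat `context` dict are hypothetical; the point is only that a malformed expression fails early with a ValueError, while a runtime failure on a single read is swallowed and treated as not passing.

```python
def compile_filter(expression):
    """Compile a filter expression once, at configuration time."""
    try:
        return compile(expression, '<filter>', 'eval')
    except SyntaxError as error:
        raise ValueError(
            f"filter {expression!r} does not look like a python expression"
        ) from error


def evaluate_filter(compiled_filter, context):
    """Evaluate a compiled filter against one read's context."""
    try:
        return bool(eval(compiled_filter, {}, dict(context)))
    except Exception:
        # e.g. a NameError because a group never matched on this read;
        # fail this read and let the caller move on to the next one.
        return False


length_filter = compile_filter('barcode_length >= 4')
passed = evaluate_filter(length_filter, {'barcode_length': 5})
failed = evaluate_filter(length_filter, {})  # the name is missing here
```

Note that `eval` runs arbitrary Python, so this approach only makes sense when the filter expressions come from the person running the tool.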
- format_report(label, output_seq)
Formats a standard report line for the debug reporting function.
- Parameters
label (str) – what type of report line this is, so a string describing how it went - passed? Failed?
output_seq (Bio.SeqRecord.SeqRecord or None) – the attempt at generating an output SeqRecord, so either one that was formed or None
- Returns
the string for the report
- Return type
str