How to

The MutationInfo package contains one class, MutationInfo, which offers a single method, get_info.

The MutationInfo class

class MutationInfo.MutationInfo(local_directory=None, email=None, genome='hg19', dbsnp_version='snp146', **kwargs)

The MutationInfo class handles all necessary connections to various sources in order to assess the chromosomal position of a variant. The first time this class is instantiated, it downloads the reference genome in fasta format and splits it per chromosome. This might take approximately 13 GB of disk space.

MutationInfo offers a single method for accessing the complete functionality of the module: get_info().

This class does not have any required arguments for initialization. Nevertheless, the following optional arguments are supported:

Parameters:
  • local_directory – The local directory where the fasta files will be stored. By default, MutationInfo uses the appdirs module to create a platform-specific local directory. This directory is also used as a cache: whenever an attempt to access an external service succeeds, the acquired object is saved to local_directory for future reference.
  • email – An email is required to connect to Entrez through biopython (see also http://biopython.org/DIST/docs/api/Bio.Entrez-module.html). If not set, MutationInfo looks for an email entry in the file <local_directory>/properties.json. If this file does not exist (for example, when the class is instantiated for the first time), it prompts the user for an email and stores it in the properties.json file.
  • genome – The version of the preferred human genome assembly that will be used for reporting chromosomal positions. Accepted values should have the hgXX format. Default value is hg19.

    Warning

    MutationInfo does not guarantee that the returned position is aligned according to the genome parameter since certain tools work only with specific genome assemblies. For this reason always check the genome key of the returned item after calling the get_info() method.

  • ucsc_genome – Set the version of human genome assembly explicitly for the CruzDB tool (UCSC). Default: Same as the genome parameter.
  • dbsnp_version – The version of dbsnp for rs variants. Default value is snp146.
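
Because the assembly of a returned position may differ from the genome parameter, it can help to check the genome key explicitly. The following is a minimal sketch, not part of MutationInfo: the helper name assembly_matches is hypothetical, and the sample dictionary is copied from the example output at the end of this page.

```python
def assembly_matches(info, expected="hg19"):
    """Return True if a get_info() result reports the expected assembly."""
    # get_info() returns None when it fails, so treat that as a mismatch.
    if info is None:
        return False
    return info.get("genome") == expected


# Sample result copied from the example output in this document.
sample = {"chrom": "6", "offset": 18155397, "ref": "G", "alt": "A",
          "genome": "hg19", "source": "counsyl_hgvs_to_vcf", "notes": ""}

print(assembly_matches(sample, "hg19"))  # True
print(assembly_matches(sample, "hg38"))  # False
```

In real use, sample would be replaced by the value returned from get_info().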

The get_info method

MutationInfo.get_info(variant, empty_current_fatal_error=True, **kwargs)

Gets the chromosome, position, reference and alternative sequences of a dbSNP or HGVS variant. If the method parameter is not specified, the variant goes through the following default pipeline:

(Figure: diagram of the default pipeline, http://i.imgur.com/BAak2rE.png)

Parameters: variant – A variant (str or unicode) or a list of variants. Both rs identifiers (e.g. rs56404215) and HGVS names (e.g. NM_006446.4:c.1198T>G) are accepted.

Optional arguments:

Parameters: method – Instead of the default pipeline, use a specific tool. Accepted values are:
  • UCSC : Use CruzDB (only for dbSNP variants)
  • VEP : Use Variant Effect Predictor (only for dbSNP variants)
  • MYVARIANTINFO : Use MyVariant.info (only for dbSNP variants)
  • BIOCOMMONS : Use Biocommons HGVS (only for HGVS variants)
  • COUNSYL : Use Counsyl HGVS (only for HGVS variants)
  • MUTALYZER : Use Mutalyzer (only for HGVS variants)
  • BLAT : Perform a BLAT search (only for HGVS variants)
  • LOVD : Search the LOVD database (only for HGVS variants)
  • VARIATION_REPORTER : Search Variation Reporter
  • TRANSVAR : Search TransVar (experimental, requires installation of the TransVar CLI)
Returns: If the pipeline or the selected method fails, the return value is None. Otherwise, it returns a dictionary with the following keys:
  • chrom : The chromosome where this variant is located. The type of this value is str in order to have a universal type for all possible chromosome values (including X and Y).
  • offset : The nucleotide position of the variant.
  • ref : The reference sequence of the variant. In case of insertions this value is an empty string.
  • alt : The alternative sequence of the variant. In case of deletions this value is an empty string.
  • genome : The version of the human genome assembly for this position.
  • source : The name of the tool that was used to locate the position.
  • notes : Possible warnings, errors and notes that the tools generated during the conversion.
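
The keys above can be combined into a short human-readable summary. This is only an illustrative sketch: describe_variant is a hypothetical helper, not part of MutationInfo, and showing "-" for an empty ref or alt is a display choice made here, not a convention of the package.

```python
def describe_variant(info):
    """Format a get_info() result as 'chrom:offset ref>alt (genome, source)'."""
    # get_info() returns None when the pipeline or the selected method fails.
    if info is None:
        return "not found"
    # ref is empty for insertions and alt is empty for deletions; show '-' instead.
    ref = info["ref"] or "-"
    alt = info["alt"] or "-"
    return "{0}:{1} {2}>{3} ({4}, {5})".format(
        info["chrom"], info["offset"], ref, alt, info["genome"], info["source"])


# Sample result copied from the example output in this document.
sample = {"chrom": "6", "offset": 18155397, "ref": "G", "alt": "A",
          "genome": "hg19", "source": "counsyl_hgvs_to_vcf", "notes": ""}

print(describe_variant(sample))  # 6:18155397 G>A (hg19, counsyl_hgvs_to_vcf)
```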

An example of the output is the following:

>>> from MutationInfo import MutationInfo
>>> mi = MutationInfo()
>>> info = mi.get_info('NM_000367.2:c.-178C>T')
>>> print(info)
{'chrom': '6', 'notes': '', 'source': 'counsyl_hgvs_to_vcf', 'genome': 'hg19', 'offset': 18155397, 'alt': 'A', 'ref': 'G'}
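
The default pipeline already chains several tools, but the method parameter also allows trying tools in an order of your own choosing. The sketch below assumes only what is documented above: get_info accepts a method keyword and returns None on failure. The names candidate_methods, first_successful and fake_get_info are hypothetical, the rs pattern is an assumption about how dbSNP identifiers look, and fake_get_info stands in for mi.get_info so that the example runs without downloading the reference genome.

```python
import re

# Tool names taken from the list of accepted `method` values above.
RS_METHODS = ["UCSC", "VEP", "MYVARIANTINFO"]
HGVS_METHODS = ["BIOCOMMONS", "COUNSYL", "MUTALYZER", "BLAT", "LOVD"]


def candidate_methods(variant):
    """Pick the tools documented for the variant's notation."""
    # Assumption: dbSNP identifiers are 'rs' followed by digits;
    # anything else is treated as an HGVS name.
    if re.match(r"^rs\d+$", variant):
        return RS_METHODS
    return HGVS_METHODS


def first_successful(get_info, variant, methods=None):
    """Call get_info(variant, method=...) per tool until one returns a result."""
    for method in methods or candidate_methods(variant):
        info = get_info(variant, method=method)
        if info is not None:
            return info
    return None


def fake_get_info(variant, method=None):
    # Stand-in for mi.get_info: pretends that only COUNSYL succeeds.
    if method == "COUNSYL":
        return {"chrom": "6", "offset": 18155397, "ref": "G", "alt": "A",
                "genome": "hg19", "source": "counsyl_hgvs_to_vcf", "notes": ""}
    return None


result = first_successful(fake_get_info, "NM_000367.2:c.-178C>T")
print(result["source"])  # counsyl_hgvs_to_vcf
```

With a real instance, mi.get_info would be passed in place of fake_get_info.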