`CodonU.analyzer.internal_comp`

Module Contents

Functions

`is_not_bad_seq`(→ bool)	Checks whether the sequence is bad, i.e. whether its length is not divisible by 3
`not_contains_amb_letter`(→ bool)	Checks if provided sequence contains ambiguous DNA letters
`g3`(→ float)	Calculates the percentage of G at the third codon position
`a3`(→ float)	Calculates the percentage of A at the third codon position
`gc_123`(→ tuple[float, float | int, float | int, ...)	Calculates G+C content: total, and for the first, second and third codon positions
`at_123`(→ tuple[float, float | int, float | int, ...)	Calculates A+T content: total, and for the first, second and third codon positions
`custom_codon_table`(→ None)	Registers a new Codon Table as provided by the user.
`filter_reference`(→ list[Bio.SeqRecord.SeqRecord])	Filters the list of references using a minimum length threshold
`reverse_table`(→ dict[str, list[str]])	Creates a dictionary mapping each amino acid to its codons
`syn_codons`(→ dict[str, list[str]])	Creates a dictionary mapping each codon to its synonymous codon family
`sf_vals`(→ dict[int, list[str]])	Creates a dictionary mapping each sf value (synonymous family size) to its amino acids
`rscu`(→ dict[str, float])	Calculates relative synonymous codon usage (RSCU) values for the given reference nucleotide sequences according to Sharp and Li (1987)
`weights_for_cai`(→ dict[str, float])	Calculates relative adaptiveness (weight) values for the given reference nucleotide sequences according to Sharp and Li (1987)
`cai`(→ float)	Calculates the Codon Adaptation Index (CAI) for a given nucleotide sequence according to Sharp and Li (1987)
`cbi`(→ tuple[float, str])	Calculates the codon bias index (CBI) for a given protein sequence based on Bennetzen and Hall (1982)
`enc`(→ float)	Calculates the effective number of codons (Enc) based on Wright (1989) and Fuglsang (2004)
`gravy`(→ float)	Computes the GRAVY score according to Kyte and Doolittle (1982)
`aromaticity`(→ float)	Calculate the aromaticity score according to Lobry (1994).

CodonU.analyzer.internal_comp.is_not_bad_seq(seq: Bio.Seq.Seq | str, code: int, _type: str) → bool

Checks whether the sequence is bad, i.e. whether its length is not divisible by 3

Parameters:

seq – The nucleotide sequence
code – The code passed to BadSequenceError (1 or 2)
_type – Type of sequence, i.e. ‘nuc’

Returns:

True if seq is not bad

Raises:

BadSequenceError – If the seq is bad
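The length check described above can be sketched in plain Python as follows. This is an illustrative sketch, not CodonU's code: `is_not_bad_seq_sketch` and the local `BadSequenceError` class are hypothetical stand-ins, and the `code` and `_type` parameters are left out.

```python
class BadSequenceError(Exception):
    """Hypothetical stand-in for CodonU's BadSequenceError."""


def is_not_bad_seq_sketch(seq: str) -> bool:
    # Codons are read as non-overlapping triplets, so a length that is not a
    # multiple of 3 would leave an incomplete codon at the end.
    if len(seq) % 3 != 0:
        raise BadSequenceError(f"length {len(seq)} is not divisible by 3")
    return True


print(is_not_bad_seq_sketch("ATGAAATAA"))  # True
try:
    is_not_bad_seq_sketch("ATGAA")
except BadSequenceError as err:
    print("bad sequence:", err)
```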

CodonU.analyzer.internal_comp.not_contains_amb_letter(seq: Bio.Seq.Seq | str) → bool

Checks whether the provided sequence contains ambiguous DNA letters

Parameters:

seq – Provided sequence

Returns:

True if the sequence does not contain ambiguous letters

Raises:

NucleotideError – If the sequence contains ambiguous letters
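A minimal sketch of the ambiguity check, assuming that only the four unambiguous DNA bases A, C, G and T are accepted (case-insensitively). CodonU's actual alphabet may differ; `not_contains_amb_letter_sketch` and the local `NucleotideError` are hypothetical stand-ins.

```python
class NucleotideError(Exception):
    """Hypothetical stand-in for CodonU's NucleotideError."""


# Assumption: only the four unambiguous DNA bases are accepted.
UNAMBIGUOUS_DNA = frozenset("ACGT")


def not_contains_amb_letter_sketch(seq: str) -> bool:
    # Any other letter, e.g. an IUPAC code such as N, R or Y, counts as ambiguous.
    ambiguous = sorted(set(seq.upper()) - UNAMBIGUOUS_DNA)
    if ambiguous:
        raise NucleotideError(f"ambiguous letters found: {', '.join(ambiguous)}")
    return True


print(not_contains_amb_letter_sketch("atgGCC"))  # True
```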

CodonU.analyzer.internal_comp.g3(seq: Bio.Seq.Seq | str) → float

Calculates the percentage of G at the third codon position

Parameters:

seq – Provided sequence

Returns:

Percentage of G at the third codon position

CodonU.analyzer.internal_comp.a3(seq: Bio.Seq.Seq | str) → float

Calculates the percentage of A at the third codon position

Parameters:

seq – Provided sequence

Returns:

Percentage of A at the third codon position
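Both third-position functions above can be illustrated with one helper. This sketch assumes the percentage is taken over all complete codons; some G3/A3 definitions count only synonymous third positions, and CodonU may do either. `third_position_pct` is a hypothetical name.

```python
def third_position_pct(seq: str, base: str) -> float:
    """Percentage of complete codons whose third nucleotide equals `base`."""
    seq = seq.upper()
    n_codons = len(seq) // 3  # a trailing partial codon is ignored
    if n_codons == 0:
        return 0.0
    # Third positions sit at indices 2, 5, 8, ...
    thirds = seq[2:3 * n_codons:3]
    return 100 * thirds.count(base.upper()) / n_codons


seq = "ATGAAGTAA"
print(third_position_pct(seq, "G"))  # G3
print(third_position_pct(seq, "A"))  # A3
```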

CodonU.analyzer.internal_comp.gc_123(seq: Bio.Seq.Seq | str) → tuple[float, float | int, float | int, float | int]

Calculates G+C content: total, and for the first, second and third codon positions

Parameters:

seq – Provided sequence

Returns:

The G+C percentage for the entire sequence and for each of the three codon positions

CodonU.analyzer.internal_comp.at_123(seq: Bio.Seq.Seq | str) → tuple[float, float | int, float | int, float | int]

Calculates A+T content: total, and for the first, second and third codon positions

Parameters:

seq – Provided sequence

Returns:

The A+T percentage for the entire sequence and for each of the three codon positions
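The total-plus-per-position breakdown returned by `gc_123` and `at_123` can be sketched with a single hypothetical helper, `base_content_123`. As an assumption, the total is taken over the whole sequence while the three positional values use only complete codons; CodonU's handling of edge cases may differ.

```python
def base_content_123(seq: str, bases: str) -> tuple[float, float, float, float]:
    """Percentage of `bases` overall and at codon positions 1, 2 and 3."""
    seq = seq.upper()
    wanted = set(bases.upper())

    def pct(chars: str) -> float:
        return 100 * sum(ch in wanted for ch in chars) / len(chars) if chars else 0.0

    n_codons = len(seq) // 3
    in_frame = seq[:3 * n_codons]
    # Slicing with step 3 picks out one codon position at a time.
    pos1, pos2, pos3 = (pct(in_frame[i::3]) for i in range(3))
    return pct(seq), pos1, pos2, pos3


print(base_content_123("GCCATG", "GC"))  # G+C: total, 1st, 2nd, 3rd
print(base_content_123("GCCATG", "AT"))  # A+T: total, 1st, 2nd, 3rd
```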

CodonU.analyzer.internal_comp.custom_codon_table(name: str, alt_name: str | None, genetic_code_id: int, forward_table: dict[str, str], start_codons: list[str], stop_codons: list[str]) → None

Registers a new Codon Table as provided by the user.

Note: The newly registered table is available only within the working file

Parameters:

name – Name for the table
alt_name – Short name for the table
genetic_code_id – Genetic code number for the table
forward_table – A dict mapping codons to amino acids (stop codons excluded)
start_codons – A list of possible start codons
stop_codons – A list of possible stop codons

Raises:

CodonTableExistsError – If the name, alt_name or genetic_code_id already exists
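The duplicate check described above can be pictured with a small in-memory registry. This is a hypothetical sketch, not how CodonU or Biopython store tables: `CodonTableRegistry` and the local `CodonTableExistsError` are illustrative names, and the stop-codon overlap check is an extra sanity check added only in this sketch. Because the registry is an ordinary object, it disappears when the program exits, which is one way a table's scope ends up limited to the running file.

```python
from dataclasses import dataclass, field
from typing import Optional


class CodonTableExistsError(Exception):
    """Hypothetical stand-in for CodonU's CodonTableExistsError."""


@dataclass
class CodonTableRegistry:
    """In-memory registry; it lives only as long as the running program."""
    by_name: dict = field(default_factory=dict)
    by_alt_name: dict = field(default_factory=dict)
    by_id: dict = field(default_factory=dict)

    def register(self, name: str, alt_name: Optional[str], genetic_code_id: int,
                 forward_table: dict[str, str], start_codons: list[str],
                 stop_codons: list[str]) -> None:
        # Refuse to overwrite an existing table under any of its identifiers.
        if (name in self.by_name
                or (alt_name is not None and alt_name in self.by_alt_name)
                or genetic_code_id in self.by_id):
            raise CodonTableExistsError(f"table {name!r} / id {genetic_code_id} already exists")
        # Extra check in this sketch: stop codons must not appear in the forward table.
        overlap = set(stop_codons) & set(forward_table)
        if overlap:
            raise ValueError(f"stop codons present in forward_table: {sorted(overlap)}")
        table = {
            "name": name, "alt_name": alt_name, "id": genetic_code_id,
            "forward_table": dict(forward_table),
            "start_codons": list(start_codons), "stop_codons": list(stop_codons),
        }
        self.by_name[name] = table
        if alt_name is not None:
            self.by_alt_name[alt_name] = table
        self.by_id[genetic_code_id] = table


registry = CodonTableRegistry()
registry.register("Toy code", None, 99, {"ATG": "M", "TTT": "F"}, ["ATG"], ["TAA"])
print(sorted(registry.by_id))  # [99]
```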

CodonU.analyzer.internal_comp.filter_reference(records, min_len_threshold: int, _type: str) → list[Bio.SeqRecord.SeqRecord]

Filters the list of references using a minimum length threshold

Parameters:

records – A generator object holding the sequence objects
min_len_threshold – Minimum length for a nucleotide sequence to be considered a gene
_type – Type of sequence, i.e. ‘nuc’ or ‘aa’

Returns:

The list of usable sequences
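Because `records` is a generator, it can be iterated only once, so the filtered result has to be collected into a list. A minimal sketch, assuming sequences whose length is at least `min_len_threshold` are kept and ignoring `_type`; it works on plain strings, and `len()` also works on Biopython `SeqRecord` objects. `filter_reference_sketch` and `toy_records` are hypothetical names.

```python
from collections.abc import Iterable, Iterator


def filter_reference_sketch(records: Iterable[str], min_len_threshold: int) -> list[str]:
    # A generator can only be consumed once, so collect the survivors into a list.
    # Assumption: sequences at least `min_len_threshold` long are kept.
    return [rec for rec in records if len(rec) >= min_len_threshold]


def toy_records() -> Iterator[str]:
    # Hypothetical stand-in for a generator such as the one Bio.SeqIO.parse returns.
    yield from ("ATGAAATAA", "ATGTAA", "ATGCCCGGGTAA")


print(filter_reference_sketch(toy_records(), 9))  # ['ATGAAATAA', 'ATGCCCGGGTAA']
```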

CodonU.analyzer.internal_comp.reverse_table(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) → dict[str, list[str]]

Creates a dictionary mapping each amino acid (protein letter) to its codons

e.g. ‘L’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]

Parameters:

codon_table – The codon table

Returns:

A dict with amino acids as keys
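Inverting a codon-to-amino-acid mapping gives the structure shown in the example above. This sketch uses a plain dict shaped like the `forward_table` attribute of a Biopython codon table, built here for the standard genetic code (NCBI table 1) from its 64-letter amino acid string; `reverse_table_sketch` is a hypothetical name, not CodonU's implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"  # forward tables exclude stop codons
}


def reverse_table_sketch(forward_table: dict[str, str]) -> dict[str, list[str]]:
    rev: dict[str, list[str]] = {}
    for codon, aa in forward_table.items():
        # Group codons under their amino acid, preserving table order.
        rev.setdefault(aa, []).append(codon)
    return rev


print(reverse_table_sketch(FORWARD_TABLE)["L"])  # ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG']
```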

CodonU.analyzer.internal_comp.syn_codons(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) → dict[str, list[str]]

Creates a dictionary mapping each codon to its synonymous codon family

e.g. ‘TTA’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]

Parameters:

codon_table – The codon table

Returns:

A dict with individual codons as keys
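The codon-keyed view follows from grouping codons by amino acid and then pointing every codon at its group. A sketch on the standard genetic code, using a plain dict in place of a Biopython codon table; `syn_codons_sketch` is a hypothetical name. Each value is a fresh list, so mutating one family does not affect the others.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"
}


def syn_codons_sketch(forward_table: dict[str, str]) -> dict[str, list[str]]:
    families: dict[str, list[str]] = {}
    for codon, aa in forward_table.items():
        families.setdefault(aa, []).append(codon)
    # Each codon maps to a copy of its amino acid's full family, itself included.
    return {codon: list(families[aa]) for codon, aa in forward_table.items()}


print(syn_codons_sketch(FORWARD_TABLE)["TTA"])  # ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG']
```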

CodonU.analyzer.internal_comp.sf_vals(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) → dict[int, list[str]]

Creates a dictionary mapping each sf value (the number of synonymous codons for an amino acid) to the amino acids with that value

e.g. 6: [‘L’, ‘S’, ‘R’]

Parameters:

codon_table – The codon table

Returns:

A dict with sf values as keys
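Counting how many codons each amino acid has and then grouping amino acids by that count reproduces the example above for the standard genetic code. This sketch assumes "sf value" means synonymous family size; `sf_vals_sketch` is a hypothetical name.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"
}


def sf_vals_sketch(forward_table: dict[str, str]) -> dict[int, list[str]]:
    family_size = Counter(forward_table.values())  # amino acid -> number of codons
    sf: dict[int, list[str]] = {}
    for aa, size in family_size.items():
        sf.setdefault(size, []).append(aa)
    return sf


print(sf_vals_sketch(FORWARD_TABLE)[6])  # ['L', 'S', 'R']
```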

CodonU.analyzer.internal_comp.rscu(references: list[Bio.Seq.Seq | str], genetic_code: int) → dict[str, float]

Calculates relative synonymous codon usage (RSCU) values for the given reference nucleotide sequences according to Sharp and Li (1987)

Parameters:

references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table

Returns:

A dictionary containing codons and their respective RSCU values
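In Sharp and Li's definition, the RSCU of codon j for amino acid i is its observed count divided by the count expected if all n_i synonyms were used equally: RSCU_ij = X_ij / ((1/n_i) × Σ_j X_ij). A sketch on the standard genetic code, assuming in-frame reading, skipping stop and ambiguous triplets, and assigning 0 to families absent from the references (CodonU's choices may differ). `codon_counts` and `rscu_sketch` are hypothetical names.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"
}


def codon_counts(references: list[str]) -> Counter:
    counts: Counter = Counter()
    for seq in references:
        s = str(seq).upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            codon = s[i:i + 3]
            if codon in FORWARD_TABLE:  # skip stop codons and ambiguous triplets
                counts[codon] += 1
    return counts


def rscu_sketch(references: list[str]) -> dict[str, float]:
    counts = codon_counts(references)
    families: dict[str, list[str]] = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values: dict[str, float] = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count / count expected if all synonyms were used equally.
            # Assumption: amino acids absent from the references get RSCU 0.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


values = rscu_sketch(["GCTGCTGCCGCA"])
print({c: values[c] for c in ("GCT", "GCC", "GCA", "GCG")})
# {'GCT': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0}
```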

CodonU.analyzer.internal_comp.weights_for_cai(references: list[Bio.Seq.Seq | str], genetic_code: int) → dict[str, float]

Calculates relative adaptiveness (weight) values for the given reference nucleotide sequences according to Sharp and Li (1987)

Parameters:

references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table

Returns:

A dictionary containing codons and their respective weights
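Sharp and Li's relative adaptiveness of a codon is its count divided by the count of the most frequently used synonymous codon, w_ij = X_ij / X_i,max, which equals RSCU_ij / RSCU_i,max. A sketch on the standard genetic code, assuming families unseen in the references get weight 0 (implementations differ on this point). `weights_sketch` is a hypothetical name.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"
}


def weights_sketch(references: list[str]) -> dict[str, float]:
    counts: Counter = Counter()
    for seq in references:
        s = seq.upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            if s[i:i + 3] in FORWARD_TABLE:  # ignore stops and ambiguous triplets
                counts[s[i:i + 3]] += 1
    families: dict[str, list[str]] = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights: dict[str, float] = {}
    for codons in families.values():
        best = max(counts[c] for c in codons)
        for c in codons:
            # w = count / count of the most-used synonym (equivalently RSCU / RSCU_max).
            # Assumption: unobserved families get weight 0.
            weights[c] = counts[c] / best if best else 0.0
    return weights


w = weights_sketch(["GCTGCTGCCGCA"])
print({c: w[c] for c in ("GCT", "GCC", "GCA", "GCG")})
# {'GCT': 1.0, 'GCC': 0.5, 'GCA': 0.5, 'GCG': 0.0}
```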

CodonU.analyzer.internal_comp.cai(nuc_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) → float

Calculates the Codon Adaptation Index (CAI) for a given nucleotide sequence according to Sharp and Li (1987)

Parameters:

nuc_seq – The nucleotide sequence
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table

Returns:

The CAI value for given sequence
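The CAI is the geometric mean of the weights of the L scored codons in the sequence, CAI = exp((1/L) × Σ ln w_k). The sketch below takes a weights dict directly and, following Sharp and Li, skips the single-codon amino acids Met (ATG) and Trp (TGG); codons without a weight, such as stop codons, are also skipped. Returning 0 for a zero weight is an assumption of this sketch. `cai_sketch` is a hypothetical name.

```python
import math


def cai_sketch(nuc_seq: str, weights: dict[str, float],
               excluded: frozenset = frozenset({"ATG", "TGG"})) -> float:
    """Geometric mean of codon weights over the scored codons of `nuc_seq`."""
    s = nuc_seq.upper()
    log_sum, n_scored = 0.0, 0
    for i in range(0, len(s) - len(s) % 3, 3):
        codon = s[i:i + 3]
        if codon in excluded or codon not in weights:
            continue  # skip Met/Trp and any codon without a weight (e.g. stop codons)
        w = weights[codon]
        if w == 0:
            return 0.0  # a single zero weight makes the geometric mean zero
        log_sum += math.log(w)
        n_scored += 1
    # Averaging logs avoids underflow from multiplying many weights below 1.
    return math.exp(log_sum / n_scored) if n_scored else 0.0


alanine_weights = {"GCT": 1.0, "GCC": 0.5, "GCA": 0.5, "GCG": 0.0}
print(round(cai_sketch("ATGGCTGCCTAA", alanine_weights), 4))  # 0.7071
```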

CodonU.analyzer.internal_comp.cbi(prot_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) → tuple[float, str]

Calculates the codon bias index (CBI) for a given protein sequence based on Bennetzen and Hall (1982)

Parameters:

prot_seq – The protein sequence
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table

Returns:

A tuple of the CBI value and the optimal codon

Raises:

NoSynonymousCodonWarning – When there are no synonymous codons
MissingCodonWarning – When no codons translate to the provided amino acid
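Bennetzen and Hall define CBI = (N_pfr − N_rand) / (N_tot − N_rand), where N_pfr is the number of preferred codons used, N_tot the total number of codons, and N_rand the number of preferred codons expected under random synonymous choice. Since `cbi` returns a single value together with one optimal codon, the sketch below shows a single-amino-acid reading of that formula, assuming the preferred codon is the most frequent synonym in the reference counts. It raises `ValueError` where the documented function issues warnings; `cbi_sketch` is a hypothetical name.

```python
def cbi_sketch(counts: dict[str, int], family: list[str]) -> tuple[float, str]:
    """CBI for one amino acid from the codon counts of its synonymous family."""
    if len(family) < 2:
        raise ValueError("CBI is undefined without synonymous codons")
    observed = {c: counts.get(c, 0) for c in family}
    n_tot = sum(observed.values())
    if n_tot == 0:
        raise ValueError("no codons observed for this amino acid")
    # Assumption: the preferred (optimal) codon is the most frequent synonym.
    optimal = max(family, key=observed.__getitem__)
    n_rand = n_tot / len(family)  # expected preferred-codon count under random usage
    return (observed[optimal] - n_rand) / (n_tot - n_rand), optimal


alanine = ["GCT", "GCC", "GCA", "GCG"]
print(cbi_sketch({"GCT": 6, "GCC": 2}, alanine))  # CBI = (6 - 2) / (8 - 2), optimal 'GCT'
```

A value of 1 means only the preferred codon is used, and 0 means usage indistinguishable from random choice among synonyms.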

CodonU.analyzer.internal_comp.enc(references: list[Bio.Seq.Seq | str], genetic_code: int) → float

Calculates the effective number of codons (Enc) based on Wright (1989) and Fuglsang (2004)

Parameters:

references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table

Returns:

Calculated Enc value for the sequence(s)

Raises:

MissingCodonWarning – If there is no codon for a certain amino acid
NoProteinError – If there is no codon for a certain set of amino acids
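Wright's (1989) estimator computes a homozygosity F = (n × Σ p_i² − 1) / (n − 1) for each amino acid with n observed codons, averages F within each synonymous family size class k, and sums the classes; for the standard code this gives Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The sketch below implements only Wright's version (Fuglsang's 2004 refinements are not reproduced), uses his fallback F3 = (F2 + F4)/2 when isoleucine is missing, and caps the result at the number of sense codons. `enc_sketch` is a hypothetical name, and raising `ValueError` stands in for the documented errors.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stop codons.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
    if aa != "*"
}


def enc_sketch(references: list[str]) -> float:
    counts: Counter = Counter()
    for seq in references:
        s = seq.upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            if s[i:i + 3] in FORWARD_TABLE:
                counts[s[i:i + 3]] += 1

    families: dict[str, list[str]] = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    n_aa_by_k = Counter(len(codons) for codons in families.values())

    # Homozygosity F per amino acid, grouped by family size k.
    f_by_k: dict[int, list[float]] = {}
    for codons in families.values():
        k, n = len(codons), sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # F is undefined for single-codon or barely observed amino acids
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_by_k.setdefault(k, []).append((n * sum_p2 - 1) / (n - 1))
    avg_f = {k: sum(fs) / len(fs) for k, fs in f_by_k.items()}

    # Wright's fallback when the 3-fold class (Ile) has no estimate.
    if 3 in n_aa_by_k and 3 not in avg_f and 2 in avg_f and 4 in avg_f:
        avg_f[3] = (avg_f[2] + avg_f[4]) / 2

    nc = float(n_aa_by_k[1])  # single-codon amino acids contribute 1 each
    for k, n_aa in n_aa_by_k.items():
        if k == 1:
            continue
        if not avg_f.get(k):
            raise ValueError(f"cannot estimate F for {k}-fold amino acids")
        nc += n_aa / avg_f[k]
    # Values above the number of sense codons are capped.
    return min(nc, float(len(FORWARD_TABLE)))


# Maximal bias: every amino acid always uses the same codon, giving Nc = 20.
firsts: dict[str, str] = {}
for codon, aa in FORWARD_TABLE.items():
    firsts.setdefault(aa, codon)
print(enc_sketch(["".join(codon * 2 for codon in firsts.values())]))  # 20.0
```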

CodonU.analyzer.internal_comp.gravy(seq: Bio.Seq.Seq | str) → float

Computes the GRAVY score according to Kyte and Doolittle (1982)

Parameters:

seq – Protein sequence

Returns:

The GRAVY score
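GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value over all residues of the protein. A minimal sketch with the published Kyte and Doolittle (1982) scale; `gravy_sketch` is a hypothetical name, and as an assumption of this sketch, letters outside the 20 standard amino acids (e.g. `*`) raise `KeyError`.

```python
# Kyte-Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_sketch(seq: str) -> float:
    # Mean hydropathy over all residues; unknown letters raise KeyError.
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy_sketch("MKTAYIAK"))
```

Positive scores indicate an overall hydrophobic protein and negative scores a hydrophilic one.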

CodonU.analyzer.internal_comp.aromaticity(seq: Bio.Seq.Seq | str) → float

Calculate the aromaticity score according to Lobry (1994).

Parameters:

seq – Protein sequence

Returns:

The aromaticity score
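Lobry's aromaticity is the relative frequency of the aromatic residues phenylalanine (F), tryptophan (W) and tyrosine (Y) in the protein. A minimal sketch; `aromaticity_sketch` is a hypothetical name, and returning 0 for an empty sequence is an assumption of this sketch.

```python
def aromaticity_sketch(seq: str) -> float:
    # Relative frequency of Phe (F), Trp (W) and Tyr (Y) among all residues.
    seq = seq.upper()
    return sum(seq.count(aa) for aa in "FWY") / len(seq) if seq else 0.0


print(aromaticity_sketch("MFWYAK"))  # 0.5
```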
