CodonU.analyzer

Submodules

Package Contents

Functions

calculate_cai(→ dict[str, float | dict[str, float]])

Calculates cai values for each codon

calculate_rscu(→ dict[str, float | dict[str, float]])

Calculates rscu values for each codon

calculate_cbi(→ dict[str, tuple[float, ...)

Calculates cbi values for each amino acid based on Bennetzen and Hall (1982)

calculate_enc(→ float or dict[str, float])

Calculates the ENc value for the given sequences

calculate_gravy(→ dict[str, float] | float)

Calculates the gravy score for a given protein sequence according to Kyte and Doolittle (1982)

calculate_aromaticity(→ dict[str, float] | float)

Calculates the aromaticity score for a given protein sequence according to Lobry (1994)

get_anticodon_count_dict(→ dict[str, int])

Retrieves the anticodon table from given link

calculate_gtai(→ tuple[pandas.DataFrame, ...)

Calculates the gtAI value for each gene according to Anwar et al., 2023

generate_report

is_not_bad_seq(→ bool)

Checks if the sequence is bad i.e. length of the sequence is not divisible by 3

not_contains_amb_letter(→ bool)

Checks if provided sequence contains ambiguous DNA letters

g3(→ float)

Calculates percentage of G content for third position

a3(→ float)

Calculates percentage of A content for third position

gc_123(→ tuple[float, float | int, float | int, ...)

Calculate G+C content: total, for first, second and third positions

at_123(→ tuple[float, float | int, float | int, ...)

Calculate A+T content: total, for first, second and third positions

custom_codon_table(→ None)

Registers a new Codon Table as provided by the user.

filter_reference(→ list[Bio.SeqRecord.SeqRecord])

Filters the list of reference based on given threshold of length

reverse_table(→ dict[str, list[str]])

Creates the protein, codon dictionary where protein is key

syn_codons(→ dict[str, list[str]])

Creates the codon, synonymous codon family dictionary where codon is the key

sf_vals(→ dict[int, list[str]])

Creates the sf value and protein dictionary where sf value is key

rscu(→ dict[str, float])

Calculates relative synonymous codon usage (RSCU) value for a given nucleotide sequence according to Sharp and Li (1987)

weights_for_cai(→ dict[str, float])

Calculates relative adaptiveness/weight value for a given nucleotide sequence according to Sharp and Li (1987)

cai(→ float)

Calculates Codon Adaptation Index (CAI) value for a given nucleotide sequence according to Sharp and Li (1987)

cbi(→ tuple[float, str])

Calculates codon bias index (CBI) for a given protein seq based on Bennetzen and Hall (1982)

enc(→ float)

Calculates the effective number of codons (ENc) based on Wright (1989) and Fuglsang (2004)

gravy(→ float)

Computes the GRAVY score according to Kyte and Doolittle (1982)

aromaticity(→ float)

Calculate the aromaticity score according to Lobry (1994).

CodonU.analyzer.calculate_cai(handle: str, genetic_code_num: int, min_len_threshold: int = 200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'CAI_report', folder_path: str = 'Report') dict[str, float | dict[str, float]]

Calculates cai values for each codon

Parameters:
  • handle – Handle to the file, or the filename as a string

  • genetic_code_num – Genetic table number for codon table

  • min_len_threshold – Minimum length of nucleotide sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The dictionary containing codon and cai value pairs if gene_analysis is False, otherwise the dictionary containing gene name and corresponding codon and cai value pairs

CodonU.analyzer.calculate_rscu(handle: str, genetic_code_num: int, min_len_threshold: int = 200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'RSCU_report', folder_path: str = 'Report') dict[str, float | dict[str, float]]

Calculates rscu values for each codon

Parameters:
  • handle – Handle to the file, or the filename as a string

  • genetic_code_num – Genetic table number for codon table

  • min_len_threshold – Minimum length of nucleotide sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The dictionary containing codon and rscu value pairs if gene_analysis is false, otherwise the dictionary containing the gene name and the codon & rscu value pairs
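
A call using the signature documented above might look like the following sketch. The wrapper name `rscu_for_genome` and the FASTA path passed to it are hypothetical; the keyword arguments are the ones listed in the signature. The import is placed inside the function only so that the sketch can be loaded without CodonU installed.

```python
def rscu_for_genome(fasta_path: str, genetic_code_num: int = 11):
    """Hypothetical wrapper around CodonU.analyzer.calculate_rscu."""
    # Lazy import: the package is only needed when the function is called.
    from CodonU import analyzer

    return analyzer.calculate_rscu(
        handle=fasta_path,                  # path to a nucleotide FASTA file (assumed)
        genetic_code_num=genetic_code_num,  # genetic table number for the codon table
        min_len_threshold=200,              # documented default
        gene_analysis=False,                # False: one codon -> RSCU dict for the input
        save_file=False,                    # do not write an xlsx report
    )
```

With `gene_analysis=True` the documented return value is instead keyed by gene name, so downstream code would need one extra level of lookup.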

CodonU.analyzer.calculate_cbi(handle: str, genetic_code_num: int, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'CBI_report', folder_path: str = 'Report') dict[str, tuple[float, str] | dict[str, tuple[float, str]]]

Calculates cbi values for each amino acid based on Bennetzen and Hall (1982)

Parameters:
  • handle – Handle to the file, or the filename as a string

  • genetic_code_num – Genetic table number for codon table

  • min_len_threshold – Minimum length of nucleotide sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The dictionary mapping each amino acid to a (cbi value, optimal codon) pair if gene_analysis is False, otherwise the dictionary mapping each gene name to such a dictionary

CodonU.analyzer.calculate_enc(handle: str, genetic_code_num: int, min_len_threshold=200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'ENc_report', folder_path: str = 'Report') float or dict[str, float]

Calculates the ENc value for the given sequences

Parameters:
  • handle – Handle to the file, or the filename as a string

  • genetic_code_num – Genetic table number for codon table

  • min_len_threshold – Minimum length of nucleotide sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The ENc value if gene_analysis is false, else a dictionary containing gene number and corresponding ENc value

CodonU.analyzer.calculate_gravy(handle: str, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'GRAVY_report', folder_path: str = 'Report') dict[str, float] | float

Calculates the gravy score for a given protein sequence according to Kyte and Doolittle (1982)

Parameters:
  • handle – Handle to the file, or the filename as a string

  • min_len_threshold – Minimum length of protein sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The GRAVY score of given sequence if gene_analysis is false, else the dictionary containing gene number and corresponding GRAVY score

CodonU.analyzer.calculate_aromaticity(handle: str, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'Aroma_report', folder_path: str = 'Report') dict[str, float] | float

Calculates the aromaticity score for a given protein sequence according to Lobry (1994)

Parameters:
  • handle – Handle to the file, or the filename as a string

  • min_len_threshold – Minimum length of protein sequence to be considered as gene

  • gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

The aromaticity score of given sequence if gene_analysis is false, else the dictionary containing gene number and corresponding aromaticity score

CodonU.analyzer.get_anticodon_count_dict(url: str, database: str) dict[str, int]

Retrieves the anticodon table from given link

NOTE: The database can have only two values, i.e. “tRNADB_CE” and “GtRNAdb”

Parameters:
  • url – URL to anticodon table

  • database – Type of database from the above options

Returns:

The dictionary containing anticodon as key and count as value

Raises:

UnsupportedDatabase – If database has other values than mentioned

CodonU.analyzer.calculate_gtai(handle: str, anticodon_dict: dict, genetic_code_num: int, reference: str | None = None, size_pop: int = 60, generation_num: int = 100, save_file: bool = False, file_name: str = 'tAI_report', folder_path: str = 'Report') tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]

Calculates the gtAI value for each gene according to Anwar et al., 2023

The function returns following dataframes:
  • tai_df: The dataframe contains gene description and tAI values

  • abs_wi_df: The dataframe contains each anticodon and absolute weights according to the paper

  • rel_wi_df: The dataframe contains each anticodon and relative weights according to the paper

Note: The function will generate a file named ‘best_fit.py’

Parameters:
  • handle – Path to the fasta file as a string

  • anticodon_dict – The dictionary containing anticodon as key and count as value

  • genetic_code_num – Genetic table number for codon table

  • reference – Path to the reference fasta file as a string (Optional)

  • size_pop – A parameter for the genetic algorithm to identify the population size (Optional)

  • generation_num – A parameter for the genetic algorithm to identify the generation number (Optional)

  • save_file – Option for saving the values in xlsx format (Optional)

  • file_name – Intended file name (Optional)

  • folder_path – Folder path where the file should be saved (optional)

Returns:

A tuple of the three dataframes described above

Raises:
  • FileExistsError – If re-write permission is not given for the file best_fit.py

  • ImportError – If best_fit.py is not created or deleted after creation

CodonU.analyzer.generate_report(handle: str, _type: str, genetic_code_num: int, min_len_threshold: int, res_folder_path: str = 'Report')

Generate the report for given sequence [best for gene analysis]

For nucleotide sequence, this generates reports of:
  • RSCU

  • CAI

  • CBI

  • ENc

For protein sequence, this generates reports of:
  • GRAVY score

  • Aromaticity score

NOTE Possible types are
  • nuc: For nucleotide sequence

  • aa: For protein sequence

Parameters:
  • handle – Handle to the file, or the filename as a string

  • _type – Type of the sequence [nuc or aa]

  • genetic_code_num – Genetic table number for codon table

  • min_len_threshold – Minimum length of sequence to be considered as gene

  • res_folder_path – The path of folder where the file will be saved

exception CodonU.analyzer.NoSynonymousCodonWarning(aa)

Bases: CodonU.cua_warnings.codon_usage_warns.CodonUsageWarning

Occurs when only one codon in the given reference sequence list translates to a certain amino acid

warn()

exception CodonU.analyzer.MissingCodonWarning(aa: str)

Bases: CodonU.cua_warnings.codon_usage_warns.CodonUsageWarning

Occurs when no codon in the given reference sequence list translates to a certain amino acid

warn()

exception CodonU.analyzer.NoProteinError(seq)

Bases: CodonU.cua_errors.codon_usage_err.CodonUsageError

Occurs when a complete category of amino acid based on sf values is not translated by the provided sequence

exception CodonU.analyzer.CodonTableExistsError(code, val)

Bases: CodonU.cua_errors.codon_usage_err.CodonUsageError

Occurs when the id, name or alt_name of a new table is the same as that of an existing table

exception CodonU.analyzer.BadSequenceError(seq, code)

Bases: CodonU.cua_errors.codon_usage_err.CodonUsageError

Occurs when the sequence is bad i.e. length of the sequence is not divisible by 3

exception CodonU.analyzer.NucleotideError(code)

Bases: CodonU.cua_errors.codon_usage_err.CodonUsageError

Occurs when an ambiguous or invalid nucleotide is present in genome

CodonU.analyzer.is_not_bad_seq(seq: Bio.Seq.Seq | str, code: int, _type: str) bool

Checks if the sequence is bad i.e. length of the sequence is not divisible by 3

Parameters:
  • seq – The nucleotide sequence

  • code – The code to call BadSequenceError (1 or 2)

  • _type – Type of sequence, i.e. ‘nuc’

Returns:

True if seq is not bad

Raises:

BadSequenceError – If the seq is bad
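
The check described here amounts to a reading-frame test: a coding sequence must split evenly into codons. The following is a minimal sketch of that idea, not CodonU's implementation. The function name `check_codon_length` is hypothetical, and `ValueError` stands in for `BadSequenceError`.

```python
def check_codon_length(seq: str) -> bool:
    """Return True if the sequence length is a multiple of 3, else raise."""
    if len(seq) % 3 != 0:
        # CodonU documents BadSequenceError here; ValueError is a stand-in.
        raise ValueError(f"length {len(seq)} is not divisible by 3")
    return True


print(check_codon_length("ATGGCCTAA"))  # 9 bases, i.e. 3 complete codons
```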

CodonU.analyzer.not_contains_amb_letter(seq: Bio.Seq.Seq | str) bool

Checks if provided sequence contains ambiguous DNA letters

Parameters:

seq – Provided sequence

Returns:

True if the sequence does not contain any ambiguous letter

Raises:

NucleotideError – If the sequence contains an ambiguous letter
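
One way to picture this check is as a set difference between the letters present and the unambiguous DNA alphabet. The sketch below assumes that alphabet is exactly A, C, G and T; the exact set CodonU accepts is not stated here. The helper name `find_ambiguous_letters` is hypothetical.

```python
UNAMBIGUOUS_DNA = frozenset("ACGT")  # assumed alphabet


def find_ambiguous_letters(seq: str) -> set[str]:
    """Return the letters in seq that are not A, C, G or T."""
    return set(seq.upper()) - UNAMBIGUOUS_DNA


print(find_ambiguous_letters("ATGCCN"))  # the IUPAC code N is reported
```

An empty result corresponds to the documented True return value; a non-empty one corresponds to the case where NucleotideError is raised.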

CodonU.analyzer.g3(seq: Bio.Seq.Seq | str) float

Calculates percentage of G content for third position

Parameters:

seq – Provided sequence

Returns:

Percentage of G content

CodonU.analyzer.a3(seq: Bio.Seq.Seq | str) float

Calculates percentage of A content for third position

Parameters:

seq – Provided sequence

Returns:

Percentage of A content
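
Third-position content of the kind reported by g3 and a3 can be illustrated by slicing out every third base. This is a sketch under the assumption that the value is a percentage between 0 and 100 over third codon positions; the function name `third_position_percent` is hypothetical.

```python
def third_position_percent(seq: str, base: str) -> float:
    """Percentage of third codon positions occupied by `base`."""
    third = seq.upper()[2::3]  # bases at indices 2, 5, 8, ...
    if not third:
        return 0.0
    return 100.0 * third.count(base.upper()) / len(third)


seq = "ATGGCAGCGTAA"  # codons: ATG GCA GCG TAA
print(third_position_percent(seq, "G"))  # 50.0
print(third_position_percent(seq, "A"))  # 50.0
```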

CodonU.analyzer.gc_123(seq: Bio.Seq.Seq | str) tuple[float, float | int, float | int, float | int]

Calculate G+C content: total, for first, second and third positions

Parameters:

seq – Provided sequence

Returns:

The G+C percentage for the entire sequence, and the three codon positions
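
The four-value result can be pictured as one G+C percentage for the whole sequence plus one per codon position. The sketch below is an illustration of that layout, not the library's code; `gc_by_position` is a hypothetical name and percentages in the range 0 to 100 are assumed.

```python
def gc_by_position(seq: str) -> tuple[float, float, float, float]:
    """Return G+C percentage for the whole sequence and codon positions 1, 2, 3."""
    seq = seq.upper()

    def gc_percent(s: str) -> float:
        return 100.0 * (s.count("G") + s.count("C")) / len(s) if s else 0.0

    return (
        gc_percent(seq),
        gc_percent(seq[0::3]),  # first codon positions
        gc_percent(seq[1::3]),  # second codon positions
        gc_percent(seq[2::3]),  # third codon positions
    )


total, gc1, gc2, gc3 = gc_by_position("ATGGCCGCGTAA")
print(round(total, 2), gc1, gc2, gc3)  # 58.33 50.0 50.0 75.0
```

Swapping "G" and "C" for "A" and "T" in the helper gives the analogous A+T breakdown.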

CodonU.analyzer.at_123(seq: Bio.Seq.Seq | str) tuple[float, float | int, float | int, float | int]

Calculate A+T content: total, for first, second and third positions

Parameters:

seq – Provided sequence

Returns:

The A+T percentage for the entire sequence, and the three codon positions

CodonU.analyzer.custom_codon_table(name: str, alt_name: str | None, genetic_code_id: int, forward_table: dict[str, str], start_codons: list[str], stop_codons: list[str]) None

Registers a new Codon Table as provided by the user.

Note: The scope of the newly registered table is limited to the working file only

Parameters:
  • name – Name for the table

  • alt_name – Short name for the table

  • genetic_code_id – Genetic code number for the table

  • forward_table – A dict containing mapping of codons to proteins [excluding stop codons]

  • start_codons – A list of possible start codons

  • stop_codons – A list of possible stop codons

Raises:

CodonTableExistsError – If the name, alt_name or genetic_code_id already exists

CodonU.analyzer.filter_reference(records, min_len_threshold: int, _type: str) list[Bio.SeqRecord.SeqRecord]

Filters the list of reference based on given threshold of length

Parameters:
  • records – A generator object holding the sequence objects

  • min_len_threshold – Minimum length of nucleotide sequence to be considered as gene

  • _type – Type of sequence, i.e. ‘nuc’ or ‘aa’

Returns:

The list of usable sequences

CodonU.analyzer.reverse_table(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[str, list[str]]

Creates the protein, codon dictionary where protein is key

e.g. ‘L’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]

Parameters:

codon_table – The codon table

Returns:

The dict having protein as key
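
The inversion can be sketched on a plain codon-to-amino-acid mapping (Biopython codon table objects expose one as `forward_table`). The toy table below covers only a few codons, and `reverse_forward_table` is a hypothetical name used for illustration.

```python
from collections import defaultdict

# Toy subset of a codon -> amino acid mapping.
FORWARD_TABLE = {"TTT": "F", "TTC": "F", "ATG": "M", "GAA": "E", "GAG": "E"}


def reverse_forward_table(forward_table: dict[str, str]) -> dict[str, list[str]]:
    """Group codons by the amino acid they encode."""
    reverse = defaultdict(list)
    for codon, amino_acid in forward_table.items():
        reverse[amino_acid].append(codon)
    return dict(reverse)


print(reverse_forward_table(FORWARD_TABLE))
```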

CodonU.analyzer.syn_codons(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[str, list[str]]

Creates the codon, synonymous codon family dictionary where codon is the key

e.g. ‘TTA’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]

Parameters:

codon_table – The codon table

Returns:

The dict having individual codons as keys
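
A codon-keyed family lookup of this shape can be derived from a codon-to-amino-acid mapping in two passes. The following sketch uses a toy table and a hypothetical function name, `synonymous_families`; it only illustrates the data structure.

```python
def synonymous_families(forward_table: dict[str, str]) -> dict[str, list[str]]:
    """Map every codon to the full list of codons for the same amino acid."""
    by_amino_acid: dict[str, list[str]] = {}
    for codon, amino_acid in forward_table.items():
        by_amino_acid.setdefault(amino_acid, []).append(codon)
    # Each codon gets its own copy of the family list.
    return {codon: list(by_amino_acid[aa]) for codon, aa in forward_table.items()}


toy_table = {"TTT": "F", "TTC": "F", "ATG": "M"}
print(synonymous_families(toy_table))
```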

CodonU.analyzer.sf_vals(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[int, list[str]]

Creates the sf value and protein dictionary where sf value is key

e.g. 6: [‘L’, ‘S’, ‘R’]

Parameters:

codon_table – The codon table

Returns:

The dict having sf values as key
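
Assuming the sf value is the number of synonymous codons for an amino acid (consistent with the example 6: L, S, R above), the grouping can be sketched as follows. The toy table and the name `family_size_groups` are hypothetical.

```python
def family_size_groups(forward_table: dict[str, str]) -> dict[int, list[str]]:
    """Group amino acids by how many codons encode them."""
    by_amino_acid: dict[str, list[str]] = {}
    for codon, amino_acid in forward_table.items():
        by_amino_acid.setdefault(amino_acid, []).append(codon)

    groups: dict[int, list[str]] = {}
    for amino_acid, codons in by_amino_acid.items():
        groups.setdefault(len(codons), []).append(amino_acid)
    return groups


toy_table = {
    "TTT": "F", "TTC": "F",
    "ATG": "M",
    "ATT": "I", "ATC": "I", "ATA": "I",
    "GAA": "E", "GAG": "E",
}
print(family_size_groups(toy_table))  # {2: ['F', 'E'], 1: ['M'], 3: ['I']}
```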

CodonU.analyzer.rscu(references: list[Bio.Seq.Seq | str], genetic_code: int) dict[str, float]

Calculates relative synonymous codon usage (RSCU) value for a given nucleotide sequence according to Sharp and Li (1987)

Parameters:
  • references – List of reference nucleotide sequences

  • genetic_code – Genetic table number for codon table

Returns:

A dictionary containing codons and their respective RSCU values
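
RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a toy set of codon families; it is not CodonU's implementation, and how the library treats codons with zero counts is not covered. The names `rscu_from_sequences` and `FAMILIES` are hypothetical.

```python
from collections import Counter

# Toy synonymous families (amino acid -> codons).
FAMILIES = {"F": ["TTT", "TTC"], "E": ["GAA", "GAG"]}


def rscu_from_sequences(sequences: list[str], families: dict[str, list[str]]) -> dict[str, float]:
    """Observed codon count divided by the mean count within its family."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid never observed
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values


# Codons: TTT TTT TTC GAA GAG
print(rscu_from_sequences(["TTTTTTTTCGAAGAG"], FAMILIES))
```

In this example TTT is used twice and TTC once, so their RSCU values are 4/3 and 2/3, while GAA and GAG are used equally and both score 1.0.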

CodonU.analyzer.weights_for_cai(references: list[Bio.Seq.Seq | str], genetic_code: int) dict[str, float]

Calculates relative adaptiveness/weight value for a given nucleotide sequence according to Sharp and Li (1987)

Parameters:
  • references – List of reference nucleotide sequences

  • genetic_code – Genetic table number for codon table

Returns:

A dictionary containing codons and their respective weights
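
The relative adaptiveness of a codon is its usage divided by the usage of the most frequent synonymous codon for the same amino acid, so the preferred codon in each family gets a weight of 1. The following sketch works from precomputed codon counts; the function name and the toy counts are hypothetical, and zero-count handling in CodonU may differ.

```python
def relative_adaptiveness(counts: dict[str, int], families: dict[str, list[str]]) -> dict[str, float]:
    """Weight of each codon relative to the most used codon in its family."""
    weights = {}
    for codons in families.values():
        best = max(counts.get(c, 0) for c in codons)
        if best == 0:
            continue  # amino acid absent from the reference set
        for codon in codons:
            weights[codon] = counts.get(codon, 0) / best
    return weights


families = {"F": ["TTT", "TTC"], "E": ["GAA", "GAG"]}
counts = {"TTT": 2, "TTC": 1, "GAA": 4, "GAG": 1}
print(relative_adaptiveness(counts, families))
# {'TTT': 1.0, 'TTC': 0.5, 'GAA': 1.0, 'GAG': 0.25}
```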

CodonU.analyzer.cai(nuc_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) float

Calculates Codon Adaptation Index (CAI) value for a given nucleotide sequence according to Sharp and Li (1987)

Parameters:
  • nuc_seq – The Nucleotide Sequence

  • references – List of reference nucleotide sequences

  • genetic_code – Genetic table number for codon table

Returns:

The CAI value for given sequence
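
CAI is the geometric mean of the relative adaptiveness weights of the codons in the sequence. The sketch below assumes the weights are already available as a dictionary of positive values and skips codons that have no weight (for example ATG, which has no synonymous alternative). The name `cai_from_weights` and the toy weights are hypothetical.

```python
import math


def cai_from_weights(seq: str, weights: dict[str, float]) -> float:
    """Geometric mean of the weights of the codons in seq."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


toy_weights = {"TTT": 1.0, "TTC": 0.5, "GAA": 1.0, "GAG": 0.25}
# Codons: ATG (skipped), TTC (0.5), GAG (0.25) -> sqrt(0.5 * 0.25)
print(round(cai_from_weights("ATGTTCGAG", toy_weights), 4))  # 0.3536
```

Working in log space avoids underflow when many small weights are multiplied for long genes.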

CodonU.analyzer.cbi(prot_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) tuple[float, str]

Calculates codon bias index (CBI) for a given protein seq based on Bennetzen and Hall (1982)

Parameters:
  • prot_seq – The Protein Sequence

  • references – List of reference nucleotide sequences

  • genetic_code – Genetic table number for codon table

Returns:

A tuple of the CBI value and the optimal codon

Raises:

CodonU.analyzer.enc(references: list[Bio.Seq.Seq | str], genetic_code: int) float

Calculates the effective number of codons (ENc) based on Wright (1989) and Fuglsang (2004)

Parameters:
  • references – List of reference nucleotide sequences

  • genetic_code – Genetic table number for codon table

Returns:

Calculated ENc value for the sequence(s)
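
For the standard genetic code, Wright's estimator first computes a codon homozygosity F for each amino acid, averages F within each degeneracy class, and then combines the class averages. The sketch below shows only those two formulas; it does not include the modifications of Fuglsang (2004) and is not CodonU's implementation. Both function names are hypothetical.

```python
def homozygosity(codon_counts: list[int]) -> float:
    """Wright's F for one amino acid: (n * sum(p_i^2) - 1) / (n - 1)."""
    n = sum(codon_counts)
    if n <= 1:
        raise ValueError("need at least two codons for this amino acid")
    sum_sq = sum((c / n) ** 2 for c in codon_counts)
    return (n * sum_sq - 1) / (n - 1)


def enc_from_mean_f(f2: float, f3: float, f4: float, f6: float) -> float:
    """Combine mean F per degeneracy class (standard code: 2 single-codon,
    9 two-fold, 1 three-fold, 5 four-fold and 3 six-fold amino acids)."""
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


print(homozygosity([10, 0]))                # 1.0: only one codon is ever used
print(enc_from_mean_f(1.0, 1.0, 1.0, 1.0))  # 20.0: maximal bias
```

With perfectly even usage (F of 1/2, 1/3, 1/4 and 1/6) the same formula gives 61, the number of sense codons, which is the upper bound of the scale.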

Raises:

CodonU.analyzer.gravy(seq: Bio.Seq.Seq | str) float

Computes the GRAVY score according to Kyte and Doolittle (1982)

Parameters:

seq – Protein sequence

Returns:

The GRAVY score
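
The GRAVY score is the mean Kyte and Doolittle hydropathy value over all residues. The sketch below uses the published hydropathy scale; the function name `gravy_score` is hypothetical, and the handling of non-standard residues (here a KeyError) is an assumption that may differ from CodonU.

```python
# Kyte and Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_score(protein: str) -> float:
    """Mean hydropathy of the sequence; positive values are more hydrophobic."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in protein) / len(protein)


print(gravy_score("IR"))  # 0.0: isoleucine (4.5) and arginine (-4.5) cancel
```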

CodonU.analyzer.aromaticity(seq: Bio.Seq.Seq | str) float

Calculate the aromaticity score according to Lobry (1994).

Parameters:

seq – Protein sequence

Returns:

The aromaticity score
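
Aromaticity in the sense of Lobry (1994) is the relative frequency of the aromatic residues phenylalanine, tryptophan and tyrosine. The sketch below returns it as a fraction between 0 and 1, which is an assumption about the scale; the name `aromaticity_score` is hypothetical.

```python
AROMATIC = frozenset("FWY")  # Phe, Trp, Tyr


def aromaticity_score(protein: str) -> float:
    """Fraction of residues that are F, W or Y."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(residue in AROMATIC for residue in protein) / len(protein)


print(aromaticity_score("FWYA"))  # 0.75
```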