CodonU.analyzer
Submodules
Package Contents
Functions
calculate_cai – Calculates CAI values for each codon
calculate_rscu – Calculates RSCU values for each codon
calculate_cbi – Calculates CBI values for each amino acid based on Bennetzen and Hall (1982)
calculate_enc – Calculates the ENc value for the given sequences
calculate_gravy – Calculates the GRAVY score for a given protein sequence according to Kyte and Doolittle (1982)
calculate_aromaticity – Calculates the aromaticity score for a given protein sequence according to Lobry (1994)
get_anticodon_count_dict – Retrieves the anticodon table from given link
calculate_gtai – Calculates the gtAI value for each gene according to Anwar et al., 2023
is_not_bad_seq – Checks if the sequence is bad, i.e. length of the sequence is not divisible by 3
not_contains_amb_letter – Checks if provided sequence contains ambiguous DNA letters
g3 – Calculates percentage of G content for third position
a3 – Calculates percentage of A content for third position
gc_123 – Calculates G+C content: total, for first, second and third positions
at_123 – Calculates A+T content: total, for first, second and third positions
custom_codon_table – Registers a new Codon Table as provided by the user
filter_reference – Filters the list of references based on given threshold of length
reverse_table – Creates the protein, codon dictionary where protein is key
syn_codons – Creates the codon, synonymous codon family dictionary where codon is the key
sf_vals – Creates the sf value and protein dictionary where sf value is key
rscu – Calculates relative synonymous codon usage (RSCU) value for a given nucleotide sequence according to Sharp and Li (1987)
weights_for_cai – Calculates relative adaptiveness/weight value for a given nucleotide sequence according to Sharp and Li (1987)
cai – Calculates Codon Adaptive Index (CAI) value for a given nucleotide sequence according to Sharp and Li (1987)
cbi – Calculates codon bias index (CBI) for a given protein seq based on Bennetzen and Hall (1982)
enc – Calculates Effective number of codons (ENc) based on Wright (1989) and Fuglsang (2004)
gravy – Computes the GRAVY score according to Kyte and Doolittle (1982)
aromaticity – Calculates the aromaticity score according to Lobry (1994)
- CodonU.analyzer.calculate_cai(handle: str, genetic_code_num: int, min_len_threshold: int = 200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'CAI_report', folder_path: str = 'Report') dict[str, float | dict[str, float]]
Calculates CAI values for each codon
- Parameters:
handle – Handle to the file, or the filename as a string
genetic_code_num – Genetic table number for codon table
min_len_threshold – Minimum length of nucleotide sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
The dictionary containing codon and cai value pairs if gene_analysis is False, otherwise returns the
dictionary containing gene name and corresponding codon and cai value pairs
- CodonU.analyzer.calculate_rscu(handle: str, genetic_code_num: int, min_len_threshold: int = 200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'RSCU_report', folder_path: str = 'Report') dict[str, float | dict[str, float]]
Calculates RSCU values for each codon
- Parameters:
handle – Handle to the file, or the filename as a string
genetic_code_num – Genetic table number for codon table
min_len_threshold – Minimum length of nucleotide sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
The dictionary containing codon and RSCU value pairs if gene_analysis is False, otherwise the dictionary containing gene name and corresponding codon and RSCU value pairs
- CodonU.analyzer.calculate_cbi(handle: str, genetic_code_num: int, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'CBI_report', folder_path: str = 'Report') dict[str, tuple[float, str] | dict[str, tuple[float, str]]]
Calculates CBI values for each amino acid based on Bennetzen and Hall (1982)
- Parameters:
handle – Handle to the file, or the filename as a string
genetic_code_num – Genetic table number for codon table
min_len_threshold – Minimum length of nucleotide sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
A dictionary mapping each amino acid to a (CBI value, optimal codon) tuple if gene_analysis is False,
otherwise a dictionary mapping each gene name to such an amino acid dictionary
- CodonU.analyzer.calculate_enc(handle: str, genetic_code_num: int, min_len_threshold=200, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'ENc_report', folder_path: str = 'Report') float or dict[str, float]
Calculates the ENc value for the given sequences
- Parameters:
handle – Handle to the file, or the filename as a string
genetic_code_num – Genetic table number for codon table
min_len_threshold – Minimum length of nucleotide sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
The ENc value if gene_analysis is false, else a dictionary containing gene number and corresponding ENc value
- CodonU.analyzer.calculate_gravy(handle: str, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'GRAVY_report', folder_path: str = 'Report') dict[str, float] | float
Calculates the gravy score for a given protein sequence according to Kyte and Doolittle (1982)
- Parameters:
handle – Handle to the file, or the filename as a string
min_len_threshold – Minimum length of protein sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
The GRAVY score of given sequence if gene_analysis is false, else the dictionary containing gene number and corresponding GRAVY score
- CodonU.analyzer.calculate_aromaticity(handle: str, min_len_threshold: int = 66, gene_analysis: bool = False, save_file: bool = False, file_name: str = 'Aroma_report', folder_path: str = 'Report') dict[str, float] | float
Calculates the aromaticity score for a given protein sequence according to Lobry (1994)
- Parameters:
handle – Handle to the file, or the filename as a string
min_len_threshold – Minimum length of protein sequence to be considered as gene
gene_analysis – Option if gene analysis (True) or genome analysis (False) (optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (optional)
- Returns:
The aromaticity score of given sequence if gene_analysis is false, else the dictionary containing
gene number and corresponding aromaticity score
- CodonU.analyzer.get_anticodon_count_dict(url: str, database: str) dict[str, int]
Retrieves the anticodon table from given link
- NOTE: The database can have only two values, i.e. “tRNADB_CE” and “GtRNAdb”
For using tRNADB_CE, please visit http://trna.ie.niigata-u.ac.jp/cgi-bin/trnadb/index.cgi
For using GtRNAdb, please visit http://gtrnadb.ucsc.edu/
- Parameters:
url – URL to anticodon table
database – Type of database from the above options
- Returns:
The dictionary containing anticodon as key and count as val
- Raises:
UnsupportedDatabase – If database has other values than mentioned
- CodonU.analyzer.calculate_gtai(handle: str, anticodon_dict: dict, genetic_code_num: int, reference: str | None = None, size_pop: int = 60, generation_num: int = 100, save_file: bool = False, file_name: str = 'tAI_report', folder_path: str = 'Report') tuple[pandas.DataFrame, pandas.DataFrame, pandas.DataFrame]
Calculates the gtAI value for each gene according to Anwar et al., 2023
- The function returns following dataframes:
tai_df: The dataframe contains gene description and tAI values
abs_wi_df: The dataframe contains each anticodon and absolute weights according to the paper
rel_wi_df: The dataframe contains each anticodon and relative weights according to the paper
Note: The function will generate a file named ‘best_fit.py’
- Parameters:
handle – Path to the fasta file as a string
anticodon_dict – The dictionary containing anticodon as key and count as value
genetic_code_num – Genetic table number for codon table
reference – Path to the reference fasta file as a string (Optional)
size_pop – Population size for the genetic algorithm (Optional)
generation_num – Number of generations for the genetic algorithm (Optional)
save_file – Option for saving the values in xlsx format (Optional)
file_name – Intended file name (Optional)
folder_path – Folder path where the report file should be saved (Optional)
- Returns:
A tuple of 3 dataframes (tai_df, abs_wi_df, rel_wi_df), as described above
- Raises:
FileExistsError – If re-write permission is not given for the file best_fit.py
ImportError – If best_fit.py is not created or deleted after creation
- CodonU.analyzer.generate_report(handle: str, _type: str, genetic_code_num: int, min_len_threshold: int, res_folder_path: str = 'Report')
Generate the report for given sequence [best for gene analysis]
- For nucleotide sequence, this generates reports of:
RSCU
CAI
CBI
ENc
- For protein sequence, this generates reports of:
GRAVY score
Aromaticity score
- NOTE Possible types are
nuc: For nucleotide sequence
aa: For protein sequence
- Parameters:
handle – Handle to the file, or the filename as a string
_type – Type of the sequence [nuc or aa]
genetic_code_num – Genetic table number for codon table
min_len_threshold – Minimum length of sequence to be considered as gene
res_folder_path – The path of folder where the file will be saved
- Returns:
- exception CodonU.analyzer.NoSynonymousCodonWarning(aa)
Bases:
CodonU.cua_warnings.codon_usage_warns.CodonUsageWarning
Occurs when only one codon in the given reference sequence list translates to a certain amino acid
- warn()
- exception CodonU.analyzer.MissingCodonWarning(aa: str)
Bases:
CodonU.cua_warnings.codon_usage_warns.CodonUsageWarning
Occurs when no codon in the given reference sequence list translates to a certain amino acid
- warn()
- exception CodonU.analyzer.NoProteinError(seq)
Bases:
CodonU.cua_errors.codon_usage_err.CodonUsageError
Occurs when a complete category of amino acid based on sf values is not translated by the provided sequence
- exception CodonU.analyzer.CodonTableExistsError(code, val)
Bases:
CodonU.cua_errors.codon_usage_err.CodonUsageError
Occurs when the id, name or alt_name of a new table is the same as that of an existing table
- exception CodonU.analyzer.BadSequenceError(seq, code)
Bases:
CodonU.cua_errors.codon_usage_err.CodonUsageError
Occurs when the sequence is bad, i.e. length of the sequence is not divisible by 3
- exception CodonU.analyzer.NucleotideError(code)
Bases:
CodonU.cua_errors.codon_usage_err.CodonUsageError
Occurs when an ambiguous or invalid nucleotide is present in genome
- CodonU.analyzer.is_not_bad_seq(seq: Bio.Seq.Seq | str, code: int, _type: str) bool
Checks if the sequence is bad i.e. length of the sequence is not divisible by 3
- Parameters:
seq – The nucleotide sequence
code – The code to call BadSequenceError (1 or 2)
_type – Type of sequence, i.e. ‘nuc’
- Returns:
True if seq is not bad
- Raises:
BadSequenceError – If the seq is bad
- CodonU.analyzer.not_contains_amb_letter(seq: Bio.Seq.Seq | str) bool
Checks if provided sequence contains ambiguous DNA letters
- Parameters:
seq – Provided sequence
- Returns:
True if the sequence does not contain any ambiguous letter
- Raises:
NucleotideError – If the sequence contains an ambiguous letter
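The two checks above (whole codons, unambiguous letters) can be illustrated with a minimal pure-Python sketch. This is not CodonU's implementation: the function name check_sequence is hypothetical, ValueError stands in for BadSequenceError and NucleotideError, and the accepted alphabet is assumed to be A, C, G and T.

```python
# Hypothetical stand-in for the two validity checks; not CodonU code.
VALID_BASES = set("ACGT")  # assumption: only these four letters are unambiguous


def check_sequence(seq):
    """Return True if seq passes both checks, otherwise raise ValueError."""
    seq = str(seq).upper()
    # Check 1: a coding sequence must consist of whole codons.
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not divisible by 3")
    # Check 2: every letter must be an unambiguous DNA base.
    unexpected = set(seq) - VALID_BASES
    if unexpected:
        raise ValueError(f"ambiguous or invalid letters: {sorted(unexpected)}")
    return True
```

For example, check_sequence("ATGGCC") returns True, while "ATGGC" (five letters) and "ATGNCC" (contains N) both raise.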
- CodonU.analyzer.g3(seq: Bio.Seq.Seq | str) float
Calculates percentage of G content for third position
- Parameters:
seq – Provided sequence
- Returns:
Percentage of G content
- CodonU.analyzer.a3(seq: Bio.Seq.Seq | str) float
Calculates percentage of A content for third position
- Parameters:
seq – Provided sequence
- Returns:
Percentage of A content
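What a third-position base percentage means can be sketched in a few lines of plain Python. The helper name third_position_percent is hypothetical, and the 0 to 100 scale and the handling of an empty input are assumptions; the library's g3 and a3 may differ in such details.

```python
def third_position_percent(seq, base):
    """Percentage of codons whose third position is `base` (illustrative only)."""
    seq = str(seq).upper()
    # Indices 2, 5, 8, ... are the third positions of complete codons.
    thirds = seq[2::3]
    if not thirds:
        return 0.0  # assumption: report 0 for sequences shorter than one codon
    return 100.0 * thirds.count(base.upper()) / len(thirds)
```

With the codons ATG, GCA and GGG the third positions are G, A and G, so the G share is two thirds and the A share is one third.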
- CodonU.analyzer.gc_123(seq: Bio.Seq.Seq | str) tuple[float, float | int, float | int, float | int]
Calculate G+C content: total, for first, second and third positions
- Parameters:
seq – Provided sequence
- Returns:
The G+C percentage for the entire sequence, and the three codon positions
- CodonU.analyzer.at_123(seq: Bio.Seq.Seq | str) tuple[float, float | int, float | int, float | int]
Calculate A+T content: total, for first, second and third positions
- Parameters:
seq – Provided sequence
- Returns:
The A+T percentage for the entire sequence, and the three codon positions
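The four-value result of gc_123 and at_123 (total, then codon positions 1 to 3) can be illustrated as follows. This is a sketch under assumptions, not the library's code: base_content_123 is a hypothetical name, a trailing partial codon is trimmed before counting, and values are on a 0 to 100 scale.

```python
def base_content_123(seq, bases="GC"):
    """Return (total, pos1, pos2, pos3) percentages for the given bases."""
    seq = str(seq).upper()
    # Assumption: ignore a trailing partial codon so all positions align.
    usable = seq[: len(seq) - len(seq) % 3]

    def percent(fragment):
        if not fragment:
            return 0.0
        hits = sum(fragment.count(b) for b in bases)
        return 100.0 * hits / len(fragment)

    return (
        percent(usable),        # whole sequence
        percent(usable[0::3]),  # first codon positions
        percent(usable[1::3]),  # second codon positions
        percent(usable[2::3]),  # third codon positions
    )
```

Calling base_content_123(seq) gives the G+C view and base_content_123(seq, bases="AT") the A+T view of the same sequence.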
- CodonU.analyzer.custom_codon_table(name: str, alt_name: str | None, genetic_code_id: int, forward_table: dict[str, str], start_codons: list[str], stop_codons: list[str]) None
Registers a new Codon Table as provided by the user.
Note: The scope of the newly registered table is limited to the working file only
- Parameters:
name – Name for the table
alt_name – Short name for the table
genetic_code_id – Genetic code number for the table
forward_table – A dict containing mapping of codons to proteins [excluding stop codons]
start_codons – A list of possible start codons
stop_codons – A list of possible stop codons
- Raises:
CodonTableExistsError – If the name, alt_name or genetic_code_id already exists
- CodonU.analyzer.filter_reference(records, min_len_threshold: int, _type: str) list[Bio.SeqRecord.SeqRecord]
Filters the list of reference based on given threshold of length
- Parameters:
records – A generator object holding the sequence objects
min_len_threshold – Minimum length of nucleotide sequence to be considered as gene
_type – Type of sequence, i.e. ‘nuc’ or ‘aa’
- Returns:
The list of usable sequences
- CodonU.analyzer.reverse_table(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[str, list[str]]
Creates the protein, codon dictionary where protein is key
e.g. ‘L’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]
- Parameters:
codon_table – The codon table
- Returns:
The dict having protein as key
- CodonU.analyzer.syn_codons(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[str, list[str]]
Creates the codon, synonymous codon family dictionary where codon is the key
e.g. ‘TTA’: [‘TTA’, ‘TTG’, ‘CTT’, ‘CTC’, ‘CTA’, ‘CTG’]
- Parameters:
codon_table – The codon table
- Returns:
The dict having individual codons as keys
- CodonU.analyzer.sf_vals(codon_table: Bio.Data.CodonTable.NCBICodonTableDNA) dict[int, list[str]]
Creates the sf value and protein dictionary where sf value is key
e.g. 6: [‘L’, ‘S’, ‘R’]
- Parameters:
codon_table – The codon table
- Returns:
The dict having sf values as key
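The three dictionaries described above (reverse_table, syn_codons, sf_vals) can be reproduced for the standard genetic code with plain dictionaries. The sketch below is illustrative only: the library functions take a Biopython codon table object, whereas these hypothetical helpers take a codon-to-amino-acid dict built from the NCBI table 1 string.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
    if aa != "*"  # stop codons are excluded, as in a forward table
}


def build_reverse_table(forward):
    """Amino acid -> list of codons that encode it."""
    rev = {}
    for codon, aa in forward.items():
        rev.setdefault(aa, []).append(codon)
    return rev


def build_syn_codons(forward):
    """Codon -> full synonymous family that the codon belongs to."""
    rev = build_reverse_table(forward)
    return {codon: rev[aa] for codon, aa in forward.items()}


def build_sf_vals(forward):
    """Family size (sf value) -> list of amino acids with that many codons."""
    sf = {}
    for aa, codons in build_reverse_table(forward).items():
        sf.setdefault(len(codons), []).append(aa)
    return sf
```

For the standard code this groups the amino acids into families of size 1 (M, W), 2 (nine amino acids), 3 (I), 4 (five amino acids) and 6 (L, S, R).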
- CodonU.analyzer.rscu(references: list[Bio.Seq.Seq | str], genetic_code: int) dict[str, float]
Calculates relative synonymous codon usage (RSCU) value for a given nucleotide sequence according to Sharp and Li (1987)
- Parameters:
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table
- Returns:
A dictionary containing codons and their respective RSCU values
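As a rough illustration of the RSCU definition of Sharp and Li (observed codon count divided by the count expected if all synonymous codons were used equally), here is a minimal pure-Python sketch for the standard genetic code. It is not the library's code: rscu_sketch and count_codons are hypothetical names, no pseudocounts are applied, and amino acids absent from the references are simply left out of the result.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), stop codons removed.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
    if aa != "*"
}


def count_codons(sequences):
    """Count in-frame sense codons over all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = str(seq).upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in FORWARD_TABLE:
                counts[codon] += 1
    return counts


def rscu_sketch(sequences):
    """RSCU = observed count / mean count within the synonymous family."""
    counts = count_codons(sequences)
    families = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # assumption: skip amino acids never observed
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result
```

For a reference made of CTG, CTG, CTG and CTT, leucine is seen four times across six possible codons, so the expected count per codon is 4/6 and CTG scores 3 / (4/6) = 4.5.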
- CodonU.analyzer.weights_for_cai(references: list[Bio.Seq.Seq | str], genetic_code: int) dict[str, float]
Calculates relative adaptiveness/weight value for a given nucleotide sequence according to Sharp and Li (1987)
- Parameters:
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table
- Returns:
A dictionary containing codons and their respective weights
- CodonU.analyzer.cai(nuc_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) float
Calculates Codon Adaptive Index (CAI) value for a given nucleotide sequence according to Sharp and Li (1987)
- Parameters:
nuc_seq – The Nucleotide Sequence
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table
- Returns:
The CAI value for given sequence
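The relationship between the relative adaptiveness weights and CAI can be sketched in plain Python: each codon's weight is its reference count divided by the count of the most used codon in its family, and CAI is the geometric mean of the weights along the gene. The names below are hypothetical, and two details are assumptions of this sketch rather than documented library behaviour: single-codon families (Met, Trp) are skipped, and codons with zero weight are ignored instead of being given a pseudocount.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), stop codons removed.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
    if aa != "*"
}


def relative_weights(references):
    """w = codon count / count of the most frequent synonymous codon."""
    counts = Counter()
    for seq in references:
        seq = str(seq).upper()
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3] in FORWARD_TABLE:
                counts[seq[i:i + 3]] += 1
    families = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for codons in families.values():
        best = max(counts[c] for c in codons)
        if len(codons) == 1 or best == 0:
            continue  # assumption: skip Met/Trp and unobserved amino acids
        for c in codons:
            weights[c] = counts[c] / best
    return weights


def cai_sketch(gene, references):
    """Geometric mean of the weights of the gene's scorable codons."""
    weights = relative_weights(references)
    gene = str(gene).upper()
    logs = []
    for i in range(0, len(gene) - 2, 3):
        w = weights.get(gene[i:i + 3], 0.0)
        if w > 0:  # assumption: zero-weight codons are ignored
            logs.append(math.log(w))
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))
```

With a reference of three CTG and one CTT, the weights are 1 for CTG and 1/3 for CTT, so a gene made of one CTG and one CTT scores the square root of 1/3.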
- CodonU.analyzer.cbi(prot_seq: Bio.Seq.Seq | str, references: list[Bio.Seq.Seq | str], genetic_code: int) tuple[float, str]
Calculates codon bias index (CBI) for a given protein seq based on Bennetzen and Hall (1982)
- Parameters:
prot_seq – The Protein Sequence
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table
- Returns:
A tuple of CBI val and the optimal codon
- Raises:
NoSynonymousCodonWarning – When there are no synonymous codons
MissingCodonWarning – When no codon translates to the provided amino acid
- CodonU.analyzer.enc(references: list[Bio.Seq.Seq | str], genetic_code: int) float
Calculates Effective number of codons (ENc) based on Wright (1989) and Fuglsang (2004)
- Parameters:
references – List of reference nucleotide sequences
genetic_code – Genetic table number for codon table
- Returns:
Calculated ENc value for the sequence(s)
- Raises:
MissingCodonWarning – If there is no codon for a certain amino acid
NoProteinError – If there is no codon for a certain set of amino acids
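For orientation, Wright's original estimator can be sketched as follows for the standard genetic code: a homozygosity F is computed for each amino acid, averaged within each synonymous family size, and combined as 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. This sketch is an assumption-laden illustration, not the library's method: enc_wright is a hypothetical name, the Fuglsang (2004) refinements are not included, and an empty family class raises ValueError where the library documents NoProteinError.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), stop codons removed.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
    if aa != "*"
}


def enc_wright(sequences):
    """Effective number of codons after Wright's original formula (sketch)."""
    counts = Counter()
    for seq in sequences:
        seq = str(seq).upper()
        for i in range(0, len(seq) - 2, 3):
            if seq[i:i + 3] in FORWARD_TABLE:
                counts[seq[i:i + 3]] += 1
    families = {}
    for codon, aa in FORWARD_TABLE.items():
        families.setdefault(aa, []).append(codon)
    f_by_size = {2: [], 3: [], 4: [], 6: []}
    for codons in families.values():
        n = sum(counts[c] for c in codons)
        if len(codons) == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        # Homozygosity: F = (n * sum(p_i^2) - 1) / (n - 1)
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        f_by_size[len(codons)].append(f)
    total = 2.0  # contribution of the two single-codon amino acids
    for size, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        values = f_by_size[size]
        if not values or sum(values) == 0:
            raise ValueError(f"cannot estimate F for family size {size}")
        total += n_amino_acids / (sum(values) / len(values))
    return min(total, 61.0)  # ENc is capped at the number of sense codons
```

When every amino acid is encoded by a single codon throughout, each F equals 1 and the sketch returns the minimum value of 20; more even codon usage pushes the value towards 61.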
- CodonU.analyzer.gravy(seq: Bio.Seq.Seq | str) float
Computes the GRAVY score according to Kyte and Doolittle (1982)
- Parameters:
seq – Protein sequence
- Returns:
The GRAVY score
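The GRAVY score is the mean Kyte and Doolittle hydropathy value over all residues of the protein. The sketch below shows that calculation in plain Python; gravy_sketch is a hypothetical name, and how the library treats non-standard residues is not covered here (this version raises KeyError for them).

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy_sketch(protein):
    """Mean hydropathy of the sequence; positive values are more hydrophobic."""
    protein = str(protein).upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)
```

For the dipeptide AR this gives (1.8 + (-4.5)) / 2 = -1.35.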
- CodonU.analyzer.aromaticity(seq: Bio.Seq.Seq | str) float
Calculate the aromaticity score according to Lobry (1994).
- Parameters:
seq – Protein sequence
- Returns:
The aromaticity score
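Aromaticity in the sense of Lobry (1994) is the relative frequency of the aromatic residues Phe (F), Trp (W) and Tyr (Y) in the protein. A minimal sketch, with the hypothetical name aromaticity_sketch and an assumed return scale of 0 to 1:

```python
AROMATIC_RESIDUES = "FWY"  # phenylalanine, tryptophan, tyrosine


def aromaticity_sketch(protein):
    """Fraction of residues that are F, W or Y."""
    protein = str(protein).upper()
    if not protein:
        raise ValueError("empty protein sequence")
    aromatic = sum(protein.count(aa) for aa in AROMATIC_RESIDUES)
    return aromatic / len(protein)
```

For the sequence FWYA, three of four residues are aromatic, giving 0.75.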