Bite 255. Codon Usage

The genetic code of all organisms uses three-base units (codons) written in a four-letter alphabet (A, G, C and T in DNA, or U in RNA) to represent the 20 standard amino acids used in proteins. This yields 4^3 = 4 × 4 × 4 = 64 possible codons. Of these, one serves as the initiator and is called the "start codon", and three signal the end of a protein and are called "stop codons" (written as *). The remaining 60 codons, together with the start codon (which also encodes methionine), encode the 20 proteinogenic amino acids. Some amino acids are encoded by up to six different codons, whereas others are encoded by a single codon. This redundancy is known as the degeneracy of the genetic code and is often visualized with a codon wheel. Every organism has its own set of preferred codons, which helps to optimize and balance protein production.
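As a quick check of the 4^3 = 64 arithmetic and of the degeneracy described above, the sketch below rebuilds the standard genetic code (NCBI translation table 1) from its compact 64-letter form and counts how many codons map to each amino acid. The compact string and the T, C, A, G base order follow NCBI's published standard table; the variable names are illustrative and not part of the Bite's supplied code.

```python
from collections import Counter
from itertools import product

# NCBI translation table 1 (standard code), one letter per codon.
# Codons are listed with bases in the order T, C, A, G; "*" marks a stop codon.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4**3 = 64 three-base codons; product() varies the last base fastest,
# which matches the ordering of STANDARD_AAS.
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]

# Map each DNA codon to its one-letter amino acid (or "*" for stop).
CODON_TABLE = dict(zip(CODONS, STANDARD_AAS))

# Degeneracy: number of codons that encode each amino acid (and stop).
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    print(len(CODON_TABLE))                  # 64
    print(degeneracy["*"])                   # 3
    print(degeneracy["L"], degeneracy["M"])  # 6 1
    print(len(degeneracy) - 1)               # 20 (amino acids, excluding "*")
```

Leucine, serine and arginine each come out at six codons, while methionine (ATG) and tryptophan (TGG) have only one.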

In this Bite you are provided with a list of all coding sequences of the bacterium Staphylococcus aureus.

Calculate the average codon usage table for all sequences using the supplied translation table. Note that the coding sequences are supplied as RNA sequences, whereas the codon usage table is given in DNA codons. To convert a DNA sequence to an RNA sequence, replace every T with a U. Disregard sequences that are not valid coding sequences.
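One possible approach is sketched below, under several assumptions. It assumes the sequences arrive as plain strings (the Bite's actual input format is not shown here), converts the DNA codons of the table to RNA by replacing T with U as described above, and treats a sequence as invalid when its length is not a multiple of three or it contains characters other than A, C, G and U. It pools codon counts over all valid sequences and reports each codon's frequency per 1,000 codons. The Bite may expect a different notion of "average" (for example, a mean of per-sequence frequencies) or a stricter validity check (for example, requiring a start and a stop codon). The functions `dna_to_rna`, `is_valid_cds` and `codon_usage` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable

RNA_BASES = set("ACGU")


def dna_to_rna(seq: str) -> str:
    """Convert DNA to RNA by replacing every T with U."""
    return seq.upper().replace("T", "U")


def is_valid_cds(rna: str) -> bool:
    """Assumed validity check: non-empty, whole codons only, RNA bases only."""
    return len(rna) > 0 and len(rna) % 3 == 0 and set(rna) <= RNA_BASES


def codon_usage(rna_seqs: Iterable[str], dna_table: Dict[str, str]) -> Dict[str, float]:
    """Pooled codon frequency per 1,000 codons over all valid sequences.

    dna_table maps DNA codons (e.g. "ATG") to amino acids. The result is keyed
    by the same DNA codons so it lines up with the supplied table.
    """
    # Translate the table's DNA codons into RNA so they match the sequences.
    rna_to_dna = {dna_to_rna(codon): codon for codon in dna_table}

    counts: Counter = Counter()
    for seq in rna_seqs:
        rna = seq.strip().upper()
        if not is_valid_cds(rna):
            continue  # disregard sequences that are not valid coding sequences
        # Split into consecutive, non-overlapping codons and count them.
        counts.update(rna[i:i + 3] for i in range(0, len(rna), 3))

    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in dna_table}
    return {dna: 1000 * counts[rna] / total for rna, dna in rna_to_dna.items()}


if __name__ == "__main__":
    table = {"ATG": "M", "GCT": "A", "TAA": "*"}   # toy subset of a real table
    seqs = ["AUGGCUUAA", "AUGGCUGCUUAA", "AUGGC"]  # last one is not whole codons
    usage = codon_usage(seqs, table)
    print({codon: round(freq, 1) for codon, freq in usage.items()})
    # {'ATG': 285.7, 'GCT': 428.6, 'TAA': 285.7}
```

In the toy run the incomplete sequence is skipped, leaving 7 codons (2 × AUG, 3 × GCU, 2 × UAA). Converting the table rather than the sequences keeps the input untouched; replacing U with T in each sequence instead would work equally well.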


There you go, our first Bioinformatics Bite. Keep calm and code in Python!

We use Python 3.8