Bite 292. Scoring matrices

Proteins fulfill important functions in all organisms and consist of amino acids linked together in a specific order.

Although single changes of these amino acids can have devastating effects on the protein function, not all changes carry the same severity. The impact is largely influenced by the chemical and physical properties of the substituted amino acid.

But how can differences between proteins be quantified? A common and efficient way to calculate similarity is to use scoring matrices.

In this Bite you will use BLOSUM and PAM matrices to calculate protein similarity scores.

The score is calculated by summing the scoring-matrix values for each aligned pair of amino acids. Each amino acid is represented by a different letter (e.g. A stands for alanine, R for aRginine, ...).

Consider the following scoring matrix:

BLOSUM62 (excerpt): 
  |  A   R   N   D   C   Q  [...]
--+--------------------------
A |  4  -1  -2  -2   0  -1  [...]
R | -1   5   0  -2  -3   1  [...]
[...]

To calculate the BLOSUM62 similarity score between two sequences Seq1 and Seq2, the sequences are aligned and the individual scores for each pair of amino acids are added up as follows:

Seq1     A  R  R  N  C  Q  A
Seq2     A  A  R  R  A  A  A
------ ----------------------
Score    4 -1  5  0  0 -1  4  --> SUM == 11

matrix_score() therefore returns 11 in this example.
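The calculation above can be sketched in a few lines of Python. This is only a minimal illustration, not the Bite's reference solution: the signature of matrix_score(), the dict-of-pairs layout and the name BLOSUM62_EXCERPT are assumptions, and the matrix holds only the excerpt shown above rather than the full BLOSUM62 table.

```python
from typing import Dict, Tuple

# Assumed layout: (row letter, column letter) -> score.
# Only the two rows from the excerpt above are included.
BLOSUM62_EXCERPT: Dict[Tuple[str, str], int] = {
    ("A", "A"): 4, ("A", "R"): -1, ("A", "N"): -2,
    ("A", "D"): -2, ("A", "C"): 0, ("A", "Q"): -1,
    ("R", "A"): -1, ("R", "R"): 5, ("R", "N"): 0,
    ("R", "D"): -2, ("R", "C"): -3, ("R", "Q"): 1,
}


def matrix_score(seq1: str, seq2: str,
                 matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum the matrix values of all aligned amino acid pairs."""
    if len(seq1) != len(seq2):
        raise ValueError("Sequences must be aligned (same length)")
    total = 0
    for a, b in zip(seq1, seq2):
        # Scoring matrices are symmetric, so try both orientations.
        # A pair missing in both orientations raises KeyError.
        if (a, b) in matrix:
            total += matrix[(a, b)]
        else:
            total += matrix[(b, a)]
    return total


print(matrix_score("ARRNCQA", "AARRAAA", BLOSUM62_EXCERPT))  # 11
```

The symmetric lookup is what lets the two-row excerpt score pairs such as N/R or C/A, which only appear in the table as R/N and A/C.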

Objectives:

- Implement a general matrix score calculator for amino acid sequences using the provided matrices

- Write a function that returns the most closely related (highest-scoring) amino acid sequence(s)

Note: For this Bite you can assume that all sequences are already properly aligned.
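One possible shape for the second objective is sketched below. The names closest_match(), pair_sum() and EXCERPT, the function signatures, and the choice to return every tied sequence are assumptions made for illustration. The matrix again contains only a small BLOSUM62 excerpt.

```python
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], int]

# Hypothetical excerpt limited to the letters A and R.
EXCERPT: Matrix = {
    ("A", "A"): 4, ("A", "R"): -1,
    ("R", "A"): -1, ("R", "R"): 5,
}


def pair_sum(seq1: str, seq2: str, matrix: Matrix) -> int:
    """Score two aligned sequences by summing pairwise matrix values."""
    return sum(
        matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
        for a, b in zip(seq1, seq2)
    )


def closest_match(query: str, candidates: List[str],
                  matrix: Matrix) -> List[str]:
    """Return all candidates that share the highest score against query."""
    if not candidates:
        return []
    scores = [pair_sum(query, cand, matrix) for cand in candidates]
    best = max(scores)
    # Keep every candidate that reaches the best score, in input order.
    return [cand for cand, s in zip(candidates, scores) if s == best]


print(closest_match("AARR", ["AARR", "ARAR", "RRAA"], EXCERPT))  # ['AARR']
print(closest_match("AA", ["AR", "RA"], EXCERPT))  # ['AR', 'RA']
```

Returning a list keeps the tie case explicit: when several sequences reach the same top score, all of them come back rather than an arbitrary one.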

 



We use Python 3.8