Bite 298. Fasta to 2-Line Fasta

A very simple format to store biological sequence data is the (multi-)FASTA format.

The first line of each record starts with a > character, followed by a name. The lines after it contain the sequence information. A record ends when the next > character or the end of the file is encountered.
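The record rules above can be sketched as a small parser. This is only an illustration, not the Bite's reference solution: the function name `iter_fasta_records` and the sample data are made up for this example, and the sketch assumes blank lines can be ignored.

```python
from typing import Iterable, Iterator, Tuple


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a (multi-)FASTA file."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence line belonging to the current record.
            chunks.append(line)
    # The end of the input closes the last record.
    if name is not None:
        yield name, "".join(chunks)


# Hypothetical sample data, shortened for readability.
example = """>Sequence 1:
ATGTCG
GAAAAA
>Sequence 2:
ATGATG
""".splitlines()

for name, sequence in iter_fasta_records(example):
    print(name, len(sequence))
```

Yielding records one at a time keeps memory use low for large files, since only the current record is held at any moment.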

FASTA files downloaded from public databases such as the National Center for Biotechnology Information (NCBI) often contain line breaks after 60-80 characters, which ensures that sequences are not truncated in text editors.

However, in many cases (think of *nix command line tools such as grep and wc), it is better if each sequence occupies exactly one line.

Your job is to convert a multiline FASTA file to a 2-Line FASTA file.

Multiline FASTA format:

>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGA
AATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTAT
TACACAATTAAATGACACATTAAAAGCTATTTCAC
[...]

2-Line FASTA format:

>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCAC[...]
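One possible way to do the conversion without third-party modules is to stream the input line by line and only emit a newline when a record ends. The sketch below is an assumption-laden example, not the official solution: the function name `fasta_to_2line_fasta`, its parameters, and the temporary file names are hypothetical.

```python
import tempfile
from pathlib import Path


def fasta_to_2line_fasta(text_file, fasta_file):
    """Write a 2-line FASTA version of text_file to fasta_file.

    Returns the number of records written.
    """
    records = 0
    with open(text_file) as src, open(fasta_file, "w") as dst:
        for raw in src:
            line = raw.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            if line.startswith(">"):
                if records:
                    dst.write("\n")  # terminate the previous sequence line
                dst.write(line + "\n")
                records += 1
            elif records:
                dst.write(line)  # append sequence data without a line break
        if records:
            dst.write("\n")  # terminate the last sequence line
    return records


# Small round trip using made-up sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "multi.fasta"
    dst_path = Path(tmp) / "two_line.fasta"
    src_path.write_text(">Sequence 1:\nATGTCG\nGAAAAA\n>Sequence 2:\nATGATG\n")
    count = fasta_to_2line_fasta(src_path, dst_path)
    result = dst_path.read_text()

print(result)
```

Because the sequence chunks are written as they are read, the whole file never has to fit in memory.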

This Bite has Biopython enabled (check out the convert function in the Bio.SeqIO module), but it can also be solved without this module.
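For the Biopython route, a minimal sketch could look like the following. It assumes a Biopython version that supports the "fasta-2line" output format (added in Biopython 1.71); the wrapper name `convert_with_biopython` and its parameter names are hypothetical.

```python
def convert_with_biopython(multiline_path, two_line_path):
    """Convert a multiline FASTA file to 2-line FASTA with Biopython.

    Returns the number of records converted.
    """
    # Imported here so the module still loads where Biopython is absent.
    from Bio import SeqIO  # third-party package: biopython

    # "fasta" reads wrapped records; "fasta-2line" writes each sequence
    # on a single line (assumes Biopython 1.71 or newer).
    return SeqIO.convert(multiline_path, "fasta", two_line_path, "fasta-2line")
```

A call such as `convert_with_biopython("input.fasta", "output.fasta")` would then write the converted file; both file names here are placeholders.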

Intermediate level

47 out of 49 users completed this Bite.
Resolution time: ~49 min. (avg. submissions of 5-240 min.)
Pythonistas rate this Bite 4.5 on a 1-10 difficulty scale.