PyBites Bite 298. Fasta to 2-Line Fasta

A very simple format to store biological sequence data is the (multi-)FASTA format.

The first line of each record starts with a > character followed by a name. The following lines contain the sequence. A record ends when the next > character or the end of the file is encountered.
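The record structure just described can be read with a small generator. This is a minimal sketch, not the Bite's reference solution. The name `parse_fasta` is hypothetical, and the sketch assumes that blank lines can be skipped and that any sequence lines appearing before the first header are ignored.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from multi-FASTA lines.

    The name is the header line without its leading '>' character.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header ends the previous record, if there is one
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # wrapped sequence line of the current record
        # assumption: sequence lines before the first header are ignored
    # The end of the input ends the last record
    if name is not None:
        yield name, "".join(chunks)
```

Because it accepts any iterable of lines, an open file handle can be passed directly, so the whole file never has to be read into memory at once.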

FASTA files downloaded from public databases such as the National Center for Biotechnology Information (NCBI) often wrap sequences after 60-80 characters per line, which keeps them from being truncated in text editors.

However, in many cases (think *nix command-line tools such as grep, wc, etc.), it is more convenient if each sequence occupies exactly one line.
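To illustrate why one line per sequence helps: in a 2-line file, headers sit on the even-numbered lines (counting from 0) and sequences on the odd-numbered ones. Records can therefore be paired by position without tracking any state, `wc -l` divided by two gives the record count, and `grep -A1` on a header prints that header together with its full sequence. The helper `pair_two_line_records` below is a hypothetical name used only for illustration.

```python
from typing import List, Tuple


def pair_two_line_records(text: str) -> List[Tuple[str, str]]:
    """Pair headers with sequences in 2-line FASTA text by line position."""
    lines = text.splitlines()
    if len(lines) % 2:
        raise ValueError("a 2-line FASTA file must have an even number of lines")
    # Even-indexed lines are headers, odd-indexed lines are sequences
    return list(zip(lines[0::2], lines[1::2]))
```

The same positional pairing does not work on wrapped (multiline) FASTA, where the number of lines per record varies.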

Your job is to convert a multiline FASTA file to a 2-Line FASTA file.

Multiline FASTA format:
>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGA
AATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT
[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTAT
TACACAATTAAATGACACATTAAAAGCTATTTCAC
[...]

2-Line FASTA format:
>Sequence 1:
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACT[...]
>Sequence 2:
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCAC[...]
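A pure-Python conversion that needs no Biopython can stream the input: write each header as soon as it is seen, and join the wrapped sequence lines of the current record into one line before writing them out. This is a sketch under stated assumptions. `to_two_line` and `convert_file` are hypothetical names, blank lines and surrounding whitespace are dropped, and a header with no sequence lines is emitted with an empty sequence line so the output stays strictly two lines per record.

```python
from typing import Iterable, Iterator, List


def to_two_line(lines: Iterable[str]) -> Iterator[str]:
    """Yield output lines (each ending in a newline) in 2-line FASTA layout."""
    chunks: List[str] = []
    in_record = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            if in_record:
                # Flush the previous record's sequence as a single line
                yield "".join(chunks) + "\n"
            yield line + "\n"  # header copied without surrounding whitespace
            chunks = []
            in_record = True
        elif in_record:
            chunks.append(line)  # collect wrapped sequence lines
    if in_record:
        yield "".join(chunks) + "\n"  # flush the last record at end of input


def convert_file(in_path: str, out_path: str) -> None:
    """Stream the multiline FASTA file at in_path into a 2-line FASTA file at out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(to_two_line(src))
```

Only one record's sequence is held in memory at a time, so this approach also scales to large files.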

This Bite has Biopython enabled (check out the convert function in the Bio.SeqIO module), but it can also be solved without it.
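For the Biopython route, `Bio.SeqIO.convert` reads records in one format and writes them in another, and recent Biopython releases include a `"fasta-2line"` format that writes each sequence on a single line. The sketch below assumes such a release is installed. `convert_with_biopython` and `HAVE_BIOPYTHON` are hypothetical names, and the import is deferred so the module still loads when Biopython is missing.

```python
import importlib.util
from io import StringIO

# True when Biopython (the "Bio" package) can be imported in this environment
HAVE_BIOPYTHON = importlib.util.find_spec("Bio") is not None


def convert_with_biopython(fasta_text: str) -> str:
    """Convert multiline FASTA text to 2-line FASTA text via Bio.SeqIO.convert."""
    if not HAVE_BIOPYTHON:
        raise ImportError("Biopython is required for this variant")
    from Bio import SeqIO  # deferred import, see HAVE_BIOPYTHON above

    out = StringIO()
    # "fasta" parses wrapped records; "fasta-2line" writes one line per sequence
    SeqIO.convert(StringIO(fasta_text), "fasta", out, "fasta-2line")
    return out.getvalue()
```

`SeqIO.convert` also accepts file names directly, e.g. `SeqIO.convert("in.fasta", "fasta", "out.fasta", "fasta-2line")` (file names hypothetical). Biopython rebuilds each header from the parsed record's id and description; for headers like `>Sequence 1:` in the example above, this should reproduce the original line.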

Metrics:
47 out of 49 users completed this Bite.
Resolution time: ~49 min. (avg. submissions of 5-240 min.)
Pythonistas rate this Bite 4.5 on a 1-10 difficulty scale.