Bite 321. Magic bytes

Magic numbers are bytes that can be used to uniquely identify certain file formats. Portable Network Graphics (PNG) files, for example, usually start with the byte sequence 89 50 4E 47 0D 0A 1A 0A. This can be useful to determine file types when file extensions are missing or wrong.
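As a minimal sketch of the idea, the snippet below checks whether some data begins with the PNG signature quoted above. The helper name `looks_like_png` is hypothetical and not part of the Bite.

```python
# The 8-byte PNG signature, written as hex exactly as in the text above.
PNG_SIGNATURE = bytes.fromhex("89 50 4E 47 0D 0A 1A 0A")


def looks_like_png(data: bytes) -> bool:
    """Return True if data begins with the 8-byte PNG signature."""
    return data.startswith(PNG_SIGNATURE)


print(looks_like_png(b"\x89PNG\r\n\x1a\n" + b"rest of file"))  # True
print(looks_like_png(b"plain text"))  # False
```

In practice the data would come from the first few bytes of a file opened in binary mode.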

In this Bite you will write a function that identifies file types using a supplied magic numbers table. A wildcard that matches any single byte is written as two question marks (??) in the magic byte sequence, as in the case of the JPG format. If the file type cannot be determined, the function should raise a FileNotRecognizedException.

>>> determine_filetype_by_magic_bytes("/tmp/image.png")
'Image encoded in the Portable Network Graphics format'
>>> determine_filetype_by_magic_bytes("/tmp/image.txt")
Traceback (most recent call last):
...
magic_bytes.FileNotRecognizedException: /tmp/image.txt: File format not recognized
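One possible way to approach the matching, shown here as a sketch rather than the reference solution: compare the data byte by byte against each pattern and let ?? match anything. The table below is a hypothetical two-entry stand-in for the one the Bite supplies (the JPG description in particular is made up), and the helpers `matches` and `identify` are illustrative names that work on bytes already read from a file.

```python
class FileNotRecognizedException(Exception):
    """Raised when no magic byte pattern matches."""


# Hypothetical stand-in table; the Bite supplies its own.
MAGIC_TABLE = {
    "89 50 4E 47 0D 0A 1A 0A": "Image encoded in the Portable Network Graphics format",
    "FF D8 FF E0 ?? ?? 4A 46 49 46 00 01": "JPEG image (illustrative description)",
}


def matches(pattern: str, data: bytes) -> bool:
    """Compare data against a space-separated hex pattern; '??' matches any byte."""
    tokens = pattern.split()
    if len(data) < len(tokens):
        return False  # not enough bytes to cover the whole pattern
    return all(
        token == "??" or int(token, 16) == byte
        for token, byte in zip(tokens, data)
    )


def identify(data: bytes, name: str = "<data>") -> str:
    """Return the description of the first matching pattern, or raise."""
    for pattern, description in MAGIC_TABLE.items():
        if matches(pattern, data):
            return description
    raise FileNotRecognizedException(f"{name}: File format not recognized")
```

To use this with a path, one would read the leading bytes with `open(path, "rb")` and pass them to `identify` together with the path for the error message.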

Note: Make sure to strip out the comments in parentheses from byte sequences in the magic table (e.g. JPG format).
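A regular expression is one way to do this stripping. The sketch below assumes comments are never nested and uses a made-up input string; the function name `strip_comments` is hypothetical.

```python
import re


def strip_comments(sequence: str) -> str:
    """Remove '(...)' comments and collapse the leftover whitespace."""
    without_comments = re.sub(r"\([^)]*\)", " ", sequence)
    return " ".join(without_comments.split())


# Made-up example entry, not copied from the Bite's table.
print(strip_comments("FF D8 FF E0 ?? ?? 4A 46 (JFIF marker)"))
# FF D8 FF E0 ?? ?? 4A 46
```

Splitting and rejoining on whitespace afterwards means the result can be tokenized with a plain `split()` regardless of where the comment sat in the sequence.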

Advanced level

23 out of 25 users completed this Bite.
Resolution time: ~122 min. (avg. submissions of 5-240 min.)
Our community rates this Bite 9.0 on a 1-10 difficulty scale.