Data Encoding
Let's take a look at a less technical subject here: data encoding ⚙️
Understanding how computers represent data is crucial for working effectively with different types of data, including text, numbers, images, and others.
Data representation is the process of converting data from one form/format to another. The most common forms of data representation are binary, hexadecimal, and ASCII.
Standard data format
Binary
At the most fundamental level, all data in computers is represented in binary, using only 0s and 1s. Each binary digit (or bit) can represent two states (on/off, true/false). Each digit in a binary number represents a power of 2, starting from the rightmost digit, which represents \(2^0\), then going left to \(2^1\), \(2^2\), and so on. For example, the binary number 1011 represents \(1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 8 + 0 + 2 + 1\), or 11 in decimal form.
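The positional rule can be checked with a short Python sketch. The helper name `binary_to_decimal` is made up for this illustration; Python's built-in `int(text, 2)` performs the same conversion.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power of 2; the rightmost bit is 2**0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


print(binary_to_decimal("1011"))  # 11
print(int("1011", 2))             # 11, using the built-in conversion
print(bin(11))                    # 0b1011, going back from decimal to binary
```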
Hexadecimal
Hexadecimal (also known as hex) is a base-16 numeral system. This system is particularly useful in computer science and programming because it provides a more human-friendly representation of binary-coded values.
One hexadecimal digit represents four binary digits (bits), which is half a byte. This makes hex a convenient way to express binary data, as it's much shorter and more readable.
The hexadecimal system uses sixteen distinct symbols. The first ten are the same as the decimal system (0-9) to represent values zero to nine, and then it uses the letters A-F to represent values ten to fifteen.
For example, the binary sequence 1111 1010 can be more compactly represented as FA in hex.
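The "four bits per hex digit" grouping can be sketched as follows. The function name `binary_to_hex` is hypothetical and only serves this example: it pads the bit string to a multiple of four, then converts each group of four bits to one hex digit.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to uppercase hex, four bits at a time."""
    # Pad on the left so the length is a multiple of 4
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Split into groups of four bits (one group per hex digit)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


print(binary_to_hex("11111010"))  # FA
```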
# Convert decimal to hexadecimal
decimal = 254
hexadecimal = hex(decimal)  # hex() returns a string prefixed with 0x, here '0xfe'
print(f"Decimal {decimal} in hexadecimal is {hexadecimal}")
Colors in web development are often represented in hexadecimal format, with three pairs of hexadecimal digits representing the red, green, and blue (RGB) components.
# Convert an RGB color to hexadecimal
def rgb_to_hex(r, g, b):
    # Format each channel as two lowercase hex digits, zero-padded
    return f'#{r:02x}{g:02x}{b:02x}'

# Convert RGB to hex
rgb_color = (255, 165, 0)  # Orange color
hex_color = rgb_to_hex(*rgb_color)
print(f"RGB {rgb_color} in Hex is {hex_color}")
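Going the other way is just as short. The sketch below is not part of the original example; the name `hex_to_rgb` is an assumption, and it only handles the six-digit form such as `#ffa500`.

```python
def hex_to_rgb(hex_color: str) -> tuple:
    """Parse a six-digit hex color such as '#ffa500' into an (r, g, b) tuple."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    # Each pair of hex digits is one channel in the range 0-255
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#ffa500"))  # (255, 165, 0)
```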
For more details, see this good article: Comment convertir du binaire en hexadécimal (in French, "How to convert binary to hexadecimal").
Text Encoding
ASCII
The good old guy, or American Standard Code for Information Interchange (ASCII), uses 7 bits to represent characters, allowing for 128 unique symbols (0-127), covering English letters, digits, punctuation, and some control characters.
For example, the letter A is represented by the binary code 01000001, while the number 1 is represented by the binary code 00110001. ASCII codes are widely used in computer systems for text encoding and transmission.
Let's play with ASCII and Python here:
# Display ASCII values for 'A' to 'Z'
for char in range(65, 91):
    print(f"{chr(char)}: {char}")
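The binary codes quoted earlier for A and 1 can be checked the same way. This is a small sketch; the helper name `ascii_bits` is made up for the example, and the code is shown on 8 bits as in the text, with a leading 0 in front of the 7 ASCII bits.

```python
def ascii_bits(char: str) -> str:
    """Return the code of a single ASCII character as an 8-digit binary string."""
    return format(ord(char), "08b")


for char in "A1":
    print(f"{char!r}: decimal {ord(char)}, binary {ascii_bits(char)}")
# 'A': decimal 65, binary 01000001
# '1': decimal 49, binary 00110001
```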
For the full list of ASCII characters, see an ASCII Conversion Chart
Unicode
Unicode is a comprehensive computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Character Set (UCS) and published in the Unicode Standard, its goal is to replace existing character encoding schemes with its universal set of characters, enabling text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers.
UTF-8 is a modern and versatile Unicode encoding that can represent every Unicode character, covering virtually all characters used in human languages. It is a variable-width encoding that uses one to four bytes per character. The first 128 code points are encoded as the same single bytes as in ASCII, which is what makes UTF-8 backward compatible with ASCII.
#encode and decode a string with UTF-8
original_string = "Hello, 🌍!"
encoded_string = original_string.encode('utf-8')
print(f"Encoded: {encoded_string}")
decoded_string = encoded_string.decode('utf-8')
print(f"Decoded: {decoded_string}")
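To see the variable width in practice, the sketch below encodes four sample characters (chosen for this illustration, written as escapes so the file encoding does not matter) and counts the bytes each one needs.

```python
# Sample characters: A, e with acute accent, euro sign, globe emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F30D"]

# Number of bytes each character takes once encoded as UTF-8
widths = [len(char.encode("utf-8")) for char in samples]

for char, width in zip(samples, widths):
    print(f"U+{ord(char):04X} takes {width} byte(s): {char.encode('utf-8')}")

# Plain ASCII text produces identical bytes in both encodings
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

The four characters take 1, 2, 3, and 4 bytes respectively.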
Encoding Images and Binary Data
Images and other binary data are stored in formats that specify how to interpret the binary data. For example, image files (like PNG or JPEG) contain headers and metadata that describe the image dimensions, color depth, and pixel data.
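As a rough illustration of what "headers" means, the sketch below reads the width, height, bit depth, and color type from the first bytes of a PNG file. The function name `read_png_header` and the 640x480 sample are assumptions made for this example; the sample bytes are built by hand and leave out the checksum and pixel data, so they are not a valid image file.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Extract basic image properties from the start of a PNG byte stream."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # First chunk: 4-byte big-endian length, then the 4-byte chunk type
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("expected the IHDR chunk first")
    # IHDR begins with width and height (4 bytes each), then bit depth and color type
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


# Hand-built header for a hypothetical 640x480 image, 8 bits per channel, RGB (color type 2)
sample = PNG_SIGNATURE + struct.pack(">I4sIIBBBBB", 13, b"IHDR", 640, 480, 8, 2, 0, 0, 0)
print(read_png_header(sample))
```

With a real file, the same function could be called on the bytes returned by `open(path, "rb").read(26)`.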
We will cover this in the computer vision part of this course, do not worry 🤓
Wrap it up
In summary, binary is a system of representing data using only two digits, hexadecimal is a system of representing data using 16 digits, and ASCII is a system of representing characters using numerical codes. If you want to dig deeper into data encoding and representation, here are some articles:
- Binary and hexadecimal numbers explained for developers
- Basic course of data encoding
- Binary VS Hex
- Decoding the confusing world of encodings
- A Guide to Go's encoding Package: Base64, Hex, and Binary - Comparison of encoding schemes
- Common things and mistakes about encoding
- IBM ASCII, decimal, hexadecimal, octal, and binary conversion table