WhitePages
- Description: I stopped using YellowPages and moved onto WhitePages... but the page they gave me is all blank!
- Difficulty: Medium
🔎 Solution
For this challenge, we are given a .txt file. At first glance, the file looks completely empty when opened in a text editor.
However, inspecting the file in a hex viewer reveals that it is not truly empty.
> xxd whitepages.txt
00000000: e280 83e2 8083 e280 83e2 8083 20e2 8083  ............ ...
00000010: 20e2 8083 e280 83e2 8083 e280 83e2 8083   ...............
00000020: 20e2 8083 e280 8320 e280 83e2 8083 e280   ...... ........
00000030: 83e2 8083 20e2 8083 e280 8320 e280 8320  .... ...... ... 
00000040: 2020 e280 83e2 8083 e280 83e2 8083 e280    ..............
00000050: 8320 20e2 8083 20e2 8083 e280 8320 e280  .  ... ...... ..
00000060: 8320 20e2 8083 e280 83e2 8083 2020 e280  .  .........  ..
00000070: 8320 20e2 8083 2020 2020 e280 8320 e280  .  ...    ... ..
00000080: 83e2 8083 e280 83e2 8083 2020 e280 8320  ..........  ... 
00000090: e280 8320 e280 8320 e280 83e2 8083 e280  ... ... ........
000000a0: 8320 e280 83e2 8083 e280 8320 20e2 8083  . .........  ...
000000b0: e280 83e2 8083 e280 83e2 8083 20e2 8083  ............ ...
The content actually consists of a repeating whitespace-like pattern, with 2 main types of characters present:
- E2 80 83: This sequence is the UTF-8 encoding of the EM SPACE (U+2003) character
- 20: This is the standard space character (U+0020)
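The claim that only these two characters occur can be checked programmatically instead of by eye. The following is a minimal sketch, not part of the original solution: the helper name `character_inventory` is hypothetical, and the sample is just the first 16 bytes of the hex dump above. For the real file, the bytes would come from `open("whitepages.txt", "rb").read()`.

```python
from collections import Counter
import unicodedata


def character_inventory(data: bytes) -> Counter:
    """Count each distinct Unicode character in UTF-8 encoded data."""
    return Counter(data.decode("utf-8"))


# First 16 bytes of the hex dump (offset 0x00); swap in the real file
# contents to inspect the whole challenge file.
sample = bytes.fromhex("e28083e28083e28083e2808320e28083")

for ch, count in character_inventory(sample).items():
    # unicodedata.name gives the official Unicode character name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {count}")
```

If the inventory of the full file lists anything other than EM SPACE and SPACE, the two-symbol assumption would need revisiting.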
Given that only these 2 characters are present, it strongly suggests a form of binary encoding. The challenge likely defines the mapping in 1 of 2 possible ways:
- Case 1: E2 80 83 → 0, and 20 → 1
- Case 2: the reverse mapping, where E2 80 83 → 1, and 20 → 0
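As a quick sanity check before decoding the whole file, both mappings can be tried on a single hidden byte. The sketch below is illustrative only: `to_bits` is a hypothetical helper, and the sample is the first 20 bytes of the hex dump, which hold exactly 8 whitespace characters.

```python
EM_SPACE = "\u2003"


def to_bits(text: str, em_bit: str, space_bit: str) -> str:
    """Map every EM SPACE to em_bit and every other character to space_bit."""
    return "".join(em_bit if ch == EM_SPACE else space_bit for ch in text)


# First 20 bytes of the dump: 4 EM SPACEs, SPACE, EM SPACE, SPACE, EM SPACE
sample = bytes.fromhex("e28083e28083e28083e2808320e2808320e28083").decode("utf-8")

case1 = to_bits(sample, "0", "1")  # EM SPACE = 0, SPACE = 1
case2 = to_bits(sample, "1", "0")  # EM SPACE = 1, SPACE = 0

print("Case 1:", case1, hex(int(case1, 2)))
print("Case 2:", case2, hex(int(case2, 2)))
```

Under Case 1 the first byte is `00001010` (0x0a, a newline), which is plausible for a text file. Under Case 2 it is `11110101` (0xf5), outside the ASCII range, which already hints that Case 1 is the intended mapping.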
To solve this, we can write a simple Python script that:
- Reads the file as raw bytes
- Iterates through the content, replacing each character according to the chosen mapping
- Concatenates the result into a binary string
- Splits the binary string into 8-bit chunks and converts each chunk into its ASCII equivalent
def extract_ascii(filename):
    with open(filename, "rb") as f:
        data = f.read()

    em_space = bytes.fromhex("E28083")  # EM SPACE
    space = bytes.fromhex("20")  # SPACE

    def decode(mapping):
        # Walk the raw bytes and emit one bit per whitespace character
        bits = []
        i = 0
        while i < len(data):
            if data[i:i+3] == em_space:
                bits.append(mapping["em_space"])
                i += 3
            elif data[i:i+1] == space:
                bits.append(mapping["space"])
                i += 1
            else:
                # Skip any byte that is neither character
                i += 1
        bitstring = "".join(bits)

        # Convert each complete 8-bit group into one character
        chars = []
        for j in range(0, len(bitstring), 8):
            byte = bitstring[j:j+8]
            if len(byte) == 8:
                chars.append(chr(int(byte, 2)))
        return "".join(chars)

    # E28083 = 0, 20 = 1
    case1 = decode({"em_space": "0", "space": "1"})
    # E28083 = 1, 20 = 0
    case2 = decode({"em_space": "1", "space": "0"})
    return case1, case2


file_path = "whitepages.txt"
c1, c2 = extract_ascii(file_path)
print("Case 1:", c1)
print("Case 2:", c2)
Running the script with both mappings, we find that Case 1 yields a meaningful result, successfully recovering the original hidden content along with the flag.
> python script.py
Case 1:
picoCTF
SEE PUBLIC RECORDS & BACKGROUND REPORT
5000 Forbes Ave, Pittsburgh, PA 15213
picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}
Case 2: õöö¼«¹õõöö¬ººß¯ª½³¶¼ßº¼°»¬ßÙß½¾¼´¸°ª±»ßº¯°«õööÊÏÏÏ߹߾Ó߯Ó߯¾ßÎÊÍÎÌõöö¼«¹ ÈÎÏÏÇÉÏÏÈÈÆÊÍÆÍËÊÇÉõöö
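Rather than comparing the two outputs by eye, the correct mapping can also be picked automatically by searching each candidate for the flag format. This is a small sketch under assumptions: `pick_flag` is a hypothetical helper, the regular expression assumes the usual `picoCTF{...}` shape, and the candidate strings are shortened placeholders rather than the real decoded output.

```python
import re

# Assumed flag shape: "picoCTF{" followed by anything up to the closing brace
FLAG_PATTERN = re.compile(r"picoCTF\{[^}]*\}")


def pick_flag(candidates):
    """Return (name, flag) for the first candidate containing a flag, else None."""
    for name, text in candidates.items():
        match = FLAG_PATTERN.search(text)
        if match:
            return name, match.group(0)
    return None


# Placeholder inputs; in practice these would be the two decoded strings
candidates = {
    "Case 2": "õöö¼«¹õõöö",
    "Case 1": "\n\t\tpicoCTF\n\n\t\tpicoCTF{example_flag}\n",
}

print(pick_flag(candidates))
```

The same check could be applied to the `c1` and `c2` values returned by `extract_ascii`.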
🚩Flag
picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}