WhitePages

  • Description: I stopped using YellowPages and moved onto WhitePages... but the page they gave me is all blank!
  • Difficulty: Medium

🔎 Solution

For this challenge, we are given a .txt file. At first glance, the file looks completely empty when opened in a text editor.

However, inspecting the file in a hex viewer reveals that it is not truly empty.

> xxd whitepages.txt 
00000000: e280 83e2 8083 e280 83e2 8083 20e2 8083 ............ ...
00000010: 20e2 8083 e280 83e2 8083 e280 83e2 8083 ...............
00000020: 20e2 8083 e280 8320 e280 83e2 8083 e280 ...... ........
00000030: 83e2 8083 20e2 8083 e280 8320 e280 8320 .... ...... ...
00000040: 2020 e280 83e2 8083 e280 83e2 8083 e280 ..............
00000050: 8320 20e2 8083 20e2 8083 e280 8320 e280 . ... ...... ..
00000060: 8320 20e2 8083 e280 83e2 8083 2020 e280 . ......... ..
00000070: 8320 20e2 8083 2020 2020 e280 8320 e280 . ... ... ..
00000080: 83e2 8083 e280 83e2 8083 2020 e280 8320 .......... ...
00000090: e280 8320 e280 8320 e280 83e2 8083 e280 ... ... ........
000000a0: 8320 e280 83e2 8083 e280 8320 20e2 8083 . ......... ...
000000b0: e280 83e2 8083 e280 83e2 8083 20e2 8083 ............ ...

The content consists entirely of whitespace, built from just two byte sequences:

  • E2 80 83: This sequence corresponds to the UTF-8 encoding of the EM SPACE (U+2003) character
  • 20: This is the standard space character
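
One way to confirm this reading of the dump is to decode the bytes as UTF-8 and tally the distinct characters. The sketch below is only illustrative: `summarize_characters` is a hypothetical helper name, and the sample is just the first 16 bytes of the dump above rather than the whole file.

```python
import unicodedata
from collections import Counter


def summarize_characters(raw: bytes) -> dict:
    """Map each distinct character (code point and Unicode name) to its count."""
    text = raw.decode("utf-8")
    counts = Counter(text)
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}": n
        for ch, n in counts.items()
    }


# First 16 bytes of the xxd output (offset 0x00).
sample = bytes.fromhex("e28083e28083e28083e2808320e28083")
summary = summarize_characters(sample)
print(summary)  # {'U+2003 EM SPACE': 5, 'U+0020 SPACE': 1}
```

Running the same helper on the full contents of whitepages.txt should show whether any third character is hiding in the file.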

Given that only these 2 characters are present, it strongly suggests a form of binary encoding. The challenge likely defines the mapping in 1 of 2 possible ways:

  • Case 1: E2 80 83 → 0, and 20 → 1
  • Case 2: the reverse mapping, where E2 80 83 → 1, and 20 → 0
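
To make the two cases concrete, here is a small worked example on the first 40 bytes of the dump (the first 16 whitespace characters). It is a sketch under the assumptions above; `to_bits` is a hypothetical helper, not part of the challenge.

```python
# First 40 bytes of the xxd output: 16 whitespace characters in total.
head = bytes.fromhex(
    "e28083e28083e28083e2808320e28083"
    "20e28083e28083e28083e28083e28083"
    "20e28083e2808320"
)


def to_bits(data: bytes, em_bit: bytes, space_bit: bytes) -> str:
    """Replace the 3-byte EM SPACE first, then the 1-byte space."""
    replaced = data.replace(b"\xe2\x80\x83", em_bit).replace(b"\x20", space_bit)
    return replaced.decode("ascii")


case1_bits = to_bits(head, b"0", b"1")  # EM SPACE = 0, space = 1
case2_bits = to_bits(head, b"1", b"0")  # EM SPACE = 1, space = 0

print(case1_bits)  # 0000101000001001 -> bytes 0x0A 0x09 (newline, tab)
print(case2_bits)  # 1111010111110110 -> bytes 0xF5 0xF6 ('õ', 'ö')
```

Under Case 1 the file opens with ordinary ASCII whitespace, while Case 2 immediately produces bytes outside the ASCII range, which is an early hint about which mapping is plausible.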

To solve this, we can write a simple Python script that:

  • Reads the file as raw bytes
  • Iterates through the content, replacing each character according to the chosen mapping
  • Concatenates the result into a binary string
  • Splits the binary string into 8-bit chunks and converts each chunk into its ASCII equivalent

def extract_ascii(filename):
    with open(filename, "rb") as f:
        data = f.read()
    em_space = bytes.fromhex("E28083")  # EM SPACE
    space = bytes.fromhex("20")  # SPACE

    def decode(mapping):
        bits = []
        i = 0
        while i < len(data):
            if data[i:i+3] == em_space:
                bits.append(mapping["em_space"])
                i += 3
            elif data[i:i+1] == space:
                bits.append(mapping["space"])
                i += 1
            else:
                i += 1
        bitstring = "".join(bits)
        chars = []
        for j in range(0, len(bitstring), 8):
            byte = bitstring[j:j+8]
            if len(byte) == 8:
                chars.append(chr(int(byte, 2)))
        return "".join(chars)

    # E28083 = 0, 20 = 1
    case1 = decode({"em_space": "0", "space": "1"})
    # E28083 = 1, 20 = 0
    case2 = decode({"em_space": "1", "space": "0"})
    return case1, case2


file_path = "whitepages.txt"
c1, c2 = extract_ascii(file_path)
print("Case 1:", c1)
print("Case 2:", c2)

Running the script with both mappings, we find that Case 1 yields a meaningful result, successfully recovering the original hidden content along with the flag.

> python script.py  
Case 1:
picoCTF

SEE PUBLIC RECORDS & BACKGROUND REPORT
5000 Forbes Ave, Pittsburgh, PA 15213
picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}

Case 2: õöö¼«¹õõöö¬ººß¯ª½³¶¼ß­º¼°­»¬ßÙß½¾¼´¸­°ª±»ß­º¯°­«õööÊÏÏÏ߹߾Ó߯Ó߯¾ßÎÊÍÎÌõöö¼«¹     ÈÎÏÏÇÉÏÏÈÈÆÊÍÆÍËÊÇÉõöö
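
Instead of comparing the two outputs by eye, the choice can be automated. The following is a minimal sketch, assuming the flag uses the usual `picoCTF{` prefix; `pick_plaintext` is a hypothetical helper and the candidate strings are shortened placeholders, not the real script output.

```python
import string


def pick_plaintext(candidates: dict, marker: str = "picoCTF{"):
    """Return the (label, text) pair that looks most like readable plaintext.

    Candidates containing the marker win; ties are broken by the share of
    printable ASCII characters.
    """
    printable = set(string.printable)

    def score(item):
        _, text = item
        ratio = sum(ch in printable for ch in text) / max(len(text), 1)
        return (marker in text, ratio)

    return max(candidates.items(), key=score)


# Placeholder inputs for illustration only.
candidates = {
    "Case 1": "\n\t\tpicoCTF{example_flag}\n",
    "Case 2": "õöö¼«¹",
}
label, text = pick_plaintext(candidates)
print(label)  # Case 1
```

With the real script, the two strings returned by `extract_ascii` would take the place of the placeholders.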

🚩 Flag

picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}