WhitePages

Description: I stopped using YellowPages and moved onto WhitePages... but the page they gave me is all blank!
Difficulty: Medium

🔎 Solution

For this challenge, we are given a .txt file. At first glance, the file looks completely empty when opened in a text editor.

However, inspecting the file in a hex viewer reveals that it is not truly empty.

> xxd whitepages.txt 
00000000: e280 83e2 8083 e280 83e2 8083 20e2 8083  ............ ...
00000010: 20e2 8083 e280 83e2 8083 e280 83e2 8083   ...............
00000020: 20e2 8083 e280 8320 e280 83e2 8083 e280   ...... ........
00000030: 83e2 8083 20e2 8083 e280 8320 e280 8320  .... ...... ... 
00000040: 2020 e280 83e2 8083 e280 83e2 8083 e280    ..............
00000050: 8320 20e2 8083 20e2 8083 e280 8320 e280  .  ... ...... ..
00000060: 8320 20e2 8083 e280 83e2 8083 2020 e280  .  .........  ..
00000070: 8320 20e2 8083 2020 2020 e280 8320 e280  .  ...    ... ..
00000080: 83e2 8083 e280 83e2 8083 2020 e280 8320  ..........  ... 
00000090: e280 8320 e280 8320 e280 83e2 8083 e280  ... ... ........
000000a0: 8320 e280 83e2 8083 e280 8320 20e2 8083  . .........  ...
000000b0: e280 83e2 8083 e280 83e2 8083 20e2 8083  ............ ...

The content consists entirely of whitespace, built from just 2 byte sequences:

E2 80 83: This sequence corresponds to the UTF-8 encoding of the EM SPACE (U+2003) character
20: This is the standard space character
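
As a quick sanity check, the bytes can be decoded in Python to confirm which characters they represent. This is only a minimal sketch: the variable names are illustrative, and the sample is just the first 16 bytes of the hex dump above rather than the whole file.

```python
import unicodedata

# First 16 bytes of the hex dump shown above (offset 0x00).
sample = bytes.fromhex("e28083e28083e28083e2808320e28083")

text = sample.decode("utf-8")
# Collect the distinct characters and look up their Unicode names.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in set(text)}
for code_point, name in sorted(names.items()):
    print(code_point, name)
```

For this sample it prints U+0020 SPACE and U+2003 EM SPACE, matching the two sequences identified above.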

Because only these 2 characters are present, the file strongly suggests a binary encoding in which each character stands for one bit. The mapping must be 1 of 2 possibilities:

Case 1: E2 80 83 → 0, and 20 → 1
Case 2: the reverse mapping, where E2 80 83 → 1, and 20 → 0
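
To make the two cases concrete, the sketch below applies both mappings to the first eight symbols of the file (bytes 0x00 to 0x13 of the hex dump). The helper name `to_bits` is hypothetical and exists only for this illustration.

```python
EM_SPACE = "\u2003"

def to_bits(text, em_bit, space_bit):
    """Map EM SPACE and SPACE to the given bit characters, ignoring anything else."""
    table = {EM_SPACE: em_bit, " ": space_bit}
    return "".join(table[ch] for ch in text if ch in table)

# First eight symbols of the file, copied from the hex dump above.
first_symbols = bytes.fromhex(
    "e28083" * 4 + "20" + "e28083" + "20" + "e28083"
).decode("utf-8")

case1_bits = to_bits(first_symbols, "0", "1")  # EM SPACE -> 0, SPACE -> 1
case2_bits = to_bits(first_symbols, "1", "0")  # EM SPACE -> 1, SPACE -> 0

print(case1_bits, int(case1_bits, 2))  # 00001010 10
print(case2_bits, int(case2_bits, 2))  # 11110101 245
```

Under Case 1 the first byte is 10, an ASCII line feed, which is a plausible start for a text file. Under Case 2 it is 245, which falls outside the ASCII range.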

To solve this, we can write a simple Python script that:

Reads the file as raw bytes
Iterates through the content, replacing each character according to the chosen mapping
Concatenates the result into a binary string
Splits the binary string into 8-bit chunks and converts each chunk into its ASCII equivalent

def extract_ascii(filename):
    """Decode the whitespace-encoded file under both candidate bit mappings."""
    with open(filename, "rb") as f:
        data = f.read()
    em_space = bytes.fromhex("E28083")  # EM SPACE (U+2003) encoded as UTF-8
    space = bytes.fromhex("20")         # SPACE (U+0020)

    def decode(mapping):
        # Walk the raw bytes and emit one bit per whitespace character.
        bits = []
        i = 0
        while i < len(data):
            if data[i:i+3] == em_space:
                bits.append(mapping["em_space"])
                i += 3
            elif data[i:i+1] == space:
                bits.append(mapping["space"])
                i += 1
            else:
                # Skip any byte that is not part of either character.
                i += 1
        bitstring = "".join(bits)
        # Convert each complete 8-bit group into one character.
        chars = []
        for j in range(0, len(bitstring), 8):
            byte = bitstring[j:j+8]
            if len(byte) == 8:
                chars.append(chr(int(byte, 2)))
        return "".join(chars)

    # E28083 = 0, 20 = 1
    case1 = decode({"em_space": "0", "space": "1"})
    # E28083 = 1, 20 = 0
    case2 = decode({"em_space": "1", "space": "0"})
    return case1, case2

file_path = "whitepages.txt"
c1, c2 = extract_ascii(file_path)
print("Case 1:", c1)
print("Case 2:", c2)

Running the script with both mappings shows that Case 1 (E2 80 83 → 0, 20 → 1) yields readable text, recovering the hidden content along with the flag. Case 2 produces only non-ASCII garbage.

> python script.py  
Case 1: 
                picoCTF

                SEE PUBLIC RECORDS & BACKGROUND REPORT
                5000 Forbes Ave, Pittsburgh, PA 15213
                picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}

Case 2: õöö¼«¹õõöö¬ººß¯ª½³¶¼ß­º¼°­»¬ßÙß½¾¼´¸­°ª±»ß­º¯°­«õööÊÏÏÏß¹ß¾Óß¯Óß¯¾ßÎÊÍÎÌõöö¼«¹     ÈÎÏÏÇÉÏÏÈÈÆÊÍÆÍËÊÇÉõöö
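
Picking the right mapping by eye works here, but the choice can also be automated. The sketch below is one possible heuristic, not part of the original solution: it accepts a candidate only if every character is printable ASCII. The function name `looks_like_text` is hypothetical, and the two strings are shortened stand-ins for the outputs above (the exact leading whitespace of Case 1 is assumed).

```python
import string

def looks_like_text(candidate):
    """Return True if the string is non-empty and entirely printable ASCII."""
    return bool(candidate) and all(ch in string.printable for ch in candidate)

# Shortened stand-ins for the two decoded outputs shown above.
candidates = {
    "Case 1": "\n\t\tpicoCTF\n",
    "Case 2": "õöö¼«¹õõöö",
}

for label, text in candidates.items():
    print(label, looks_like_text(text))
```

With these samples, only Case 1 passes the check. `string.printable` includes tabs and newlines, so whitespace padding around the hidden message does not cause a false rejection.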

🚩Flag

picoCTF{not_all_spaces_are_created_equal_7100860b0fa779a5bd8ce29f24f586dc}
