RSA Exposed: A Statistical Look at Public-Key Encryption

The RSA algorithm: overview

RSA (Rivest, Shamir, Adleman) bases its security on the difficulty of factoring very large numbers.

The main steps are:

Key generation:
- Choose two very large prime numbers, p and q (secret).
- Compute the modulus n = p · q (public).
- Compute phi(n) = (p-1)·(q-1) (secret).
- Choose a public exponent e (often 65537 or 3) coprime with phi(n).
- Compute the private exponent d such that (d · e) % phi(n) = 1.
Encryption: for a numeric message M compute C = M^e mod n.
Decryption: recover M = C^d mod n.

Exercise premise: letter-by-letter RSA

Normally RSA encrypts properly prepared blocks. In this exercise we apply the RSA formula to every single letter (or its ASCII value). For example, we encrypt 'a' (ASCII 97), then 'b' (98), and so on.

This effectively creates a homophonic substitution cipher: every 'a' will always map to the same number (97^e % n), every 'b' to another fixed number, etc. Statistically, the frequency distribution is preserved, which is a serious flaw: the most frequent plaintext letter will correspond to the most frequent ciphertext number.

From statistics to cryptanalysis: breaking letter-by-letter RSA

Because frequency is preserved, we can attack the cipher. Given a ciphertext (a long list of numbers) we can:

Count the frequency of each number in the ciphertext.
Count the frequency of each letter in a reference text corpus (English or Italian).
Create a decryption map: the most frequent ciphertext number likely corresponds to the most frequent letter in the reference text, the second to the second, and so on.

The interactive simulation below implements exactly this attack: it encrypts letter-by-letter and then uses frequency analysis (rank + hill-climb) to recover the text without knowing d.

RSA Simulation and Frequency Attack

This simulation encrypts text by applying the RSA formula to each symbol and then uses frequency analysis to infer the original text.

Enter the original text (longer text works better):

Prime p

Prime q

Public exponent e

No key generated

Plaintext (A–Z and spaces)

Mode Show as

Ciphertext / output

Use small alphabets and texts to keep the demo responsive.

Plaintext letter percentages

Ciphertext percentages (c mod 26)

Recovered candidates

#	Method	Score	Plaintext

No run yet.

Suggested experiments

Measure recovery success probability as a function of plaintext length (law of large numbers for frequencies).
Compare pure χ² ranking vs combined score (χ² + bigrams + wordScore).
Test sensitivity to alphabet choice (Italian vs English) and to the mapping c mod 26.
Observe how c mod 26 mixes frequencies but does not fully hide them when n is small.

References

R. L. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", 1978.
A. Menezes, P. van Oorschot, S. Vanstone, "Handbook of Applied Cryptography".
T. M. Cover, J. A. Thomas, "Elements of Information Theory".

Correct RSA implementation in C (example)

Example showing encryption/decryption on a number (practically a block), not per single letters.

#include <stdio.h>
#include <math.h>
typedef long long ll;

ll pow_mod(ll base, ll exp, ll mod) {
    ll res = 1;
    base %= mod;
    while (exp > 0) {
        if (exp % 2 == 1) res = (res * base) % mod;
        base = (base * base) % mod;
        exp /= 2;
    }
    return res;
}

ll mod_inverse(ll a, ll m) {
    a = a % m;
    for (ll x = 1; x < m; x++) {
        if ((a * x) % m == 1) {
            return x;
        }
    }
    return 1;
}

int main(void) {
    ll p = 53;
    ll q = 59;
    ll e = 3;

    ll n = p * q;
    ll phi_n = (p - 1) * (q - 1);
    
    ll d = mod_inverse(e, phi_n);

    printf("--- RSA Parameters ---\n");
    printf("p = %lld, q = %lld, e = %lld\n", p, q, e);
    printf("Public Key (n, e): (%lld, %lld)\n", n, e);
    printf("Phi(n) = %lld\n", phi_n);
    printf("Private Key (d): %lld\n", d);
    
    ll message = 888;

    printf("\n--- Process ---\n");
    printf("Original Message (M): %lld\n", message);

    ll encrypted = pow_mod(message, e, n);
    printf("Encrypted (C): %lld\n", encrypted);

    ll decrypted = pow_mod(encrypted, d, n);
    printf("Decrypted (M'): %lld\n", decrypted);

    if (message == decrypted) {
        printf("\nSuccess: The message was recovered!\n");
    } else {
        printf("\nError: Decryption failed.\n");
    }
    
    return 0;
}

As shown, a correct RSA implementation operates on numbers/blocks and not on individual letters. Its security comes from the difficulty of obtaining d given only n and e (factoring n into p and q).

The vulnerability exploited in the simulation is not an intrinsic weakness of RSA, but an implementation weakness: applying a deterministic per-letter cipher (where 'a' always becomes the same symbol) creates a substitution cipher vulnerable to statistical analysis.