maria
/
IT-Sec


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
							= Vigenere cipher
:stem: latexmath

== Task


The content of link:./ciphertext.txt[this file] is encrypted with a _Vigenere cipher_. (The key length is limited to 20 letters) +

Your task is to decrypt the content. Please note that you do not know the length of the key in this task!

Your solution will only be accepted if you find a better approach than simply trying all possible combinations for every possible key length  (w.r.t. brute force) paired with a _simple_ frequency analysis.

_A more intelligent appraoch will solve this task in roughtly quadratic time stem:[O=n^2], whereas trying all possible combinations, i.e. brute force, needs exponential time stem:[O=c^n \leftarrow c \gt 1]._


== Help

[NOTE]
====
The values used in this example are artifical and not the real values. For a very short text, the solution is not accurate enough!
====

. you need to guess the key length, thus you know it is not longer than 20 letters, try every possible key length stem:[2 \leq length \leq 20] 

. for every key length analyze the ciphertext statistically to derive the best plaintext

.. you have to crack the key two letters at a time +
if the length is 3

    VHEEMCRD (<- note that this is the plaintext for the example)
    ABCABCAB
    
you observe that the letters VH, EM and RD are encrypted with the first two letters of the key

.. cut the ciphertext to strings with the size of the key length. each column is then encoded with one letter +
example: +
Suppose the key is *ABC*. The _ciphertext_ *VIGENERE* should be cut into

    VIG
    ENE
    RE
    
    
. combining the first string with the second, the second with the third and so on. the last string should be combined with the first +
this gives us:

    01 01 01 (key_index)
    --------
    VI EN RE # letters enrypted with key[0] and key[1]

    12 12    (key_index)
    -----
    IG NE    # letters enrypted with key[1] and key[2]

    20 20    (key_index)
    -----
    GV EE    # letters enrypted with key[2] and key[0]
    
. for each sequence of two consecutive letters try all possible two letters keys (AA, AB, AC, ..., ZY, ZZ)

    VI EN RE decrypted with AA gives VI EN RE 
    VI EN RE decrypted with AB gives VH EM RD
    ...
    VI EN RE decrypted with ZZ gives WJ FO SF
    
. using the link:./english_bigram[Bigram Frequencies] (the frequencies are sorted alphabetically) give each key a score by multiplying the frequencies of the bigrams in the plaintext

     CIPHER  | KEY | PLAIN
    --------------------------
    VI EN RE | AB  | VH EM RD
    --------------------------
    score(AB) = frequency(VH) + frequency(EM) + frequency(RD)

[TIP]
--
* Normally, you would calculate the score by multiplication: `score(AB) = frequency(VH) * frequency(EM) * frequency(RD)`. However, with the given bigram frequencies, which are provided as logarithmic values, you can use an addition instead
* This means, the calculation using the numbers from the provided Bigram Frequencies is as follows: `score(AB) = frequency(VH) + frequency(EM) + frequency(RD)`
** The higher the calculated value, the better is the score for that particular bigram pair
--

. select for each string the key with the best score

     PLAIN   | KEY | K_index | SCORE
    -----------------------------------
    VH EM RD | AB  | 01      |151125528
    DA IY    | FG  | 12      |119303088
    EI CR    | CN  | 20      |122328401

. form the final key from these keys

    key[0] is either A or N, since score(AB) > score(CN), take A
    key[1] is either B or F, since score(AB) > score(FG), take B
    key[2] is either G or C, since score(CN) > score(FG), take C
    => the best key of length 3 is ABC

. do this process for key lengths 2 to 20

. using frequency analysis find the best fit of the 19 keys