The Misuse of RC4 in Microsoft Word and Excel
Hongjun Wu
Institute for Infocomm Research, Singapore
hongjun@i2r.a-star.edu.sg
Abstract. In this report, we point out a serious security flaw in Microsoft
Word and Excel. The stream cipher RC4 [9], with key length up to 128 bits, is
used in Microsoft Word and Excel to protect documents. But when an encrypted
document is modified and saved, the initialization vector remains the same, and
thus the same RC4 keystream is applied to encrypt the different versions of
that document. The consequence is disastrous, since much of the information in
the document could be recovered easily.
1 Introduction
After more than two decades of public research on cryptography, many
practically secure ciphers have been proposed. If we use those ciphers properly,
adequate protection can be achieved. Unfortunately, when the ciphers are
implemented in products, various security problems may arise. A well-known story
concerns an old version of the Netscape browser. In the implementation of
the Secure Socket Layer (SSL) in Netscape 1.1, the key of the symmetric key
cipher is derived from the current time and the process ID (or the system time).
The key space becomes severely limited, and even the 128-bit encryption version
could be easily cracked [4].
For the implementation of stream ciphers, the basic principle is that if the
same key is used more than once, different initialization vectors should be
used to prevent the same keystream from being used to encrypt more than one
message. When a stream cipher is used for data transmission, people normally
follow this principle strictly. However, in an environment where a document
needs to be edited and modified, this principle may be forgotten. This kind of
mistake occurs in the Microsoft Office (Word and Excel) encryption: the same
key and the same initialization vector are allowed to encrypt different
versions of a document. This happens as follows. We encrypt a Microsoft Office
(Word or Excel) document with a password and save that file. Later that
document is modified and saved again. In this process, the key and
initialization vector remain unchanged, so the same keystream is used to
protect two different versions (the original and the modified versions) of the
document. By XORing the two encrypted versions, we cancel the keystream and
obtain the XOR of the two plaintexts, which reveals a lot of information about
the document.
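The keystream cancellation described above can be illustrated with a minimal sketch. It assumes a textbook RC4 and a hypothetical 16-byte key; the key derivation and file format actually used by Microsoft Office are not modeled here.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n keystream bytes of textbook RC4 for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):                        # keystream generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher encryption: plaintext XOR keystream."""
    return xor_bytes(plaintext, rc4_keystream(key, len(plaintext)))


key = b"hypothetical-128"   # illustrative 16-byte key, not Office's derived key
v1 = b"Anti-virus researchers from Symantec yesterday spotted the first virus"
v2 = b"Anti-virus researchers at Symantec yesterday spotted the first virus"

c1, c2 = encrypt(key, v1), encrypt(key, v2)

# The keystream cancels: c1 XOR c2 equals v1 XOR v2, and no key is needed.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(v1, v2))
print(leak[:23] == bytes(23))   # the unchanged prefix XORs to zero bytes
```

The attacker only ever touches `c1` and `c2`; the zero prefix already tells where the two versions start to differ.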
The above attack could take place in real life. Suppose that Alice and Bob
are working on the same Microsoft Office (Word or Excel) document. They share
a password and use it to protect the document. They make changes to the
document, and the encrypted document is transmitted between them a number of
times. In this process, the same password and initialization vector are used
to protect all the modified versions of that document, and the document could
be recovered from the intercepted files with high probability.
Here is another example. Suppose that Alice is working on a Microsoft Office
document (Word or Excel) and she uses a password to protect it. During the
process, Alice may need to back up her files. An attacker could retrieve a lot
of information from those backup files even though the attacker does not know
Alice’s password.
This report is organized as follows. Background information on the security
of Microsoft Office is given in Section 2. We illustrate the misuse of RC4
in Microsoft Word and Excel in Section 3 and Section 4, respectively. Section 5
discusses the countermeasure and concludes this report.
2 Introduction to the Security of Microsoft Office
Microsoft Office includes Word, Excel, PowerPoint and other components
[5]. We consider only Word and Excel in this report. There are five versions of
Microsoft Office: 95, 97, 2000, XP and the latest, Office 2003.
The encryption in Microsoft Office 95 XORs the cascaded password with
the message. From the cryptographic point of view, this encryption scheme does
not provide any security. The other versions of Microsoft Office use
RC4 to protect documents. The early export version of Microsoft Office
supports only 40-bit encryption. The 40-bit encryption scheme is vulnerable
to brute force attack, since several billion instructions can now be processed
in one second on a personal computer. Both the XOR and the 40-bit encryption
fail to provide sufficient protection for documents. There are many
cracking tools for Microsoft Office documents, but almost all of them target
these two types of weak encryption.
With the 128-bit version supported in Microsoft Office, one may expect
that sufficient protection could be achieved, since a random 16-byte ASCII
password contains more than 100 bits of secret information and a brute force
attack fails. Unfortunately, the stream cipher RC4 is misused in Microsoft
Office. It is well known that the initialization vector in a stream cipher
must be used properly. But Microsoft Office manages the initialization vector
improperly, and the same keystream could be used to encrypt different versions
of a document. That is the flaw we will illustrate in detail in the rest of
this report.
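The key-space figures above can be checked with a rough calculation. The sketch below assumes the password is drawn uniformly from the 95 printable ASCII characters and assumes an illustrative rate of ten million key trials per second; neither number comes from this report.

```python
import math

# Assumption: each password character is one of the 95 printable ASCII characters.
printable_ascii = 95
password_bits = 16 * math.log2(printable_ascii)

# Assumption: an illustrative attacker speed, not a measured one.
trials_per_second = 1e7
hours_for_40_bit = 2 ** 40 / trials_per_second / 3600

print(f"16-character password: about {password_bits:.1f} bits")
print(f"exhausting 40-bit keys: about {hours_for_40_bit:.1f} hours")
```

Under these assumptions the 40-bit key space falls in roughly a day on a single machine, while the password entropy stays above 100 bits, which is why the attack in this report does not rely on brute force.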
Remarks 1. In Microsoft Office, using the same key with different initialization
vectors would not leak the secret key. The reason is that the secret key and the
initialization vector are hashed together and the hash output is used as the
secret key in RC4. So the key schedule weakness of RC4 [3] has no effect on the
security of Microsoft Office.
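The role of the hash in Remarks 1 can be sketched as follows. The function `derive_rc4_key`, the use of SHA-256, the concatenation order and the 16-byte truncation are all assumptions made for illustration; this report does not specify the derivation that Microsoft Office actually uses.

```python
import hashlib


def derive_rc4_key(password: bytes, iv: bytes) -> bytes:
    """Hypothetical derivation: hash password and IV together, keep 128 bits."""
    return hashlib.sha256(password + iv).digest()[:16]


password = b"correct horse"   # illustrative password
iv_a = bytes.fromhex("00112233445566778899aabbccddeeff")
iv_b = bytes.fromhex("ffeeddccbbaa99887766554433221100")

key_first_save = derive_rc4_key(password, iv_a)
key_resave_same_iv = derive_rc4_key(password, iv_a)    # IV kept on re-save
key_resave_fresh_iv = derive_rc4_key(password, iv_b)   # IV renewed on re-save

# Same password and same IV give the same RC4 key, hence the same keystream.
print(key_first_save == key_resave_same_iv)
# A fresh IV gives an unrelated RC4 key even though the password is unchanged.
print(key_first_save == key_resave_fresh_iv)
```

In this model, renewing the initialization vector on every save would be enough to change the keystream without changing the password.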
Remarks 2. To the best of our knowledge, the security flaw given in the rest
of this report has not been reported in public.
3 The Misuse of RC4 in Microsoft Word
In this section, we show that RC4 is implemented in Microsoft Word in an
insecure way and that the 128-bit RC4 fails to protect the document as expected.
The flaw is that the same initialization vector is used when the document is
modified. This flaw allows part of the document to be recovered with a
negligible amount of computation.
3.1 Evidence that RC4 is misused in Microsoft Word
In this subsection, we use Microsoft Word 2002 to illustrate that RC4 is misused
in Microsoft Office.
We create a Word document which contains only one sentence: “Anti-virus
researchers from Symantec yesterday spotted the first virus capable of infecting
64-bit Windows systems”. Before the document is encrypted, we open the
document in binary format and obtain Fig. 1. Then we encrypt the file according
to Appendix A (we choose the ‘Microsoft Strong Cryptographic Provider’ that
supports the 128-bit RC4 encryption). Once encrypted, we obtain Fig. 2 and
we notice that the encrypted data looks random. Then we change the sentence
in the document to “Anti-virus researchers at Symantec yesterday spotted the
first virus capable of infecting 64-bit Windows systems”, i.e., the word ‘from’
is changed to ‘at’. After saving the changes, the binary format of the modified
document is shown in Fig. 3 (using a different password or file name would
result in different content, but the experiment would lead to the same
conclusion).
Fig. 1. Binary format of the original document (unencrypted)
The encrypted data starts at address 0xa00. Comparing Fig. 2 with Fig. 3, we
immediately notice that the same keystream has been used to encrypt the
original and the modified documents. The first 23 bytes (from address 0xa00
to 0xa16) are exactly the same, since the first 23 bytes of the two documents
are the same. The remaining bytes (from address 0xa17 to 0xa73) are
different because the plaintext changes. For example, the plaintext byte at
address 0xa17 is ‘f’ (0x66) in the original document, but ‘a’ (0x61) in the
modified document. Evidently the same keystream byte, with value 0x02, is used
at address 0xa17 in the original and the modified documents.
Fig. 2. Binary format of the original document (encrypted)
Fig. 3. Binary format of the modified document (encrypted)
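The comparison of the two encrypted files can be reproduced in miniature. In the sketch below the keystream is a stand-in produced by Python's `random` module, not real RC4 output, and only the base address 0xa00 and the two sentences are taken from this report; the helper `first_difference` is a name introduced here for illustration.

```python
import random


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or the shorter length if none differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


BASE = 0xA00   # file offset at which the encrypted data starts

v1 = b"Anti-virus researchers from Symantec yesterday spotted the first virus"
v2 = b"Anti-virus researchers at Symantec yesterday spotted the first virus"

rng = random.Random(2005)   # stand-in keystream source, NOT RC4
keystream = bytes(rng.randrange(256) for _ in range(len(v1)))

c1 = bytes(p ^ k for p, k in zip(v1, keystream))
c2 = bytes(p ^ k for p, k in zip(v2, keystream))

i = first_difference(c1, c2)
address = BASE + i

# With one plaintext byte known ('f' in the original), the keystream byte
# at that position follows directly, and it also decrypts the other version.
recovered_k = c1[i] ^ ord("f")
print(hex(address))
print(recovered_k == keystream[i])
print(chr(c2[i] ^ recovered_k))
```

The first differing offset maps to address 0xa17, matching the position at which ‘from’ and ‘at’ diverge.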
3.2 The attack on Microsoft Word
It is quite possible that encrypted Microsoft Word documents would be
transmitted between different users for checking, improvements and
modifications. The misuse of RC4 in Microsoft Word is thus a serious threat to
those who trust the 128-bit encryption provided by Microsoft.
Once it becomes clear that RC4 is misused in Microsoft Word, the attack
is straightforward. It is quite easy to detect whether the same keystream has
been used more than once. For example, if the document contains only
ASCII characters, then the most significant bit of each plaintext byte is 0,
so the XOR of two ciphertexts under the same keystream also has every most
significant bit equal to 0, and we can simply use those bits for detection.
Once we obtain two different documents encrypted with the same keystream, a lot
of information could be retrieved. The detailed analysis on recovering the
information from the XORed result of two plaintexts is illustrated in [2].
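The detection test above, followed by a simple known-word guess, can be sketched as follows. The keystreams are stand-ins generated with Python's `random` module rather than RC4, and the guessing step (often called crib dragging) is a standard technique shown for illustration; it is not necessarily the analysis given in [2].

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def likely_same_keystream(c1: bytes, c2: bytes) -> bool:
    """True if every byte of c1 XOR c2 has its most significant bit clear."""
    return all((b & 0x80) == 0 for b in xor_bytes(c1, c2))


def drag_crib(xored: bytes, crib: bytes, offset: int) -> bytes:
    """If crib is one plaintext's text at offset, return the other's text there."""
    return xor_bytes(xored[offset:offset + len(crib)], crib)


def stand_in_keystream(seed: int, n: int) -> bytes:
    """Pseudo-random bytes standing in for a cipher keystream (NOT RC4)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


v1 = b"Anti-virus researchers from Symantec yesterday spotted the first virus"
v2 = b"Anti-virus researchers at Symantec yesterday spotted the first virus"

ks_a = stand_in_keystream(1, len(v1))
ks_b = stand_in_keystream(2, len(v1))

c1 = xor_bytes(v1, ks_a)
c2 = xor_bytes(v2, ks_a)   # same keystream as c1
c3 = xor_bytes(v2, ks_b)   # independent keystream

print(likely_same_keystream(c1, c2))
print(likely_same_keystream(c1, c3))

# Guessing that the original contains "from Symantec" at offset 23
# reveals the corresponding text of the modified version.
print(drag_crib(xor_bytes(c1, c2), b"from Symantec", 23))
```

For independent keystreams each most significant bit of the XOR is 1 with probability 1/2, so n bytes pass the test by chance with probability 2^-n; a few dozen bytes already make a false positive very unlikely.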
4 The Misuse of RC4 in Microsoft Excel
In this section, we show that RC4 is implemented in Microsoft Excel in an
insecure way. The flaw is similar to that in Microsoft Word. We use Microsoft
Excel 2002 to illustrate the flaw.
4.1 Modifying the Microsoft Excel document
In this subsection, we investigate how the modification of an Excel document
affects its binary format. In the binary format of the document, when
some cells are modified, the modified cells are relocated after the
unmodified cells. The modification is thus different from that in Microsoft
Word.
We create an Excel document that is shown in Fig. 4. When creating
this document, we use the mouse to select each cell from left to right, top
to bottom (i.e., following the order ‘1a’, ‘1b’, ‘1c’, ‘1d’, ‘2a’, ...,
‘3c’, ‘3d’), and no typing errors are corrected while entering the data. We
save this document, and the binary format of the saved file is shown in Fig. 5.
Fig. 4. The original Excel document
Fig. 5. Binary format of the original Excel document
Then we change the content ‘1c’ to ‘c1’, and save the file. The modified document
is shown in Fig. 6 and its binary format is shown in Fig. 7.
Fig. 6. The modified Excel document
Fig. 7. Binary format of the modified Excel document
Comparing Fig. 5 with Fig. 7, we notice that the data ‘c1’ in the modified cell
is relocated to the end of the data, and the unmodified data that followed the
modified cell (‘1d’, ‘2a’, ‘2b’, ‘2c’, ‘2d’, ‘3a’, ‘3b’, ‘3c’, ‘3d’) are moved
forward.