Breaking Beatles: (Ab)using an Emulator to Defeat DRM

I've spent way too much time in the past six or so years trying to figure out how to defeat one game's encryption, and now I've finally done it.

First, a bit of history. The Beatles: Rock Band came out almost 11 years ago, on 09/09/09. It was a massive deal: rhythm gaming was just about at its peak, and the Beatles were the biggest name in music. Naturally, Harmonix went all-out in the development of this title, producing custom Beatles-inspired plastic instruments for the game, designing incredible animated visuals, adding vocal harmonies, and even including... a new DRM system?

Yes, one of the widely-reported aspects of this game was Harmonix's licensing of Cloakware DRM from Irdeto to protect the game.

You see, even at the time of the first Rock Band title, people had discovered that the games contain multitrack audio files, which contain the individual instrument and vocal parts on separate audio tracks. This is how the game manages to mute the guitars or drums whenever you miss a note, but outside the game they are of interest to DJs, mashup and remix producers, and music super-fans. (If you've never seen these before, try out my online multitrack mixer to get a better idea of why this kind of content may be interesting outside of games).

Unlike music fans, record labels are not generally excited about the public having access to what are essentially the master recordings to songs. So starting from the first Rock Band game, AES encryption in CTR mode was utilized to protect the “.mogg” audio files.

The Mogg Format

To store the multitrack audio for the songs in the game, Harmonix developed a format known as MOGG. Basically, it's standard Ogg Vorbis with some headers tacked on the beginning. These additional headers contain an index (called an OggMap) so that the game can quickly seek to any point within a song, which helps avoid delays when seeking to specific sections in practice mode or scrolling the song backwards a bit after un-pausing the game, for example.

Graphical depiction of the text in this section
An artist's impression of a MOGG file.

The first byte in a mogg file is the version identifier. A version of 10 indicates an un-encrypted file, meaning it only contains an OggMap followed by a normal Ogg Vorbis stream. Any version greater than 10 indicates that the ogg stream is encrypted. Although the method of encryption is always AES-CTR, and the nonce or IV is always stored in plain-text after the OggMap, the method to derive the key would change with each new version number.
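As a rough illustration of that version check, here is a minimal Python sketch. It relies only on what is described above (the first byte is the version, 10 means unencrypted, anything greater means encrypted); the function names are made up for this example, and it does not attempt to parse the OggMap or locate the nonce.

```python
import io

UNENCRYPTED_VERSION = 10  # per the text: plain OggMap followed by Ogg Vorbis


def read_mogg_version(stream):
    """Return the version identifier stored in the first byte of a mogg."""
    first = stream.read(1)
    if len(first) != 1:
        raise ValueError("empty input: no version byte to read")
    return first[0]


def is_encrypted(version):
    """Any version greater than 10 means the Ogg stream is AES-CTR encrypted."""
    return version > UNENCRYPTED_VERSION


# Demo on in-memory bytes; a real file would be opened with open(path, "rb").
demo_version = read_mogg_version(io.BytesIO(b"\x0b rest of file"))
demo_encrypted = is_encrypted(demo_version)
```

Knowing the version only tells you which key-derivation scheme applies; it does not get you any closer to the key itself.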

The first version, 11, used in Rock Band and even in Guitar Hero 2 for the Xbox, had a fixed key that was stored unprotected in the game binary. This was quickly reverse-engineered by folks like xorloser who were developing modding tools for the games. In response, with the release of Rock Band 2, the mogg version was bumped up to 12 and now a large number of keys were hidden in the game binary, with a very convoluted algorithm used to derive the proper key for a given file. However, even this complex system was reverse-engineered by xorloser in ArkTool version 7. In response, Harmonix again bumped the version, this time all the way to 14, with a title update for Rock Band and Rock Band 2, which basically doubled the complexity of the key-derivation step. Since then, there have not been any public releases of encryption tools for mogg files.


Although we now know that version 14 remained un-cracked for many years, Harmonix probably thought it was only a matter of time before version 14 (and the closely related version 15) were cracked, and they knew that the Beatles were even more protective of their masters than most other artists. So, they spent big bucks to license Cloakware from Irdeto for The Beatles: Rock Band for the next mogg encryption format. This new version 16 was still using AES-CTR, but with a twist: rather than a normal AES implementation, Cloakware uses what's known as white-box cryptography. This is still technically AES, but the algorithm is obscured by the use of a bunch of pre-computed tables of numbers which make it very difficult to ascertain the actual AES key. The idea is that even with a debugger, the user can't recover the AES key without doing a lot of hard work and math.
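To make the "still technically AES-CTR" part concrete, here is a small Python sketch of how CTR mode is structured. The block function below is NOT AES: it is a keyed hash standing in for the cipher so the example runs with only the standard library. The 8-byte nonce plus 8-byte big-endian counter layout is also an assumption for illustration, not the layout used in mogg files.

```python
import hashlib

BLOCK_SIZE = 16  # AES operates on 16-byte blocks


def stand_in_block_cipher(key, block):
    """NOT AES: a keyed hash used only so this sketch is self-contained."""
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def keystream_block(key, nonce, index, encrypt_block):
    """Encrypt (nonce || counter) to produce one block of keystream."""
    counter_block = nonce + index.to_bytes(BLOCK_SIZE - len(nonce), "big")
    return encrypt_block(key, counter_block)


def ctr_xor(key, nonce, data, offset=0, encrypt_block=stand_in_block_cipher):
    """Encrypt or decrypt `data`, which begins `offset` bytes into the stream."""
    out = bytearray(len(data))
    block_index = -1
    block = b""
    for i in range(len(data)):
        pos = offset + i
        if pos // BLOCK_SIZE != block_index:
            # Only generate a new keystream block when we cross a boundary.
            block_index = pos // BLOCK_SIZE
            block = keystream_block(key, nonce, block_index, encrypt_block)
        out[i] = data[i] ^ block[pos % BLOCK_SIZE]
    return bytes(out)


key, nonce = b"k" * 16, b"n" * 8
plaintext = bytes(range(100))
ciphertext = ctr_xor(key, nonce, plaintext)
```

Two things fall out of this structure. Decryption is the same operation as encryption, and any byte range can be decrypted on its own given its offset, which fits nicely with a format built around seeking. In a white-box design, everything except the `encrypt_block` step looks like this; it is that one step that gets replaced by lookup tables with the key baked in.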

It's true, the Cloakware code is a hideous mess of obfuscated control-flow and dense PPC code including the famous "rotate left word immediate then and with mask" instruction. Even advanced modern decompilers like Ghidra, which admittedly do a great job of making the code readable, still struggle to unravel all the automatic transformations used to obfuscate the code.

Ghidra decompiler output for a whitebox AES function demonstrating control flow obfuscation
Look at this shit. Even with symbols from a debug build left on the Wii copy of the game, it's still an unreadable mess, and this is one of the shortest obfuscated functions in the binary.

The astute among you may be wondering how making the AES part more complicated would improve protection, when plaintext data is still going to be fed into a Vorbis codec at some point. To this end, the Cloakware DRM with all its control-flow obfuscation and data protection is integrated within the Ogg Vorbis implementation. It protects data structures as they are passed around to the various Ogg Vorbis APIs, and even obfuscates the decoded PCM output while it's on its way to the audio driver!

Ghidra decompiler output for the obfuscated ogg_sync_pageseek function
This is the ogg_sync_pageseek function after being transformed with Cloakware. Compare to the original function. Those struct member names were added by me, but even with the names, can you tell what's going on here?

This is a pretty serious piece of DRM software that actually does a great job protecting the Ogg Vorbis data. However, Harmonix did not use it in any of their subsequent games. Maybe the license was just too expensive to justify the value, or maybe they figured out that even advanced encryption and obfuscation is not foolproof...

Side Channels

It's worth noting at this point that in spite of the beefy encryption and code obfuscation, the multitracks from this game leaked pretty shortly after release. The reason for this is the analog hole (or digital hole if you happen to use S/PDIF): the music can't stay encrypted as it travels to your speakers and into your earholes, so it's always going to be possible to record the audio output to another medium at this point in the chain. All it takes to do this for multitrack audio is to hard-pan all but one channel to one side, and record the remaining channel in mono; then combine and sync up the channels afterward. This method, along with others similar to it (like using phase-inversion), is the source of 99.9% of Rock Band multitrack leaks since 2009.
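The phase-inversion variant is easy to picture with a toy example. The Python sketch below uses tiny integer sample lists and assumes the two recordings are perfectly aligned and bit-identical apart from the muted track, which real captures never quite are; the function names are invented for this illustration.

```python
def mix(*tracks):
    """Sum the tracks sample by sample, the way a mixer would."""
    return [sum(samples) for samples in zip(*tracks)]


def isolate_by_inversion(full_mix, mix_without_track):
    """Invert the second recording and add it to the first.

    Everything common to both recordings cancels, leaving the one
    track that was only present in the full mix.
    """
    return [a - b for a, b in zip(full_mix, mix_without_track)]


guitar = [3, -2, 5]
drums = [10, 0, -4]
vocals = [1, 1, 1]

full = mix(guitar, drums, vocals)
no_guitar = mix(drums, vocals)  # e.g. recorded while the guitar is muted
recovered = isolate_by_inversion(full, no_guitar)
```

With real audio the cancellation is only as good as the alignment and the fidelity of the two recordings, so the result is never as clean as the toy numbers suggest.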

Surprisingly, Harmonix has not done anything to stop this method even on the latest Rock Band 4, as tracks are still getting recorded and distributed online. (BTW, please contact me if you want to know how to stop this, I have some ideas...)

New Tools

As I mentioned previously, I had been looking into this on-and-off for more than six years, but I was never able to make much headway because I was limited to reading PPC assembly, which for the Cloakware-protected functions was just a non-starter. I couldn't grok what function was doing what or what data was going where. This changed last March when the NSA released Ghidra. It's a complete open source software suite with an interactive disassembler and decompiler. This makes reading through code so much easier, as it is transformed into a C-like language, and you can rename and add types to variables to make the code easier to understand. Ghidra allowed me to quickly go through and make some educated guesses about what functions are doing, which allowed me to identify things like the Ogg Vorbis API functions that had been transformed with obfuscation. If you're going to load Xbox 360 executables into Ghidra, I strongly recommend installing Warranty Voider's XEX loader plugin.

The second tool that has helped immensely is Xenia, an Xbox 360 emulator. This is an incredible piece of software written by a bunch of people who are way smarter than me, and it runs The Beatles: Rock Band at almost full speed on my PC (well, it gets as far as the menus, but that's all we really need). And as a bonus, it has a debugger built in so I could inspect the CPU and memory state while the game is running (well, sometimes, when it decides not to hang or crash). To make debugging easier, I made a Ghidra script that exports all the function names to a .map file, which Xenia will load when you set the --load_module_map command-line argument, to show function names on the call stack.

Doing Questionable Things

I still didn't think I could figure out how to get the AES keys and bypass Cloakware entirely, but I was convinced that there must be some point in the code where decrypted Ogg Vorbis data is available. Well, after searching and searching, and lots of debugging, I discovered that the ogg_page structure from the Cloakware-protected version of ogg_sync_pageout() is unencrypted! This function finds a "page" of data in an ogg file and returns it as a structure so that eventually the individual Vorbis packets can be fed to the audio codec. The "page" is the largest sub-unit of an ogg file; an ogg really is just a sequence of page structures one after another. So, if you can dump all the pages in order, you get back the original file.
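To show what "a sequence of page structures" means in practice, here is a small Python sketch that walks the pages in a buffer using the standard Ogg page header layout (a 27-byte header starting with "OggS", then a segment table whose entries add up to the body length). It does not verify the page checksum, and `build_page` is a made-up helper that writes a zeroed checksum purely to produce test input.

```python
import struct

HEADER_SIZE = 27  # fixed part of an Ogg page header, before the segment table


def iter_ogg_pages(data):
    """Yield each complete Ogg page (header + segment table + body) in order."""
    pos = 0
    while pos < len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError(f"no OggS capture pattern at offset {pos}")
        if pos + HEADER_SIZE > len(data):
            raise ValueError("truncated page header")
        n_segments = data[pos + 26]
        table = data[pos + HEADER_SIZE:pos + HEADER_SIZE + n_segments]
        if len(table) != n_segments:
            raise ValueError("truncated segment table")
        page_len = HEADER_SIZE + n_segments + sum(table)
        if pos + page_len > len(data):
            raise ValueError("truncated page body")
        yield data[pos:pos + page_len]
        pos += page_len


def build_page(body, sequence, serial=1, granule=0):
    """Test helper only: build a page with a zeroed (invalid) checksum."""
    full, rest = divmod(len(body), 255)
    segments = [255] * full + [rest]
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, 0, granule,
                         serial, sequence, 0, len(segments))
    return header + bytes(segments) + body


first = build_page(b"a" * 300, sequence=0)
second = build_page(b"b" * 10, sequence=1)
pages = list(iter_ogg_pages(first + second))
```

Joining the yielded pages back together reproduces the input byte for byte, which is exactly why dumping every page in order is enough to reconstruct the file.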

At first I thought maybe I could mod the game binary to call NtOpenFile() and NtWriteFile() to copy the data out right when this function returned. This probably would have worked, but I decided that would be too difficult. So instead, I decided to mod Xenia itself (and also the game binary). Modding Xenia has a number of advantages: I can iterate quickly and not worry about rebuilding a XEX or messing around writing functions in PPC assembly. Instead I can just write the dumping code in Xenia and figure out a way to jump to it while the game is running.

Fortunately, Xenia is very neatly written and I could do everything I wanted in a relatively small amount of code. Here is a link to the patch I made that does the magic. What I do is add logic to detect the invalid PPC instruction sc 19. When this instruction is detected, a comment with the string "SC19" gets inserted into the IR (intermediate representation) of the code. Then when the x64 backend finds this comment, it emits a native jump to my function which I called breakReadPacket(). This function writes all the ogg pages to an ogg file in the current directory. I then patch the default.xex binary to turn the instruction right after the call to ogg_sync_pageout() into an sc 19 instruction. So, while the game plays a mogg, it calls my code and the decrypted mogg just appears on my hard drive. Pretty neat, huh!?
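For the binary-patching half of this, the idea can be sketched in a few lines of Python. This is not the actual patch: it assumes you already have the code as a decrypted, decompressed image in memory and have found the right offset by reverse engineering (the offset and bytes below are placeholders). The encoding follows the PowerPC SC instruction form, where plain `sc` is 0x44000002 and the LEV field sits five bits up from the bottom of the word.

```python
SC_BASE = 0x44000002  # primary opcode 17 plus the fixed "1" bit of the SC form


def encode_sc(lev):
    """Encode `sc LEV` as a 32-bit PowerPC instruction word."""
    if not 0 <= lev < 128:
        raise ValueError("LEV is a 7-bit field")
    return SC_BASE | (lev << 5)


def patch_word(image, offset, word):
    """Overwrite one big-endian instruction in place and return the old one."""
    if offset % 4 != 0 or offset + 4 > len(image):
        raise ValueError("offset must be word-aligned and inside the image")
    old = int.from_bytes(image[offset:offset + 4], "big")
    image[offset:offset + 4] = word.to_bytes(4, "big")
    return old


# Placeholder image: a nop (0x60000000) followed by blr (0x4e800020).
image = bytearray.fromhex("600000004e800020")
replaced = patch_word(image, 0, encode_sc(19))
```

Returning the overwritten word is a deliberate choice: whatever instruction used to live at that address still has to be accounted for somewhere, so it is worth keeping track of.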

All that remained at this point was figuring out how to get the game to decode an entire ogg from just the menus. I'm sure there's a way to write a fancy binary patch that creates and polls a VorbisReader object before anything's even drawn to the screen, but I wanted a quick and dirty solution. So I just modified the game's story chapters to include the full setlist, and modified the game's song preview times to be the full song length. Then I could just scroll through the song list of the first chapter and wait for the dumps to appear. This process is still pretty slow, around 30 kilobytes per second, but since I don't know how much time it would take me to figure out how to speed it up, I'm okay with that.

Parting Thoughts

Some may say that this does not constitute breaking Cloakware Whitebox DRM and I both agree and disagree with that conclusion. It's true, I did not find a way to compute AES keys for the encrypted content. But, I did completely bypass the protections that the software was supposed to provide, and managed to use the game as a decryption oracle. So make of that what you will. Others might say that this was all pointless because of the analog hole stuff I mentioned earlier. I would disagree with this; the analog hole recordings are always going to be flawed because a) they are resampled to 48kHz with a not-so-great resampler, and truncated to 16 bits with dithering, and b) you don't get small, first-generation encodings this way, you are stuck with either a larger lossless format or a smaller but now second-generation lossy format.

In all, I think this was an interesting and fun activity that helped me to gain a better understanding of things like reverse-engineering, static and dynamic binary analysis, JIT construction and implementations, audio codecs, cryptography systems, and more. I'm very happy that one of my pet projects for the past few years is finally completed.
