There are several tools available that will examine the word choice, phrases and grammatical structure of text and provide analysis of the writers/authors. I am fairly certain that the way we express information is fairly unique and that algorithms today are linking materials posted in various venues together and attributing them to particular individuals. This analysis is baby steps on comparison and just for fun; the tools are far from precise and the text sample length is far too short. I also suspect that it was purposefully modified to try and remove identifying signatures, but let’s see what we can find.
As many know, the Shadow Brokers released a set of information that included some text (with the bitcoin address removed):
How much you pay for enemies cyber weapons? Not malware you find in networks. Both sides, RAT + LP, full state sponsor tool set? We find cyber weapons made by creators of stuxnet, duqu, flame. Kaspersky calls Equation Group. We follow Equation Group traffic. We find Equation Group source range. We hack Equation Group. We find many many Equation Group cyber weapons. You see pictures. We give you some Equation Group files free, you see. This is good proof no? You enjoy!!! You break many things. You find many intrusions. You write many words. But not all, we are auction the best files.
We auction best files to highest bidder. Auction files better than stuxnet. Auction files better than free files we already give you. The party which sends most bitcoins to address: before bidding stops is winner, we tell how to decrypt. Very important!!! When you send bitcoin you add additional output to transaction. You add OP_Return output. In Op_Return output you put your (bidder) contact info. We suggest use bitmessage or I2P-bote email address. No other information will be disclosed by us publicly. Do not believe unsigned messages. We will contact winner with decryption instructions. Winner can do with files as they please, we not release files to public.
From looking at the text, the sentences are very short and lack proper grammar. (please excuse my own) Two times commas were used correctly in poorly worded sentences and twice it was used incorrectly to join two sentences. Many of the sentences are very short but linguistically, I am not sure this points to translation issues.
The first tool is a gender analyzer: http://www.hackerfactor.com/GenderGuesser.php
This tool reports that there are not enough words for a good read with only 215 out of the suggested minimum of 300 words. So a point to consider, was the amount of text purposefully kept short to limit exposure. The tool rated the text author as overwhelmingly male (85.9%) for informal writing but for formal, it would be female (75.21%).
The next analysis tool starts to dig much deeper and as a result ist more interesting. It is based on IBM’s Watson services, Tone Analyzer: https://tone-analyzer-demo.mybluemix.net/
Overall, the text rated highly on Anger (0.74), Extroversion (0.98) and Agreeableness (0.95) and interestingly, but probably not surprising very low on Openness (0.03).
The sentences with the highest Anger rating were:
0.47 How much you pay for enemies cyber weapons?
0.44 When you send bitcoin you add additional output to transaction.
0.38 Do not believe unsigned messages.
0.38 Winner can do with files as they please, we not release files to public.
0.36 You write many words.
0.34 No other information will be disclosed by us publicly.
0.33 We hack Equation Group.
0.32 We follow Equation Group traffic.
And those with the lowest:
0.11 You break many things.
0.10 But not all, we are auction the best files.
0.10 Auction files better than stuxnet.
0.07 We will contact winner with decryption instructions.
0.07 We auction best files to highest bidder.
0.00 You see pictures.
0.00 You enjoy!!!
0.00 Very important!!!
And the highest on Agreeableness:
0.99 We give you some Equation Group files free, you see.
0.99 We will contact winner with decryption instructions.
0.99 We hack Equation Group.
0.99 We follow Equation Group traffic.
0.98 Auction files better than free files we already give you.
0.98 We auction best files to highest bidder.
The next tools are from uClassify. https://uclassify.com
The first I tried is the Sentiment classifier and it gave it a slightly negative rating (56%) over positive (44%).
GenderAnalyzer_v5 gave the text a male (77%) classification.
Ageanalyzer struggled a bit, producing an age range of 65-100 with a 31% rating; next was a 24% rating for age range 26-35.
Mood was analyzed to be 95% Happy.
While there are a rather large number of classifiers available, this last one from uClassify, I found interesting, Value. It rated the text as Exploitive (44%) and Achieves (22%). The authors are taking advantage of what they found and bragging about it. While not surprising, it was interesting to see the tool pick up on it.
The last tool from Readability Score is one I like to use on my own works to help judge the readability and grade level of the writing. Given the short sentences, it is not too surprising that the level is approximately 6th grade and rated easy to read. https://readability-score.com/text/