The Investigation of Packets
Now that the Draft Investigatory Powers Bill has been published, I decided I’d see what the new rules would mean.
I focused on the requirement for communications providers to retain 12 months of connection records for each of their customers so they can be searched by the police as needed.
After reading the backing documents, it seemed to me that the bill basically requires the address and port data from each ‘connection’ – so I added a rule to my home/office firewall:
pass out quick log
This has the effect of capturing the first packet of each connection and putting it in a log.
Each day produces ~10Mbyte of raw data.
I used the unix tcpdump command to make it readable.
Here’s an example line:
1446912274.959057 IP 126.96.36.199.ntp > time.apple.com.ntp: NTPv4, Client, length 48
That shows the time, the protocol, the source address and port, the destination address and port – and some extra info not required by the bill, which I removed for the subsequent experiments.
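As a rough illustration of pulling those fields out of a line, here is a minimal Python sketch. It is not the code used for this post; the `ConnectionRecord` name and the regular expression are assumptions made for illustration, and it only handles lines shaped like the examples shown here.

```python
import re
from typing import NamedTuple, Optional


class ConnectionRecord(NamedTuple):
    timestamp: float  # seconds since the Unix epoch
    protocol: str     # e.g. "IP"
    src_host: str
    src_port: str     # tcpdump may print a service name such as "ntp"
    dst_host: str
    dst_port: str


# timestamp, protocol, "src > dst", with an optional colon after the destination.
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<proto>\S+)\s+(?P<src>\S+)\s+>\s+(?P<dst>\S+?):?(?:\s|$)"
)


def split_endpoint(endpoint: str):
    """Split 'host.port' on the last dot, which is where tcpdump puts the port."""
    host, _, port = endpoint.rpartition(".")
    return host, port


def parse_line(line: str) -> Optional[ConnectionRecord]:
    """Return the fields of one log line, or None if the line is not recognised."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    src_host, src_port = split_endpoint(match["src"])
    dst_host, dst_port = split_endpoint(match["dst"])
    return ConnectionRecord(
        float(match["ts"]), match["proto"], src_host, src_port, dst_host, dst_port
    )


example = "1446912274.959057 IP 126.96.36.199.ntp > time.apple.com.ntp: NTPv4, Client, length 48"
print(parse_line(example))
```

Everything after the destination (the "NTPv4, Client, length 48" part) is simply ignored, which matches dropping the extra info the bill does not ask for.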
Since each day contained ~ 100,000 lines I decided to focus on a single day – last Saturday.
I noticed that some of the destination addresses were still in raw numeric form even after a DNS reverse lookup:
1446892838.318132 IP airport.westhawk.co.uk.34931 > 188.8.131.52.https:
So I wrote some code that looked them up in a commercial database (Maxmind) that matches numeric IPs with associated companies (Akamai in this case).
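A minimal sketch of that step, in Python, might look like the following. It is an illustration rather than the code actually used: `lookup_owner` is a hypothetical stand-in for whatever query the commercial database offers, and here it is faked with a small dictionary.

```python
import ipaddress


def is_numeric_host(host: str) -> bool:
    """True if the host is still a raw IP address, i.e. reverse DNS gave no name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def label_destination(host: str, lookup_owner) -> str:
    """Replace a numeric destination with its owner when one can be found.

    lookup_owner is a hypothetical callable (IP string -> company name or None)
    standing in for the commercial database query.
    """
    if is_numeric_host(host):
        return lookup_owner(host) or host
    return host


# Stand-in data for illustration only; a real lookup would query the database.
fake_database = {"188.8.131.52": "Akamai"}
print(label_destination("188.8.131.52", fake_database.get))
print(label_destination("time.apple.com", fake_database.get))
```

Names that already resolved are passed through untouched, so only the stubborn numeric entries cost a database query.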
A real ISP would prefix that record with my account number – which seems to be my phone number in most of the cases I looked at.
The operational case gives three purposes for Internet Connection Records (ICRs):
- To assist in identifying who has sent a known communication online, which often involves a process referred to as internet protocol (IP) address resolution.
- To establish what services are being used by a known suspect or victim to communicate online, enabling further CD requests to be made to the providers of those online services e.g. to establish who the suspect or victim has been communicating with.
- To establish whether a suspect has accessed illegal services online e.g. to access illegal terrorist material or for the purposes of sharing indecent imagery of children.
The background to this is the fact that public IP addresses (e.g. 184.108.40.206) are reused and may be shared between multiple users over time, or indeed at the same time (but with different port numbers). This is especially true of internet connections provided by mobile phone providers, who tend to funnel all their traffic through just a few IP addresses. Ironically, the use of ISP-based adult content filters can also mask the user’s IP address. The government needs a way to be able to say who was using a specific IP address to access a specific service at a particular time.
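To make the shared-address problem concrete, here is a small Python sketch of the kind of search an ISP might run. The record layout, the `who_was_using` function, the account names and the 60-second tolerance are all assumptions for illustration; the addresses come from the ranges reserved for documentation.

```python
from typing import Iterable, NamedTuple, Optional, Set


class ICR(NamedTuple):
    account: str      # hypothetical customer identifier
    timestamp: float  # seconds since the Unix epoch
    src_ip: str       # public address the connection appeared to come from
    src_port: int
    dst_host: str


def who_was_using(
    records: Iterable[ICR],
    src_ip: str,
    when: float,
    src_port: Optional[int] = None,
    tolerance: float = 60.0,
) -> Set[str]:
    """Return the accounts seen on src_ip within `tolerance` seconds of `when`."""
    matches = set()
    for record in records:
        if record.src_ip != src_ip:
            continue
        if abs(record.timestamp - when) > tolerance:
            continue
        if src_port is not None and record.src_port != src_port:
            continue
        matches.add(record.account)
    return matches


records = [
    ICR("account-A", 1000.0, "203.0.113.7", 40001, "mail.example.org"),
    ICR("account-B", 1010.0, "203.0.113.7", 40002, "mail.example.org"),
    ICR("account-C", 9000.0, "203.0.113.7", 40001, "mail.example.org"),
]
# Address and time alone leave two candidates; adding the port narrows it to one.
print(sorted(who_was_using(records, "203.0.113.7", when=1005.0)))
print(sorted(who_was_using(records, "203.0.113.7", when=1005.0, src_port=40001)))
```

The third record shows why the time matters as well: the same address and port turn up later under a different account.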
So does the ICR meet these requirements?
1) – Determining who sent a known message.
Here’s a header from an email I sent to Gmail:
Received: from [192.168.157.66] (unknown [220.127.116.11]) by zimbra003.verygoodemail.com (Postfix) with ESMTPSA id 8178D18A0DC0 for <firstname.lastname@example.org>; Sun, 8 Nov 2015 14:10:05 +0000 (GMT)
From: tim panton <email@example.com>
Assuming the time is accurate, my ISP could look through all their ICRs and determine that I was using 18.104.22.168 at that time.
The reverse isn’t true, however. If I send an email from Gmail to my work email, none of the IP addresses in the headers correspond to the address that Vodafone allocated to my Android mobile.
So we can give ICRs 5/10 for that.
2) – Determining which services I use.
I decided to create some charts to help me answer this.
– Firstly an explanation: the biggest slice, ‘domain’, is DNS (the mechanism used to map between names such as yopet.us and IP addresses such as 22.214.171.124).
This is the most common ‘connection’ – but because I run my own DNS server the number is roughly 3x what you’d expect from a ‘normal’ user, who would send queries straight to their ISP’s DNS server. So for most users, all these connections would go to the same place and tell you nothing interesting, unless you could see the content, which requires a warrant so is outside the scope of the ICR related powers.
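A per-service breakdown of this kind can be tallied with very little code. The Python sketch below is an assumption about how one might do it, not the code behind the charts; it treats whatever follows the last dot of the destination as the service, which is how tcpdump prints it.

```python
from collections import Counter
from typing import Iterable, Optional


def service_of(line: str) -> Optional[str]:
    """Return the destination service (or port) from one tcpdump-style line."""
    parts = line.split()
    try:
        destination = parts[parts.index(">") + 1]
    except (ValueError, IndexError):
        return None  # no "src > dst" pair on this line
    # The service is whatever follows the last dot, minus any trailing colon.
    return destination.rstrip(":").rpartition(".")[2]


def tally_services(lines: Iterable[str]) -> Counter:
    """Count how many logged connections went to each service."""
    counts = Counter()
    for line in lines:
        service = service_of(line)
        if service:
            counts[service] += 1
    return counts


sample = [
    "1446912274.959057 IP 126.96.36.199.ntp > time.apple.com.ntp: NTPv4, Client, length 48",
    "1446892838.318132 IP airport.westhawk.co.uk.34931 > 188.8.131.52.https:",
    "1446896224.773849 IP 126.96.36.199.54299 > zimbra003.verygoodemail.com.imaps",
]
print(tally_services(sample).most_common())
```

On a full day of logs, `most_common()` gives the slices in size order, ready to hand to a charting tool.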
The most interesting slice is a small one – ‘imaps’. That’s one of the protocols used to retrieve email. Looking at the imaps record, we see:
1446896224.773849 IP 126.96.36.199.54299 > zimbra003.verygoodemail.com.imaps
Which tells you who my email supplier is – although only one of them. Neither of the webmail services I use (Yahoo and Gmail) shows up in this search.
HTTP and HTTPS make up the bulk of the rest of the data, so let’s look at that in more detail:
- 1e100 is the name Google uses for all of its IP addresses – so think of it as ‘Google’
- Akamai is a CDN – they host content for other people, distributing it to their servers around the globe and ensuring that you access a copy near you, which speeds page load times (all of the big names use them or one of their competitors)
This tells you that I use Google, Twitter and Facebook – which is hardly news – but does not show how I use them.
It does not show either of the messenger apps I use for work, Skype.com and Wire.com.
(Indeed, digging through the data, neither name crops up unless you inspect the content of the packets.)
So we’ll score that at 6/10 – since it found one of my email providers and gives a reason to go and request data from Google, Facebook and Twitter.
3) – To establish whether a suspect has accessed illegal services online
If we assume that a website has been allocated its own IP address and has correctly registered it, then it would be possible to look in my ICR to see if I’d accessed it in the last 12 months. So, for example, my employer Westhawk Ltd shows up in the list.
However, that isn’t the way the internet works these days. If I want to put a simple document online, I’d put it on one of the many discussion sites, from Google Docs to LinkedIn and Tumblr, or Facebook – none of these give unique IP addresses for content, so the search would fail.
If I want to run an interactive site, I simply buy a cloud server with Amazon AWS, Microsoft’s Azure or Google’s compute cloud etc. These services allocate temporary IP addresses to virtual servers and recycle them as the service moves from machine to machine. These IPs are registered to Amazon (et al), not to the individual website, so again the search would fail.
If, however, the authorities have been keeping an eye on a site and have maintained a record of all the IP addresses associated with a specific site name (e.g. yopet.us), it might be possible to cross-correlate the IP addresses I accessed with the IPs that site had used.
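That cross-correlation could be sketched in Python along these lines. The data layout is hypothetical: it assumes the watchers recorded, for each address, the period during which the site was seen on it. The addresses are from the ranges reserved for documentation.

```python
from typing import Iterable, List, NamedTuple


class SiteIP(NamedTuple):
    ip: str
    first_seen: float  # when the site was first observed on this address
    last_seen: float   # when it was last observed there


class Access(NamedTuple):
    timestamp: float
    dst_ip: str


def accesses_matching_site(
    accesses: Iterable[Access], site_history: Iterable[SiteIP]
) -> List[Access]:
    """Keep only accesses to an address while the site was known to be using it."""
    history = list(site_history)
    hits = []
    for access in accesses:
        for entry in history:
            same_ip = access.dst_ip == entry.ip
            in_window = entry.first_seen <= access.timestamp <= entry.last_seen
            if same_ip and in_window:
                hits.append(access)
                break
    return hits


history = [
    SiteIP("203.0.113.10", 100.0, 200.0),
    SiteIP("203.0.113.20", 200.0, 300.0),
]
accesses = [
    Access(150.0, "203.0.113.10"),  # right address, right time
    Access(250.0, "203.0.113.10"),  # right address, but recycled by then
    Access(250.0, "203.0.113.20"),  # the site's later address
]
print(accesses_matching_site(accesses, history))
```

The second access is the important case: once a cloud provider has recycled the address, a match on the IP alone would point at whoever was given it next.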
I’ll give this a score of 4/10 – since with diligent detective work it may be possible to find something useful, but as criminals learn they will find it pretty easy to avoid showing up in these searches.
So overall – 5/10, mostly because the requirements seem to have been written before the advent of cloud computing and webmail.
Less legitimate usage
OK, so those are the authorized uses. But what happens if my ICR falls into the hands of a criminal? What could they learn from it?
Rather a lot actually.
Here’s a plot of http(s) connections every 10 mins:
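The counts behind a plot like this are easy to produce, which is part of the worry. A minimal Python sketch, assuming only a list of connection timestamps in seconds (the `bucket_counts` name is made up for illustration):

```python
from collections import Counter
from typing import Dict, Iterable

BUCKET_SECONDS = 600  # ten minutes


def bucket_counts(
    timestamps: Iterable[float], bucket_seconds: int = BUCKET_SECONDS
) -> Dict[int, int]:
    """Count connections per time bucket, keyed by the bucket's start time."""
    counts = Counter()
    for ts in timestamps:
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


# Timestamps taken from the example log lines earlier in this post.
stamps = [1446892838.318132, 1446896224.773849, 1446912274.959057]
for start, count in bucket_counts(stamps).items():
    print(start, count)
```

Buckets with no connections never appear in the result, so the gaps in activity stand out as clearly as the busy periods.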
Digging deeper into the record shows that I own Apple devices (see above) and at least one Raspberry Pi. It also shows I have a RIPE Atlas device on my network. More worryingly, it pinpoints a three-way video call between me and my children, identifying the university my daughter is at and (slightly inaccurately) the town my son lives in.
It shows the financial institutions I use (bank and credit card). Over time it would show the date I submit my VAT return, when I pay my gas bills, which travel companies I use, etc.
All useful facts if you are an identity thief, scammer, or just a plain burglar.
When I started this little project I had intended to put all the data online, but when I saw it, I decided the risk of identity theft was just too high to be worth making my data public.
So the trade-off seems to be a 50% chance of finding something useful in a decreasing number of cases against the near certainty that these databases will be hacked and used by thieves.
If these records must be kept, then they have to be kept very securely or the risk will outweigh the benefit. I must be informed when mine have been accessed or hacked so I can be extra vigilant against scams and break-ins.
P.S. After I finished this project I realized that a chunk of the data is missing: much of my IPv6 traffic goes out through an IPv6-over-IPv4 tunnel to Hurricane Electric, which bypasses the logging entirely, as it would for an ISP’s ICR log. This also applies to VPNs and proxy services. So perhaps we need to reduce the score to 4/10.