Introduction and general Tor concepts
References:
Other articles on the topic:
- Introduction and general Tor concepts (this one)
- Encryption keys & algorithms, directory servers and channels
Disclaimer:
While I did my best to be as accurate as possible, the topic is complex and I am really good at making mistakes. If you notice anything wrong, please reach out. Moreover, I skipped some details to keep the content concise, so please use the references above to deepen your knowledge.
Tor is the implementation of The Onion Routing protocol.
What follows is the first of a series of articles trying to explain how Tor works with only one goal: save you the hours I spent understanding all of this.
Disclaimer: I did my best to study and verify everything in detail. But I do make mistakes, and the topic is quite complex. If you find anything wrong, please let me know in the comments.
Is ChatGPT sleep deprived?
Sleep deprivation, if taken to the extreme, inevitably leads to hallucinations. Hence, we can conclude that ChatGPT must be sleep deprived, because most of the times I asked it something about Tor, it got confused or provided plain wrong answers.
TL;DR: once again, don’t blindly trust GenAI, especially when dealing with lesser-known, deeply technical topics.
Tor 101
Tor is an overlay network.
It is composed of thousands (~6-11k) of relays, connected through channels that form circuits inside which cells are sent and received.
A relay is a host running an instance of the Tor daemon/service configured to make it a relay. If you install Tor on a host, that host does not automatically become a relay. It must be configured that way.
From a networking point of view, channels are nothing more than TLS sessions connecting hosts around the world. What sets them apart from a normal connection is how these sessions are instantiated and what is communicated within them.
When a client accesses Tor, it selects a set of relays (three, by default) that will be connected as a chain, one after the other. The path from the client to the last relay is called a circuit.
Both the internal messages exchanged between the relays (and also with the client) and the actual traffic generated by the client are encapsulated inside fixed-length messages, called cells.
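To make the idea of fixed-length cells more concrete, here is a minimal sketch of how data could be packed into them. The sizes (a 4-byte circuit ID, a 1-byte command and a 509-byte payload) and the function names are my assumptions for illustration; check the Tor specification for the real layout. Note that this toy version pads with zeros and carries no length field, so a receiver could not tell padding from data.

```python
import struct

# Assumed sizes, for illustration only: 4-byte circuit ID, 1-byte command,
# 509-byte payload, i.e. 514 bytes per cell.
CIRC_ID_LEN = 4
COMMAND_LEN = 1
PAYLOAD_LEN = 509
CELL_LEN = CIRC_ID_LEN + COMMAND_LEN + PAYLOAD_LEN


def pack_cell(circ_id: int, command: int, data: bytes) -> bytes:
    """Build one fixed-length cell, zero-padding the payload."""
    if len(data) > PAYLOAD_LEN:
        raise ValueError("data does not fit in a single cell")
    header = struct.pack("!IB", circ_id, command)  # network byte order
    return header + data.ljust(PAYLOAD_LEN, b"\x00")


def split_into_cells(circ_id: int, command: int, data: bytes) -> list[bytes]:
    """Split arbitrary data across as many cells as needed."""
    chunks = [data[i:i + PAYLOAD_LEN] for i in range(0, len(data), PAYLOAD_LEN)]
    # Empty input still produces one (fully padded) cell.
    return [pack_cell(circ_id, command, chunk) for chunk in chunks or [b""]]


# Hypothetical circuit ID and command number.
cells = split_into_cells(circ_id=1, command=3, data=b"x" * 1000)
print(len(cells), {len(cell) for cell in cells})
```

Whatever the size of the input, every cell on the wire has the same length, which is the whole point: an observer cannot distinguish a short message from a long one by looking at a single cell.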
Onions and layers
Tor, like onions, cakes and ogres, has layers. Plenty of layers.
Tor communications are wrapped inside layers of encryption. Basically, it takes one piece of information and applies different keys and algorithms one after the other, so that only the correct recipient can access that information.
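The layering can be sketched in a few lines. This is a toy, not Tor's actual cryptography: the keystream is built from SHA-256 purely so the example runs with the standard library, and the three per-hop keys are hypothetical values I made up. The only thing it is meant to show is that the client adds one layer per relay and each relay removes exactly one.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter. NOT what Tor uses."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


# Hypothetical keys the client shares with each relay of the circuit.
hop_keys = [b"guard-key", b"middle-key", b"exit-key"]

message = b"GET / HTTP/1.1"

# The client wraps the message: innermost layer for the exit,
# outermost layer for the guard.
onion = message
for key in reversed(hop_keys):
    onion = xor_layer(key, onion)

# Each relay along the path peels one layer, in order.
peeled = onion
for key in hop_keys:
    peeled = xor_layer(key, peeled)

print(peeled == message)
```

After the guard removes its layer, the data is still protected by the other two, so the guard learns nothing about the content. Only the last relay ends up with the original message.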
Every moment needs its own encryption algorithm
Tor uses symmetric and asymmetric cryptography, hashing and key exchange algorithms (like Diffie-Hellman) to do its job, depending on what is being done at the moment.
You know nothing about cryptography?
Cryptography is hard and complex, but don’t be intimidated. Luckily, over the last 100 years smart women and men have spent plenty of energy engineering strong algorithms and correct ways to deal with most of the situations we face daily.
So, you don’t need to invent anything. Just understand how things work.
I am not good enough to explain these concepts here, but I recommend my favorite technical book of all time:
“Serious Cryptography: A Practical Introduction to Modern Encryption”
written by Jean-Philippe Aumasson (ISBN-13: 978-1593278267).
Buy the book and study it in detail. Believe me, it is worth the effort.
The current state of an ever changing network
Relays appear and disappear from the network, but for them to be useful they must be reachable and selectable by clients and other relays as well.
A relay generates a server descriptor containing all the information about the relay, like its IP address, some of its public keys, the flags it uses (e.g., whether it can be used to reach the Internet or not) and so on.
This document is then signed with the private identity key of the relay itself and sent to a set (5-10) of semi-trusted servers called directory servers.
Those directory servers keep receiving the server descriptors of the relays. They then “vote” on the state of the network, which means that they decide on a new version of the directory, a “document” containing the latest state of the whole network as agreed upon by those servers, i.e. the states of the relays composing the network.
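A very rough way to picture the vote is a majority tally. In the sketch below, the server names, the relay names and the "more than half" rule are all assumptions of mine; the real voting procedure covers many more details (flags, bandwidth, signatures) than a simple list of names.

```python
from collections import Counter

# Hypothetical votes: each directory server lists the relays it considers usable.
votes = {
    "dirserver-1": {"relayA", "relayB", "relayC"},
    "dirserver-2": {"relayA", "relayB"},
    "dirserver-3": {"relayA", "relayC"},
    "dirserver-4": {"relayA", "relayB", "relayD"},
    "dirserver-5": {"relayA"},
}


def agree(votes: dict[str, set[str]]) -> set[str]:
    """Keep a relay only if more than half of the servers listed it."""
    tally = Counter(relay for listed in votes.values() for relay in listed)
    majority = len(votes) // 2 + 1
    return {relay for relay, count in tally.items() if count >= majority}


directory = agree(votes)
print(sorted(directory))
```

Here relayA (5 votes) and relayB (3 votes) make it into the agreed directory, while relayC (2 votes) and relayD (1 vote) do not. No single server decides alone, which is why they only need to be semi-trusted.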
Inside the official Tor Browser (and official Tor installers in general), a list of the directory servers is provided. It is possible for a user to change them, but this undermines the trust one can have in that client/relay.
This list is used by clients and relays to download the updated version of the directory, which in turn provides the most updated state of the whole network.
Whenever a client wants to access the Tor network, it selects a set of nodes (three by default) which act as entry/guard node, middle node and exit node.
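As a minimal sketch of that selection, assume the directory is a simple mapping from relay name to flags. The relay names are hypothetical and the choice is uniformly random here, which is a simplification on my side: the real client takes more factors into account when picking relays.

```python
import random

# Hypothetical directory: relay name -> flags found in the directory.
DIRECTORY = {
    "relay1": {"Guard"},
    "relay2": {"Guard", "Exit"},
    "relay3": set(),
    "relay4": {"Exit"},
    "relay5": set(),
}


def pick_path(directory: dict[str, set[str]], rng: random.Random) -> tuple[str, str, str]:
    """Pick three distinct relays: a guard, a middle and an exit."""
    exit_node = rng.choice(sorted(
        name for name, flags in directory.items() if "Exit" in flags))
    guard = rng.choice(sorted(
        name for name, flags in directory.items()
        if "Guard" in flags and name != exit_node))
    middle = rng.choice(sorted(
        name for name in directory if name not in (guard, exit_node)))
    return guard, middle, exit_node


guard, middle, exit_node = pick_path(DIRECTORY, random.Random())
print(guard, middle, exit_node)
```

The exit is chosen first in this sketch so that a relay carrying both flags is never used twice in the same path.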
A diamond is forever. An entry node lasts months.
The first node contacted by the client is called the entry node or guard node.
As said, the client selects the nodes it wants to use in advance. However, at the first run, it chooses three entry nodes that it is going to keep using for the next few months.
It behaves this way to make some attacks more difficult to carry out.
Briefly, the idea is that if the same client keeps changing its entry nodes, sooner or later it will use some relays that are malicious. If the entry nodes are almost always the same, this condition should almost never occur.
Counterintuitive, huh?
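A bit of arithmetic makes the argument less counterintuitive. Assume, purely for illustration, that 5% of the entry nodes are malicious and that every pick is independent and uniformly random (both are assumptions of mine, not measured values). The chance of hitting at least one bad entry node after n picks is 1 - (1 - f)^n:

```python
def p_bad_guard(malicious_fraction: float, picks: int) -> float:
    """Chance that at least one of `picks` independent choices is malicious."""
    return 1 - (1 - malicious_fraction) ** picks


f = 0.05  # assumed fraction of malicious entry nodes

sticky = p_bad_guard(f, 3)          # three entry nodes, chosen once and kept
rotating = p_bad_guard(f, 3 * 100)  # three fresh entry nodes, 100 times over

print(f"sticky:   {sticky:.4f}")
print(f"rotating: {rotating:.4f}")
```

With these numbers, a client that sticks to its three entry nodes has roughly a 14% chance of having picked a bad one, while a client that keeps rotating gets arbitrarily close to 100% over time.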
Entry nodes are flagged as such due to their reliability and availability inside the network.
What is left: the middle node and the exit node.
Once the client establishes a channel with the guard node, it then uses that node to reach the second node, called the middle node. This is achieved by extending the circuit, i.e. by instantiating a new channel between the first and the second node that the client can use to reach the latter.
The exit node is reached in the same way, by extending the circuit from the middle node.
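The hop-by-hop construction can be modelled with a tiny class. This is only a toy model of the order of operations: the class and method names are hypothetical, and there is no networking or cryptography involved.

```python
from dataclasses import dataclass, field


@dataclass
class Circuit:
    """Toy model of a circuit as an ordered list of hops."""
    hops: list[str] = field(default_factory=list)

    def create(self, guard: str) -> None:
        """First hop: the client talks to the guard directly."""
        if self.hops:
            raise RuntimeError("circuit already created")
        self.hops.append(guard)

    def extend(self, next_relay: str) -> str:
        """Ask the current last hop to open a channel to next_relay.

        Returns the name of the relay that performed the extension.
        """
        if not self.hops:
            raise RuntimeError("create the circuit before extending it")
        via = self.hops[-1]
        self.hops.append(next_relay)
        return via


circuit = Circuit()
circuit.create("guard")
print(circuit.extend("middle"))  # extension carried out by the guard
print(circuit.extend("exit"))    # extension carried out by the middle node
print(" -> ".join(["client", *circuit.hops]))
```

The client never contacts the middle or the exit node directly: each extension is requested through the part of the circuit that already exists.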
The owner of the exit node can define exit policies, which are shared inside the directory and allow the client to know whether a specific relay is the right one to use. For example, exit policies can be used to define which protocols are allowed to exit and which are not.
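Here is a minimal sketch of how a client could evaluate such a policy. I am assuming a simplified format where each rule is an action plus a single port, checked from top to bottom, with "*" matching any port. Real policies are richer than this, and refusing when nothing matches is a choice of this sketch, not necessarily what Tor does.

```python
# Hypothetical, simplified policy: (action, port) rules checked top to bottom.
POLICY = [
    ("accept", "80"),
    ("accept", "443"),
    ("reject", "*"),
]


def exit_allows(policy: list[tuple[str, str]], port: int) -> bool:
    """Return True if the first matching rule accepts the port."""
    for action, rule_port in policy:
        if rule_port == "*" or int(rule_port) == port:
            return action == "accept"
    return False  # sketch's own choice when no rule matches


print(exit_allows(POLICY, 443))  # HTTPS
print(exit_allows(POLICY, 25))   # SMTP
```

With this policy, a client that wants to reach a web server over port 443 can use the relay as an exit, while one that wants to send mail over port 25 has to pick another relay.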
One important detail that Tor uses to make correlation attacks harder to mount is the requirement that the guard node and the exit node reside in different /16 networks.
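Checking whether two relays share a /16 is easy with Python's standard ipaddress module. The function name and the addresses below (taken from documentation ranges) are just examples of mine.

```python
import ipaddress


def same_slash16(ip_a: str, ip_b: str) -> bool:
    """True if both IPv4 addresses fall inside the same /16 network."""
    # strict=False lets us build the network from a host address.
    net_a = ipaddress.ip_network(f"{ip_a}/16", strict=False)
    return ipaddress.ip_address(ip_b) in net_a


# Same first two octets: both live in 198.51.0.0/16.
print(same_slash16("198.51.100.7", "198.51.200.9"))
# Different first two octets: different /16 networks.
print(same_slash16("198.51.100.7", "203.0.113.5"))
```

A pair for which this check returns True would be rejected as guard and exit of the same circuit, since the two relays are more likely to be run or observed by the same party.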
Once the circuit is established, the client can reach the Internet and visit the services it wants.
What is next?
The next article focuses on understanding some of the cryptography that Tor uses to achieve its goals. This will prove useful for the future articles on the topic.