You are being invited to take part in a research study. Before you decide to participate in this study, it is important that you understand why the research is being done and what it will involve. Please take the time to read the following information carefully. Please ask the researchers if there is anything that is not clear or if you need more information.
To read more about how we collect your data, see the “Data Privacy and Security” section below.
General questions about our research
What is the purpose of the IoT Inspector project?
Many people use smart-home devices, also known as the Internet-of-Things (IoT), in their daily lives, ranging from bulbs, plugs, and sensors to TVs and kitchen appliances. To a large extent, these devices enrich the lives of many users. At the same time, they may pose risks to their owners.
Security and privacy risks. Many IoT devices are designed with poor security practices, such as hard-coded passwords, weak authentication, and a lack of software updates. Devices may be hacked, and an attacker could potentially control the devices or steal the user’s sensitive information.
Performance risks. A user may have a large number of IoT devices at home. Together, these devices compete for limited bandwidth, which may degrade the overall performance of the home network.
Our goal is to measure and visualize these risks, both for research and for users. To this end, we have released IoT Inspector, open-source software that you can download to inspect your home network and identify privacy, security, and performance problems associated with your IoT devices.
What is IoT Inspector?
IoT Inspector is a Windows/Linux/Mac application that you can run on laptops and desktops, but not on tablets or smartphones. By using a technique known as “ARP spoofing,” this software monitors the network activities of all IoT devices connected to the home network (e.g., your “smart” appliances). It collects and shows you the following information:
- who the IoT device contacts on the Internet, and whether the contacted party is malicious or is known to track users
- how much data is exchanged (in terms of bytes per second) between the device and the contacted parties
- how often the data is exchanged
IoT Inspector collects and sends the information above to the researchers only while it is running, i.e., until the user stops or uninstalls IoT Inspector.
Note that IoT Inspector does not collect the following information:
- network activities of phones, computers, or tablets
- actual contents of communication
- any personally identifiable information, such as your home network’s IP address, the MAC addresses of your devices, or your name and email address
Also note that IoT Inspector is not intended to replace existing security software on your system, such as Avast, McAfee, or Windows Defender. We still strongly recommend that you follow secure computing practices, e.g., running regular system updates, not reusing passwords, enabling firewalls, and running well-known security software.
What are the benefits of using IoT Inspector?
IoT Inspector aims to provide you with transparency into your IoT devices, e.g.,
- whether your IoT device is sharing your information with third parties;
- whether your IoT device is hacked (for instance, engaged in DDoS attacks);
- or whether your IoT device is slowing down your home network.
Aside from offering the above benefits, IoT Inspector also collects data, which we keep confidential, to help us with IoT research: specifically, measuring and mitigating the security, privacy, and performance problems of IoT devices. For more information about our research, visit https://iot-inspector.princeton.edu/.
Data Privacy and Security
What data does IoT Inspector collect?
For each IoT device on your network, IoT Inspector collects the following information and sends it to our secure server at Princeton University:
Device manufacturers, based on the first 6 hexadecimal characters (i.e., the OUI) of the MAC address of each device on your network.
DNS requests and responses.
Destination IP addresses and ports contacted — but not your public-facing IP address (i.e., one that your ISP assigns to you).
Scrambled MAC addresses (i.e., with a salted hash).
Aggregate traffic statistics — i.e., number of bytes sent and received over a period of time.
Names of devices on your network. We collect this information from the following sources:
Your manual input — i.e., you can tell us what devices you have.
User Agent string, i.e., a short piece of text (typically fewer than 100 characters) that your IoT device sends to the Internet to announce what type of device it is. This text does not typically include any personally identifiable information. For example, if you have a Samsung Smart TV, the User Agent string might look like “Mozilla/5.0 (Linux; Tizen 2.3) AppleWebKit/538.1 (KHTML, like Gecko)Version/2.3 TV Safari/538.1”.
SSDP messages, i.e., short messages (typically fewer than 100 characters) that your IoT device broadcasts to the entire home network and that include its name. Again, this text does not typically include any personally identifiable information. For instance, if you have a Google Chromecast, it typically announces itself as “google_cast” or “Chromecast” via SSDP.
DHCP hostnames, i.e., short pieces of text (typically fewer than 100 characters) that your IoT device broadcasts to the entire home network and that include its name. Similarly, this text does not typically include any personally identifiable information. For example, a Wemo smart plug typically announces itself as “wemo” via DHCP.
(We collect from all of the sources above because different IoT devices identify themselves through different subsets of these sources, and some use none of them.)
TLS handshake, i.e., a short piece of data (typically fewer than 3,000 characters) that your IoT device sends to the Internet in order to establish a secure connection.
This text does not typically include any personally identifiable information.
We use this data to identify potentially vulnerable IoT devices, for instance, devices that use outdated or insecure encryption; in such cases, we notify the user of the risks of using the device.
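The two MAC-derived fields in the list above (the manufacturer prefix and the scrambled MAC address) can be illustrated with a short Python sketch. This is a minimal, hypothetical illustration, not IoT Inspector’s actual code: the function names, the choice of SHA-256, and the per-installation random salt are assumptions; data_upload.py remains the authoritative source.

```python
import hashlib
import secrets

HEX_DIGITS = set("0123456789ABCDEF")

def _normalize_mac(mac: str) -> str:
    """Strip separators (":", "-", ".") and return 12 upper-case hex digits."""
    digits = "".join(ch for ch in mac if ch not in ":-.").upper()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

def mac_oui(mac: str) -> str:
    """First 6 hex characters (3 octets) of the MAC: the manufacturer's OUI."""
    return _normalize_mac(mac)[:6]

def scramble_mac(mac: str, salt: bytes) -> str:
    """Hypothetical scheme: SHA-256 over salt + normalized MAC, as hex."""
    return hashlib.sha256(salt + _normalize_mac(mac).encode("ascii")).hexdigest()

# Hypothetical: one random salt generated once per installation and kept locally.
salt = secrets.token_bytes(16)
mac = "B8:27:EB:12:34:56"
print(mac_oui(mac))                  # B827EB
print(scramble_mac(mac, salt)[:16])  # differs from one installation to the next
```

Under this sketch’s assumption that the salt never leaves the user’s computer, the same device always maps to the same scrambled ID, yet the server cannot reverse it. Without a salt, a plain hash would be easy to reverse: once the OUI is known, only 2^24 (about 16.8 million) candidate MAC addresses remain per manufacturer.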
For exact details of how we collect these data, see the source code: https://github.com/noise-lab/iot-inspector-client/blob/master/v2-src/data_upload.py. You can also download the data yourself; see this question in the FAQ.
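SSDP messages, one of the naming sources listed above, are plain HTTP-style text sent over UDP to the multicast address 239.255.255.250, port 1900. The following minimal Python sketch splits such a message into its headers. It is an illustration only: the function name is hypothetical and the sample message is synthetic, not captured from a real device. Identifying details typically appear in headers such as SERVER, NT/ST, and USN, while a device’s friendly name often lives in the UPnP description XML that the LOCATION header points to.

```python
from typing import Dict

def parse_ssdp(message: bytes) -> Dict[str, str]:
    """Split an SSDP message into its start line and upper-cased header fields."""
    text = message.decode("utf-8", errors="replace")
    lines = text.split("\r\n")
    fields = {"START-LINE": lines[0]}
    for line in lines[1:]:
        if not line:
            break  # an empty line ends the header block
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:  # skip malformed lines without a colon
            fields[name.strip().upper()] = value.strip()
    return fields

# Synthetic example of an SSDP "alive" announcement.
sample = (
    b"NOTIFY * HTTP/1.1\r\n"
    b"HOST: 239.255.255.250:1900\r\n"
    b"NT: upnp:rootdevice\r\n"
    b"NTS: ssdp:alive\r\n"
    b"LOCATION: http://192.168.1.20:8008/ssdp/device-desc.xml\r\n"
    b"SERVER: Linux/3.8.13, UPnP/1.0, Portable SDK for UPnP devices/1.6.18\r\n"
    b"USN: uuid:00000000-0000-0000-0000-000000000000::upnp:rootdevice\r\n"
    b"\r\n"
)
fields = parse_ssdp(sample)
print(fields["SERVER"])
```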
Note that IoT Inspector will collect the traffic of all IoT devices connected to your home network while IoT Inspector is in operation. Examples of IoT devices that IoT Inspector can analyze include (but are not limited to) Google Home, Amazon Echo, security cameras, smart TVs, and smart plugs. Computers, tablets, and phones are automatically excluded. You can also manually exclude devices either by powering them down while setting up IoT Inspector or by specifying their MAC addresses.
If you do not want IoT Inspector to collect data from a particular IoT device (e.g., because it collects sensitive medical information), please disconnect it from the network now, before you start running IoT Inspector. If you are unable to disconnect it (e.g., because you need to keep the device running, or because you do not know how to disconnect it), you cannot use IoT Inspector.
How does IoT Inspector make sure it doesn’t collect sensitive information?
We make sure that all data collected is confidential.
Privacy: IoT Inspector only collects the information above. It does not collect any personally identifiable information, such as your location or IP address. As a result, we are unable to infer what IoT devices a specific person owns. We will keep the data confidential within the limits of the law.
Security: All data collected from your IoT devices is stored on a secure server in the Department of Computer Science at Princeton University. IoT Inspector transmits data to our server over a secure channel, i.e., HTTPS.
As a result of our privacy and security practices, no one except our research team has access to the collected data. Even we are unable to infer what IoT devices you own or what you do with your devices.
(In case you’re curious, each user is identified by a unique ID, generated at random when the user first runs IoT Inspector. That’s how we distinguish between individual users.)
What are some risks of running IoT Inspector on my computer?
Performance degradation: Running IoT Inspector may reduce your network performance. If you are doing latency-sensitive activities, such as playing video games or holding video chats, we recommend that you turn off IoT Inspector. In fact, some of our users have reported that IoT Inspector brought down their entire network; if this happens, stop IoT Inspector and reboot your router. Furthermore, IoT Inspector is experimental software and is provided “as is”; we have not comprehensively tested it on all IoT devices or with all possible configurations. As a result, it may fail to work and may disconnect your home devices. In this case, turning off IoT Inspector and rebooting your home router will likely solve the issue. If you have any critical medical devices, for instance, we suggest that you exclude such devices from IoT Inspector or withdraw from the study.
Data breach: In the unlikely event that our secure server is compromised, an attacker could gain access to this form and the collected data. However, the attacker would be unable to infer what IoT devices you own (because the attacker would not know the real-world identities behind each device) or what you do with your devices.
Best-effort support: We will regularly maintain and update the software (e.g., fixing bugs) whenever necessary. If you have questions, we try our best to respond to email inquiries within 24 hours on weekdays. However, we do not guarantee long-term support of the software, nor do we guarantee that we can answer everyone’s questions if we reach our capacity. If IoT Inspector disrupts the normal functionality of your network, simply turn it off.
Despite all the effort above, can a user still accidentally send sensitive information to the researchers?
We could potentially gather sensitive information from three sources:
1. A user could enter their name as part of a device’s name (e.g., “Danny’s Chromecast”). We warn users in the UI not to enter their names.
2. IoT Inspector automatically scans the network to guess the likely identities of devices on the network. Part of this scan uses SSDP/mDNS, which are protocols that devices use to announce their identities to the network. Sometimes, a device’s own announcement may contain private information. For instance, a Chromecast may announce its name along with the video you’re streaming on YouTube.
3. IoT Inspector also parses DHCP Request packets that devices broadcast to the entire network, as part of the effort to identify devices. These packets may contain sensitive information as well. For instance, an iPhone’s DHCP Request packets may say “Danny’s iPhone”.
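The DHCP source above typically relies on the Host Name option (option 12, defined in RFC 2132) that many devices include in their DHCP messages. Below is a minimal Python sketch of extracting it from a raw DHCP payload (the UDP payload, without IP/UDP headers). The function names are hypothetical and the sample packet is synthetic; the sketch also ignores option overloading and long options split across repeats, which a production parser would need to handle.

```python
from typing import Dict, Optional

BOOTP_FIXED_LEN = 236                    # fixed BOOTP/DHCP header fields (RFC 2131)
MAGIC_COOKIE = bytes([99, 130, 83, 99])  # 0x63825363 marks the start of DHCP options
OPT_PAD, OPT_END, OPT_HOSTNAME = 0, 255, 12

def dhcp_options(payload: bytes) -> Dict[int, bytes]:
    """Walk the code/length/value options that follow the magic cookie."""
    start = BOOTP_FIXED_LEN
    if payload[start:start + 4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options: Dict[int, bytes] = {}
    i = start + 4
    while i < len(payload):
        code = payload[i]
        if code == OPT_PAD:  # pad option has no length byte
            i += 1
            continue
        if code == OPT_END or i + 1 >= len(payload):
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options

def dhcp_hostname(payload: bytes) -> Optional[str]:
    """Return the Host Name option as text, or None if the device omitted it."""
    value = dhcp_options(payload).get(OPT_HOSTNAME)
    return None if value is None else value.decode("utf-8", errors="replace")

# Synthetic DHCPREQUEST: op=1 (request), htype=1 (Ethernet), hlen=6, rest zeroed.
header = bytes([1, 1, 6, 0]) + bytes(BOOTP_FIXED_LEN - 4)
packet = header + MAGIC_COOKIE + bytes([53, 1, 3]) + bytes([12, 4]) + b"wemo" + bytes([OPT_END])
print(dhcp_hostname(packet))  # wemo
```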
Users can remove data gathered through sources 2 and 3 by clicking the “Remove” link.
For example, my MacBook was discovered, and my name was transmitted to Princeton.
After I clicked the “Remove” link, my name disappeared.
Why don’t you ask volunteers to run tcpdump or Wireshark themselves and share the pcap files?
A few reasons:
Not everyone knows how to set up a wireless network and run tcpdump or wireshark.
Even if a person knows how to do this, giving us pcap files would actually raise more privacy issues than our current setup:
- The pcap file may include non-IoT devices on the same network.
- The pcap file contains more information than we need; in particular, it may contain packet payload, where, for instance, we may be able to find your password sent from your browser window (if sent over plain HTTP).
Why must IoT Inspector upload the data to Princeton?
A few reasons:
Research. We are not aware of any open-source datasets for IoT research at this point; that’s why we’re building such a dataset through the IoT Inspector project, where we collect labelled data from real IoT devices in the wild (as opposed to in the lab). This dataset would allow us, as well as other academic researchers, to understand the security and privacy issues today and fix these problems in the near future.
Crowd intelligence. To provide each user with more relevant information about their IoT devices, IoT Inspector analyzes the data from all users.
Here’s one example. Suppose User A’s device makes a connection to IP address 126.96.36.199, but IoT Inspector does not know the identity of this IP address (e.g., because it failed to observe the corresponding DNS packet).
If User B’s device resolves 188.8.131.52, IoT Inspector can then use this information to tell User A that User A’s device potentially contacted
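The crowd-intelligence idea above can be sketched as a lookup that prefers a user’s own DNS observations and falls back to IP-to-domain mappings seen by other users. This is a hypothetical illustration, not IoT Inspector’s server code: the data layout, the one-vote-per-user rule, and the use of a trailing “?” to flag crowd-inferred names are assumptions (the “?” mirrors how IoT Inspector marks low-confidence domain names). Addresses and hostnames come from documentation ranges and example domains.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

def build_shared_mapping(
    observations: Iterable[Tuple[str, str, str]]
) -> Dict[str, Counter]:
    """Count, per IP address, how many distinct users saw each hostname resolve to it."""
    seen = set()
    mapping: Dict[str, Counter] = defaultdict(Counter)
    for user_id, ip, hostname in observations:
        if (user_id, ip, hostname) in seen:
            continue  # one vote per user per (ip, hostname) pair
        seen.add((user_id, ip, hostname))
        mapping[ip][hostname] += 1
    return dict(mapping)

def label_remote_ip(
    ip: str, local_dns: Dict[str, str], shared: Dict[str, Counter]
) -> Optional[str]:
    """Name the remote IP, preferring this user's own DNS data over crowd data."""
    if ip in local_dns:
        return local_dns[ip]  # observed on this user's network: high confidence
    votes = shared.get(ip)
    if votes:
        hostname, _count = votes.most_common(1)[0]
        return hostname + "?"  # inferred from other users: flagged as uncertain
    return None  # no information at all

observations = [
    ("user-b", "203.0.113.10", "cdn.example.com"),
    ("user-c", "203.0.113.10", "cdn.example.com"),
    ("user-c", "203.0.113.10", "cdn.example.com"),  # duplicate vote, ignored
    ("user-d", "203.0.113.10", "other.example.net"),
]
shared = build_shared_mapping(observations)
print(label_remote_ip("203.0.113.10", {}, shared))  # cdn.example.com?
```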
How do I delete my data?
Users can delete any collected data — either per device or per account.
Delete data per device:
Delete data per account:
To delete IoT Inspector from your computer, see this question.
How do I get a copy of the collected data?
Go to the “Settings” page:
What if IoT Inspector’s server is hacked?
In the rare case where our server is hacked, an attacker could obtain a copy of the dataset and our system logs. It is true that the attacker would be able to see what device contacted what domains at what time, but beyond this information, it is unlikely that the attacker would be able to infer the real-world identities behind individual devices. This is because we do not collect any personally identifiable information such as your Internet-facing IP address (see this question); for instance, we have disabled IP logging on our webserver (https://inspector.cs.princeton.edu), so we do not know which IP addresses are running IoT Inspector.
Furthermore, even if IoT Inspector accidentally collects sensitive information, you can remove it from the user interface (see this question).
You say you don’t collect any user’s IP addresses, but I see that you’re using Leadpages and Statcounter on your main website. What’s going on?
Also, we use StatCounter to keep track of visitors to our main website (again, not the IoT Inspector tool at https://inspector.cs.princeton.edu/), so that we have a rough idea of where the traffic is coming from (e.g., referrers and visitor locations). For our StatCounter account, we are using the free tier, which means we only have data for the latest 500 visitors. Also, we have configured StatCounter to remove the last number (octet) of each visitor’s IP address; for instance, if a visitor’s actual IP address is 184.108.40.206, the StatCounter log only shows
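Dropping the last octet, as described above, is equivalent to keeping only the /24 network that an IPv4 address belongs to. A minimal Python sketch using the standard ipaddress module follows; writing the dropped octet as 0 is an assumption made for illustration, and StatCounter’s own display format may differ.

```python
import ipaddress

def drop_last_octet(ip: str) -> str:
    """Zero the last octet of an IPv4 address, i.e., keep only its /24 network."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    network = ipaddress.IPv4Network(f"{addr}/24", strict=False)
    return str(network.network_address)

print(drop_last_octet("192.0.2.45"))  # 192.0.2.0
```

Each truncated address then stands for up to 256 possible hosts, which blurs individual visitors while keeping enough network-level detail for rough location statistics.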
What do you do with the data collected?
We will release our findings in a journal or conference publication. If a consumer is unsure whether to buy a new IoT device and that device is in our dataset, they can read our paper before making a decision. Otherwise, the consumer can always buy the product, analyze it with IoT Inspector, and return it if the results are unsatisfactory.
Furthermore, we will publish the results of our study in a more accessible form on our center’s official blog (Freedom to Tinker, https://freedom-to-tinker.com/). This will help disseminate our findings to the public.
A potential benefit of our study is to provide more transparency about the privacy, security, and performance issues of IoT devices. We expect the increased transparency to encourage vendors to manufacture more private, secure, and performant devices, which is a net gain for society as a whole.
How long will you keep the data? What’s your data retention policy?
Our goal is to balance reproducibility and privacy. In particular, we should retain the data long enough for external researchers to challenge our findings, but not so long that the data is forgotten or breached.
As such, we have decided to retain the data on our server (at the Computer Science Dept of Princeton University) for at most a year after we publish a paper on our findings in an academic journal/conference. During this one-year post-publication period, any researchers, with the approval of their respective institutional review boards, would have the opportunity to request our dataset, reproduce our results, and verify our findings.
Would you share the data with non-Princeton researchers?
Yes, but they would first have to get approval from the Institutional Review Board (IRB) at Princeton and/or their own institutions (which typically requires the researchers to complete IRB/ethics training).
Once the non-Princeton researchers have approval, they will have full access to the data. Even if these researchers were to go rogue, it is unlikely that they would be able to infer individual users’ real-world identities; see this question.
Would you sell the data for commercial purposes?
Do you have a consent form for users?
Yes; see https://inspector.cs.princeton.edu/.
Can my housemate use IoT Inspector to spy on me?
They could. This action is against our policies as stated in the consent form, but someone could still use IoT Inspector to monitor the network without the consent of everyone on the same network.
We attempt to make it very difficult for a malicious user to monitor the network traffic of non-IoT devices. Here’s how:
We have a set of best-effort heuristics to determine if a device is non-IoT. For instance, we use Fingerbank to guess the identity of each device, based on the OUI of the MAC address as well as the destination hosts contacted. Also, we look at the DHCP Request packets and the SSDP messages to determine if a device is non-IoT.
Note: These heuristics may introduce false positives and false negatives (i.e., an IoT device being marked as non-IoT, and a non-IoT device not being marked as non-IoT, respectively).
For each non-IoT device, if a user attempts to view which communication endpoints the device has contacted (e.g., to spy on a roommate’s browsing history), the user will see a lock screen.
To unlock the screen, the user has to enter the first 6 characters of the device’s MAC address. This screen serves as a deterrent to users who do not have physical access to the devices they wish to monitor.
The effort above, again, serves as a mitigation rather than a solution. A malicious user can, for instance, scan the network to obtain the MAC address of the device they wish to spy on, thereby circumventing the lock screen. Our mitigation aims to deter non-technical users from abusing our tool, rather than highly technical users (who probably do not need IoT Inspector to spy on their housemates anyway).
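The lock-screen check described above amounts to comparing the user’s input against the first 6 hex characters (the OUI) of the device’s MAC address. Here is a minimal Python sketch, assuming that separators (colons, hyphens, dots, spaces) are ignored and that letter case does not matter; the function names are hypothetical and this is not the tool’s actual code.

```python
SEPARATORS = ":-. "

def _hex_only(text: str) -> str:
    """Drop common MAC separators and upper-case the remaining characters."""
    return "".join(ch for ch in text if ch not in SEPARATORS).upper()

def unlock_allowed(user_input: str, device_mac: str) -> bool:
    """True only if the input is exactly the device's 6-character OUI."""
    expected = _hex_only(device_mac)[:6]
    entered = _hex_only(user_input)
    return len(entered) == 6 and entered == expected

print(unlock_allowed("b8:27:eb", "B8:27:EB:12:34:56"))  # True
```

As the answer above notes, anyone who can read MAC addresses off the network (for example, from an ARP table) can pass this check, which is why it works only as a deterrent.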
Using IoT Inspector
Why do I see a question mark next to a domain name?
A domain name that ends with a question mark (“?”) means that we are not confident in the result. One reason is that IoT Inspector failed to observe any DNS traffic from the monitored device; this DNS traffic would otherwise help IoT Inspector identify exactly which domain name the device contacted.
As such, if your device appears to be communicating with a strange domain marked with “?”, do not panic. After all, IoT Inspector could have made a mistake here!
If you’re still not sure, talk to us and we can help you.
Why can’t I see my IoT device in the device list on the home screen?
Possible reasons (starting from the most likely reasons):
It takes some time for IoT Inspector to discover your devices. Just wait for the list to refresh itself.
Maybe your device is offline or in sleep mode. Try interacting with your device, e.g.,
- turning it off and then on
- interacting with the associated smart phone app (if the device comes with a control app)
Your device is in the list, but you don’t recognize it. You can check whether the IP and MAC addresses in the list correspond to those of your device.
You have too many devices on your network. By default, IoT Inspector allows you to inspect up to 50 devices. If you want to inspect more devices, contact us and we can increase your limit on a case-by-case basis.
Why am I seeing an empty chart/table?
It takes about ten seconds for the chart/table to refresh. Be a little patient.
Your device is not sending/receiving any traffic. Try to interact with the device, e.g.,
- turning it off and on
- interacting with the associated smart phone app (if the device comes with a control app)
There’s a bug with IoT Inspector. Click the “Stop Inspection and Logout” button at the top of the screen, wait 30 seconds, and restart IoT Inspector.
I’ve monitored a number of devices. All of them have “No Data”. What’s going on?
Wait for a minute while IoT Inspector attempts to intercept traffic via ARP spoofing.
If the devices still show “No Data” after a minute, your router is likely blocking our ARP spoofing attempts, which prevents IoT Inspector from intercepting any network traffic. In this case, IoT Inspector will not work on your network.
How do I stop IoT Inspector from collecting traffic?
You can pause the collection by clicking the “Start/Pause Inspection” button at the top of the web frontend.
To stop the collection, you can either (i) close all browser windows running IoT Inspector (in which case IoT Inspector will automatically stop collecting data within 15 seconds); or (ii) click the “Stop Inspection and Logout” button at the top of the screen.
How do I delete IoT Inspector from my computer?
To uninstall IoT Inspector:
- Stop IoT Inspector first, either by clicking the “Stop Inspection and Logout” button, or closing all browser windows with IoT Inspector running.
- In the next 30 seconds, IoT Inspector will automatically restore your network settings.
- Delete the folder called “princeton-iot-inspector” from your home directory.
See this question if you want to delete data from our server.
Is there a way to inspect the unencrypted traffic being sent?
You cannot inspect the unencrypted traffic using IoT Inspector, because IoT Inspector does not collect packet payload (with the exception of DNS, DHCP Request, and TLS Client Hello).
Still, if you really want to see what your devices are sending, feel free to run tcpdump or Wireshark on the same computer to collect additional details.
IoT Inspector shows me that my devices are really unsafe. What should I do?
For the current implementation, IoT Inspector can only help you identify potentially unsafe devices; it doesn’t tell you what to do. Here are a few ways to mitigate the problem:
- Unplug the problematic device. Don’t use it. And tell us about it.
- If you still want to use it, put it on a separate network.
- You can buy another wireless router dedicated for problematic IoT devices.
- You can put problematic IoT devices in a separate VLAN. A user of ours suggested following these instructions.
For academic researchers
Can I access the dataset?
A number of submissions (both by us and our existing collaborators) that use the dataset are under peer-review. We are likely to release the full dataset by the end of 2019. In the meantime, we are open to collaborating with other researchers.
To start collaboration, we suggest the following steps:
Take a look at the sample data and our data collection method. Decide whether this data is sufficient for your purpose; if not, we are open to helping you collect more data (e.g., by modifying the code) in a way that preserves our users’ privacy. Note that there are other public datasets for IoT traffic, such as the dataset from Georgia Tech or from Northeastern University.
Schedule a video call with the IoT Inspector team. We will explain our data collection method and describe any ongoing projects. Furthermore, we will also hear your proposed use of the data. Currently, we are working with a number of collaborators on device identification, anomaly detection, home network measurement, and identifying remote endpoints. If your proposed project happens to coincide with one of our existing projects, we would need to either collaborate or find a way to avoid submitting multiple papers on similar topics.
Once you agree to collaborate with us, you will need to first send us the IRB training certificates of everyone who will have access to this data. We will add you to our existing IRB proposal. Also, check with your institution to see if you need your own IRB. Once the IRBs of Princeton and possibly your institution approve the project proposal, we will send you the dataset in CSV format. While waiting for the data, feel free to experiment with the sample data, the Georgia Tech data, or the Northeastern data.
We will also schedule regular calls to answer any questions associated with the data or to help with any aspects of the research, such as writing the paper. We would like to get involved in your project to make sure that you understand how the data is collected and that your analysis does not misrepresent the dataset. Our involvement can range from writing a small section of the paper (e.g., data collection method), helping you with data analysis, to modifying IoT Inspector’s data collection method to suit your needs (of course, in a way that preserves our users’ privacy). We can discuss exactly how we can be involved, and we are flexible.
Can you provide sample data collected?
As of May 5, 2019, we have more than 3,000 users who submitted network traffic data from more than 30,000 devices. As IoT Inspector is still actively being used, we will have more data over time.
We are happy to make public the data collected from some of the IoT devices in our lab: https://iot-inspector.princeton.edu/sample-data/
Explanation of the columns:
sample.devices.csv: all devices captured by IoT Inspector
device_id: unique identifier of device, randomly generated by IoT Inspector
user_key: owner of device
device_ip: local IP address of device
device_oui: first 3 octets of device’s MAC address
device_name, device_type, device_vendor: user-entered labels for name (e.g., Chromecast), type (e.g., TV Stick), and vendor (e.g., Google) of device.
netdisco_name: name of device as inferred by netdisco
fb_name: name of device as inferred by FingerBank
dhcp_hostname: hostname of device extracted from DHCP Request packets sent by device
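When working with sample.devices.csv, one practical question is which of the several name columns above to display for a device. The sketch below picks the first non-empty value in a fixed preference order; that order (user-entered device_name first, then dhcp_hostname, netdisco_name, and fb_name) is our assumption, not a rule defined by the dataset, and the sample rows are synthetic.

```python
import csv
import io
from typing import Dict, TextIO

# Hypothetical preference order: user labels first, then automatic inferences.
NAME_COLUMNS = ("device_name", "dhcp_hostname", "netdisco_name", "fb_name")

def best_name(row: Dict[str, str]) -> str:
    """Return the first non-empty name column, or a placeholder with the OUI."""
    for column in NAME_COLUMNS:
        value = (row.get(column) or "").strip()
        if value:
            return value
    return f"(unnamed {row.get('device_oui', 'unknown')})"

def device_names(f: TextIO) -> Dict[str, str]:
    """Map device_id to a display name for every row of a devices CSV."""
    return {row["device_id"]: best_name(row) for row in csv.DictReader(f)}

sample = io.StringIO(
    "device_id,device_oui,device_name,netdisco_name,fb_name,dhcp_hostname\n"
    "d1,b827eb,,,Raspberry Pi,\n"
    "d2,000000,Living room TV,,,\n"
)
print(device_names(sample))  # {'d1': 'Raspberry Pi', 'd2': 'Living room TV'}
```

For the real file, the same function can be applied to `open("sample.devices.csv", newline="")`.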
sample.dns.csv: mapping between IP addresses and domain names, based on DNS, SNI, or HTTP Host; can be used as ground truth
device_id: same as above; the device where IoT Inspector observed the IP-domain mapping
ts: time of server at which the IP-domain mapping was uploaded to server
hostname: IP address and the corresponding hostname
data_source: could be one of…
- “sni”: if the IP-domain mapping was observed in Client Hello SNI extension
- “http-host”: if the mapping was observed in HTTP Host header
- [dns-resolver-ip]: IP address of DNS resolver if the IP-domain mapping was observed in DNS requests/responses
device_port: source port that the device uses to send the Client Hello (i.e., when data_source == “sni”)
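Because data_source holds either a keyword or a resolver’s IP address, a small classifier helps when counting where the IP-domain mappings in sample.dns.csv came from. The following is a minimal sketch; the category names (“dns”, “unknown”) and the abbreviated two-column sample are assumptions for illustration.

```python
import csv
import io
import ipaddress
from collections import Counter
from typing import TextIO

def classify_source(data_source: str) -> str:
    """Map a data_source value to 'sni', 'http-host', 'dns', or 'unknown'."""
    value = data_source.strip()
    if value in ("sni", "http-host"):
        return value
    try:
        ipaddress.ip_address(value)  # a resolver IP means the mapping came from DNS
    except ValueError:
        return "unknown"
    return "dns"

def count_sources(f: TextIO) -> Counter:
    """Tally mapping sources across all rows of a dns CSV file."""
    return Counter(classify_source(row["data_source"]) for row in csv.DictReader(f))

# Abbreviated, synthetic rows: only the columns this sketch reads.
sample = io.StringIO(
    "device_id,data_source\n"
    "d1,sni\n"
    "d1,192.0.2.53\n"
    "d2,http-host\n"
    "d2,2001:db8::53\n"
    "d3,bogus\n"
)
print(count_sources(sample))
```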
sample.flows.csv: all flows, bucketed into 5-second windows
device_id: same as above
ts: time of server at which the flow was uploaded to server
protocol: “tcp” or “udp”
remote_ip: IP address of the remote endpoint
remote_port: port of the remote endpoint
device_port: port of the device used in the communication
in_byte_count, out_byte_count: how many bytes coming in / going out during the five second window (based on sequence number in the case of TCP)
remote_hostname: domain name of remote endpoint based on DNS, SNI, or HTTP Host header; if there’s no DNS, SNI, or HTTP Host information, then we infer the remote_hostname based on FarSight DNS or PTR record — in which case we suffix the remote_hostname with a question mark
remote_reg_domain: registered domain part of the hostname
remote_web_xray: what company operates the remote hostname, according to WebXRay
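As a starting point for analyzing sample.flows.csv, the following sketch totals the incoming and outgoing bytes per device and registered domain, and counts how many rows carry a “?”-suffixed (inferred) remote_hostname. It is a minimal illustration with synthetic sample rows; grouping by remote_reg_domain and parsing byte counts via float (in case they are stored as decimals) are our assumptions. Note that “rows” counts flow records, not distinct 5-second windows, since several flows can share a window.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO, Tuple

def summarize_flows(f: TextIO) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum bytes per (device_id, remote_reg_domain) and count inferred hostnames."""
    totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"in_bytes": 0, "out_bytes": 0, "rows": 0, "inferred_rows": 0}
    )
    for row in csv.DictReader(f):
        key = (row["device_id"], row["remote_reg_domain"] or "(unknown)")
        entry = totals[key]
        entry["in_bytes"] += int(float(row["in_byte_count"] or 0))
        entry["out_bytes"] += int(float(row["out_byte_count"] or 0))
        entry["rows"] += 1
        if row["remote_hostname"].endswith("?"):  # low-confidence hostname
            entry["inferred_rows"] += 1
    return dict(totals)

sample = io.StringIO(
    "device_id,remote_reg_domain,remote_hostname,in_byte_count,out_byte_count\n"
    "d1,example.com,api.example.com,1200,300\n"
    "d1,example.com,cdn.example.com?,800,100\n"
    "d2,,,0,60\n"
)
print(summarize_flows(sample))
```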
sample.client_hello.csv: parsed Client Hello messages, sent by device
device_id: same as above
ts: time of server at which the Client Hello message was uploaded to server
device_port, remote_port: ports of device and remote endpoint
device_ip, remote_ip: local IP address of device, and the IP address of endpoint
version: max TLS/SSL version supported by client
cipher_suites: a list of cipher suites, expressed in integers, concatenated with the “+” sign
compression_methods: a list of compression_methods, expressed in integers, concatenated with the “+” sign
extension_details: base64 encoded string of the actual extensions and configurations being proposed by client
extension_uses_grease: whether GREASE is used in extension
cipher_suite_uses_grease: whether GREASE is used in any of the ciphers
extension_types: a list of extension_types, expressed in integers, concatenated with the “+” sign
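A minimal sketch for working with sample.client_hello.csv follows. It splits the “+”-joined integer lists, recognizes GREASE values (the reserved 0x?A?A code points from RFC 8701), and names the protocol version. It assumes that the version column holds the integer form of the two-byte TLS version (for example, 771 for 0x0303, i.e., TLS 1.2); that encoding is our assumption. One caveat: a TLS 1.3 client still sends 0x0303 in the ClientHello’s legacy version field and signals 1.3 in the supported_versions extension, so depending on how the column was derived, “TLS 1.2” may understate what a client supports.

```python
from typing import Dict, List

TLS_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}
# SSL 3.0 was deprecated by RFC 7568; TLS 1.0 and 1.1 by RFC 8996.
DEPRECATED = {0x0300, 0x0301, 0x0302}

def parse_plus_list(field: str) -> List[int]:
    """Turn '2570+4865+4866' into [2570, 4865, 4866]; empty input gives []."""
    return [int(x) for x in field.split("+") if x.strip()]

def is_grease(value: int) -> bool:
    """GREASE values are 0x0A0A, 0x1A1A, ..., 0xFAFA (RFC 8701)."""
    return 0 <= value <= 0xFFFF and (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def summarize_client_hello(row: Dict[str, str]) -> Dict[str, object]:
    """Summarize one row: version name, deprecation flag, and cipher details."""
    version = int(row["version"])
    ciphers = parse_plus_list(row["cipher_suites"])
    return {
        "version": TLS_VERSIONS.get(version, hex(version)),
        "deprecated_version": version in DEPRECATED,
        "cipher_count": sum(1 for c in ciphers if not is_grease(c)),
        "grease_in_ciphers": any(is_grease(c) for c in ciphers),
    }

# Synthetic row: GREASE (0x0A0A), TLS_AES_128_GCM_SHA256 (0x1301),
# TLS_AES_256_GCM_SHA384 (0x1302), TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (0xC02B).
row = {"version": "771", "cipher_suites": "2570+4865+4866+49195"}
print(summarize_client_hello(row))
```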