Colorado University Pentesting Intrusion Dataset (CUPID)


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Citation: Using this work? Please cite as:

Heather Lawrence, Uchenna Ezeobi, Orly Tauil, Jacob Nosal, Owen Redwood, Yanyan Zhuang, Gedare Bloom, CUPID: A labeled dataset with Pentesting for evaluation of network intrusion detection, Journal of Systems Architecture, Volume 129, 2022, 102621, ISSN 1383-7621, https://doi.org/10.1016/j.sysarc.2022.102621


@article{lawrence2022cupid,
title={CUPID: A labeled dataset with Pentesting for evaluation of network intrusion detection},
author={Lawrence, Heather and Ezeobi, Uchenna and Tauil, Orly and Nosal, Jacob and Redwood, Owen and Zhuang, Yanyan and Bloom, Gedare},
journal={Journal of Systems Architecture},
volume={129},
pages={102621},
year={2022},
publisher={Elsevier}
}


Packages

Name Description
CUPID-Baselines-CICFlowMeter.zip
CUPID-Human-CICFlowMeter.zip
CUPID-Auto-CICFlowMeter.zip
CICIDS17-CICFlowMeter.zip
CTU-13-CICFlowMeter.zip
CUPID-Auto-Labeled.csv
CUPID-Human-Labeled.csv
CUPID-Baselines-Labeled.csv
Processing Notebook
Journal Paper

Automatically Generated Attacks

Packet Capture: 052419_1504.pcapng
Attack: nmap 192.168.1.0/24
Rules: X[((X.sa == '10.10.10.13') & (X.pr == 1.0)) | ((X.sa == '10.10.10.13') & (X.pr == 6.0))]
Why: Nmap uses ICMP and a TCP SYN scan

Packet Capture: 052419_1613.pcapng
Attack: dig ds.lab
Rules: X[((X.sa == '10.10.10.13') & (X.dp == 53.0))]
Why: Dig sends 2 packets via DNS

Packet Capture: 052419_1618.pcapng
Attack: dnsmap ds.lab -w /usr/share/wordlist/dnsmap.txt
Rules: X[((X.da == '10.10.10.13') & (X.sp == 53.0))]
Why: All of the traffic occurs between 192.168.1.7 (DC) and 10.10.10.13

Packet Capture: 052419_1623.pcapng
Attack: dnswalk -r -d ds.lab
Rules: X[((X.da == '10.10.10.13') & (X.sp == 53.0))]
Why: All of the traffic occurs between 192.168.1.7 (DC) and 10.10.10.13

Packet Capture: 052419_1625.pcapng
Attack: dnstracer -r 3 -v ds.lab
Rules: X[((X.da == '10.10.10.13') | (X.sa == '10.10.10.13'))]
Why: The capture is so small that all 10.10.10.13 traffic is malicious

Packet Capture: 052419_1627.pcapng
Attack: nslookup ds.lab
Rules: X[(X.sa == '10.10.10.13') | (X.da == '10.10.10.13')]
Why: The capture is so small that all 10.10.10.13 traffic is malicious

Packet Capture: 052419_1629.pcapng
Attack: nslookup -type=ns ds.lab
Rules: X[(X.sa == '10.10.10.13') | (X.da == '10.10.10.13')]
Why: The capture is so small that all 10.10.10.13 traffic is malicious

Packet Capture: 052419_1631.pcapng
Attack: nslookup -type=soa ds.lab and nslookup -query=mx ds.lab
Rules: X[(X.sa == '10.10.10.13') | (X.da == '10.10.10.13')]
Why: The capture is so small that all 10.10.10.13 traffic is malicious

Packet Capture: 060319_1510.pcapng
Attack: DVWA Pentesting
Rules: X[((X.sa == '10.10.10.18') & (X.sp == 4444.0)) | ((X.sa == '10.10.10.18') & (X.sp == 41360.0)) | ((X.sa == '10.10.10.18') & (X.sp == 41412.0))]
Why: Pentesting machine: 10.10.10.18. The callback listener used source port 4444. There are two HTTP POST requests (packets 12648 and 13415) where the shell is in the payload. These packets use source ports 41360 and 41412.

Packet Capture: 060419_1255.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.da == '192.168.1.11')]
Why: Pentesting machine: 10.10.10.19. All TCP/HTTP traffic to 192.168.1.11 is malicious

Packet Capture: 060419_1413.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.sp > 56934.0) & (X.da == '192.168.1.11')]
Why: Pentesting machine: 10.10.10.19. Traffic sent from 10.10.10.19 to 192.168.1.11 after sliding window source port 56934 is malicious

Packet Capture: 060419_1509.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.sp >= 57116.0) & (X.da == '192.168.1.11')]
Why: Pentesting machine: 10.10.10.19. Traffic sent from 10.10.10.19 to 192.168.1.11 after sliding window source port 57116 is malicious

Packet Capture: 060519_1319.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.da == '192.168.1.11')]
Why: Aside from DNS, traffic from 10.10.10.19 to 192.168.1.11 is malicious. TCP traffic is included due to password brute forcing

Packet Capture: 060519_1355.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.da == '192.168.1.11')]
Why: Aside from DNS, traffic from 10.10.10.19 to 192.168.1.11 is malicious. ICMP is assumed to be malicious as well

Packet Capture: 060519_1420.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.da == '192.168.1.11')]
Why: Aside from DNS, traffic from 10.10.10.19 to 192.168.1.11 is malicious

Packet Capture: 060519_1434.pcapng
Attack: DVWA Pentesting
Rules: X[(X.sa == '10.10.10.19') & (X.da == '192.168.1.11')]
Why: Aside from DNS, traffic from 10.10.10.19 to 192.168.1.11 is malicious

Packet Capture: 071219_1331.pcapng
Attack: bonesi -i TCP -i 50k-bots -d eth0 192.168.1.117:80
Rules: ip.src==192.168.1.117
Why: All ICMP from 192.168.1.117

Packet Capture: 071219_1342.pcapng
Attack: Kickthemout tool: sudo python3 kickthemout.py -target 192.168.1.10
Rules: arp.src.hw_mac==00:50:b6:21:5b:d0

Packet Capture: 071219_1352.pcapng
Attack: Kickthemout tool: sudo python3 kickthemout.py -t 192.168.1.5 -p 30
Rules: arp.src.hw_mac==00:50:b6:21:5b:d0
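
The pandas rules above select the rows of a capture's flow table that are treated as malicious. The following is a minimal sketch of how one such rule could be turned into a label column. It assumes that X holds one row per flow, that sa/da are source and destination addresses, sp/dp are source and destination ports, and that a Label column uses 1 for malicious and 0 for benign. These column meanings, the label encoding, and the toy rows are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy flow records; all values are invented for illustration only.
X = pd.DataFrame({
    "sa": ["10.10.10.13", "192.168.1.7", "10.10.10.13", "192.168.1.5"],
    "da": ["192.168.1.7", "10.10.10.13", "192.168.1.7", "192.168.1.7"],
    "sp": [40000.0, 53.0, 40001.0, 50000.0],
    "dp": [53.0, 40000.0, 80.0, 53.0],
})

# Rule listed for 052419_1613.pcapng (dig): DNS queries sent by 10.10.10.13.
malicious = X[((X.sa == "10.10.10.13") & (X.dp == 53.0))]

# Mark every flow benign, then flip the rows selected by the rule.
# The 0/1 encoding is an assumption for this sketch.
X["Label"] = 0
X.loc[malicious.index, "Label"] = 1

print(X[["sa", "da", "dp", "Label"]])
```

Only the first toy row satisfies both conditions of the rule, so it is the only one labeled 1.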

Baselines

Baselines were taken over a period of 24 hours each. Each .pcap was split using editcap:

  • Example: editcap -c 100000 042219_1000.pcapng 042219_1000_0.pcap

The split files can be reassembled using mergecap:
  • Example: mergecap -w 042219_1000.pcapng 042219_1000_0.pcapng 042219_1000_1.pcapng 042219_1000_2.pcapng 042219_1000_3.pcapng 042219_1000_4.pcapng 042219_1000_5.pcapng 042219_1000_6.pcapng 042219_1000_7.pcapng
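
In the editcap example, -c 100000 limits each output file to 100000 packets, and mergecap -w names the merged output file. When many baselines need to be reassembled, the mergecap argument list can be generated instead of typed. The helper below is a hypothetical sketch (the function name and the assumption that chunks are numbered from 0 with a .pcapng extension are mine, based on the file names listed in this section). It only builds the command and does not run it.

```python
import shlex


def mergecap_command(stem: str, n_chunks: int, ext: str = "pcapng") -> list:
    """Build the mergecap arguments that merge <stem>_0..<stem>_{n-1} into <stem>.<ext>."""
    chunks = [f"{stem}_{i}.{ext}" for i in range(n_chunks)]
    return ["mergecap", "-w", f"{stem}.{ext}", *chunks]


cmd = mergecap_command("042219_1000", 8)
print(shlex.join(cmd))

# To execute it (requires mergecap from Wireshark on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```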

Packet Captures: 042219_1000_0.pcapng, 042219_1000_1.pcapng, 042219_1000_2.pcapng, 042219_1000_3.pcapng, 042219_1000_4.pcapng, 042219_1000_5.pcapng, 042219_1000_6.pcapng, 042219_1000_7.pcapng
Rules: df['Label'] = 0
Why: This is a benign baseline sample

Packet Captures: 042319_1000_0.pcapng, 042319_1000_1.pcapng, 042319_1000_2.pcapng, 042319_1000_3.pcapng, 042319_1000_4.pcapng, 042319_1000_5.pcapng, 042319_1000_6.pcapng, 042319_1000_7.pcapng
Rules: df['Label'] = 0
Why: This is a benign baseline sample

Packet Captures: 042419_1000_0.pcapng, 042419_1000_1.pcapng, 042419_1000_2.pcapng, 042419_1000_3.pcapng, 042419_1000_4.pcapng, 042419_1000_5.pcapng, 042419_1000_6.pcapng, 042419_1000_7.pcapng
Rules: df['Label'] = 0
Why: This is a benign baseline sample

Packet Captures: 102519_0.pcapng, 102519_1.pcapng, 102519_2.pcapng, 102519_3.pcapng, 102519_4.pcapng, 102519_5.pcapng, 102519_6.pcapng, 102519_7.pcapng, 102519_8.pcapng
Rules: df['Label'] = 0
Why: This is a benign baseline sample

Human-generated Attack Data

Packet Capture: P1_dvwa_101619.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P2_dvwa_102519.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P3_dvwa_101619.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P4_dvwa_101719.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P5_dvwa_101719.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P6_dvwa_101819.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P7_dvwa_101819.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P8_dvwa_101819.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P9_dvwa_101819.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2

Packet Capture: P10_dvwa_102319.pcapng
Attack: Manual DVWA Pentesting
Rules: df[df['Src IP'].str.match('10.10.10.1') | df['Src IP'].str.match('10.10.10.2')]
Why: Kali instances originated from 10.10.10.1 or 10.10.10.2
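
As a minimal sketch of how the rule above behaves, the toy frame below (invented addresses, with an assumed 0/1 Label encoding) applies it as written. One detail worth knowing when reusing the rule: pandas str.match anchors the regular expression only at the start of the string, so the pattern '10.10.10.1' also matches longer addresses such as 10.10.10.13. The source does not say whether this prefix behaviour was intended. The exact-match variant using isin is shown only for comparison and is not the authors' rule.

```python
import pandas as pd

# Toy flows; the addresses are invented for illustration only.
df = pd.DataFrame({
    "Src IP": ["10.10.10.1", "10.10.10.2", "10.10.10.13", "192.168.1.11"],
})

# Rule as listed in this section: regex match anchored at the start of the string.
as_listed = df["Src IP"].str.match("10.10.10.1") | df["Src IP"].str.match("10.10.10.2")

# Exact-match alternative (an assumption about intent, not the authors' rule).
exact = df["Src IP"].isin(["10.10.10.1", "10.10.10.2"])

# Assumed encoding: 1 = malicious, 0 = benign.
df["Label"] = as_listed.astype(int)
print(df.assign(exact=exact))
```

On this toy data the listed rule selects the first three rows, while the exact-match variant selects only the first two.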

Human-generated Benign Data

Packet Capture: P1_surfing_101619.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P2_surfing_101619.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P3_surfing_101619.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P4_surfing_101719.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P5_surfing_101719.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P6_surfing_101819.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P7_surfing_101819.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P8_surfing_101819.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P9_surfing_101819.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic

Packet Capture: P10_surfing_102319.pcapng
Attack: None
Rules: df['Label'] = 0
Why: This is human-generated benign traffic
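
The benign and attack rules can be combined into one labeled table. The sketch below is illustrative only: the two small frames stand in for per-capture flow tables, the addresses are invented, and the encoding of 1 for malicious is an assumption (the source only states that benign traffic is labeled 0).

```python
import pandas as pd

# Toy stand-ins for two per-capture flow tables (values invented).
surfing = pd.DataFrame({"Src IP": ["192.168.1.20", "192.168.1.21"]})
dvwa = pd.DataFrame({"Src IP": ["10.10.10.1", "192.168.1.11"]})

# Benign capture: every flow gets Label 0, as listed for the surfing captures.
surfing["Label"] = 0

# Attack capture: rows selected by the listed Src IP rule get 1, the rest 0.
kali = dvwa["Src IP"].str.match("10.10.10.1") | dvwa["Src IP"].str.match("10.10.10.2")
dvwa["Label"] = kali.astype(int)

# Stack both captures into a single labeled table.
labeled = pd.concat([surfing, dvwa], ignore_index=True)
print(labeled)
```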