url (string) | redirects (int64) | not_indexed_by_google (int64) | issuer (string) | certificate_age (int64) | email_submission (int64) | request_url_percentage (float64) | url_anchor_percentage (float64) | meta_percentage (float64) | script_percentage (float64) | link_percentage (float64) | mouseover_changes (int64) | right_click_disabled (int64) | popup_window_has_text_field (int64) | use_iframe (int64) | has_suspicious_port (int64) | external_favicons (int64) | TTL (int64) | ip_address_count (int64) | TXT_record (int64) | check_sfh (float64) | count_domain_occurrences (int64) | domain_registration_length (int64) | abnormal_url (int64) | age_of_domain (int64) | is_malicious (float64) | page_rank_decimal (float64) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
"http://www.niedziela.pl/artykul/39133/eksperci-apeluja-do-polskich-wladz-zlobki" | 1 | 0 | null | 0 | 0 | 0 | 0.592179 | 0 | 0.424242 | 0.575758 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 189 | 0 | 0 | -1 | 0 | 5.07 |
"http://www.exquisitedesires.com/~xerge/6e3bfba368e4e411f0ea467231c8567a/index.php" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 2 | 0 | 0.5 | 0 | 0 | 0 | -1 | 1 | 2.77 |
"http://afex.biz/gmail_verificar/ServiceLoginAuth/fwd/" | 0 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.666667 | 0.333333 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | null |
"https://asmbs.org/chapters/virginia" | 0 | 0 | "US" | -88 | 0 | 0 | 0 | 0 | 0.35 | 0.65 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 221 | 0 | 0 | -1 | 0 | 5.14 |
"http://www.whathifi.com/canton/dm100/review" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 0 | 1 | 2 | 5 | 1 | 0 | 0 | 80 | 0 | 0 | -1 | 0 | 5.47 |
"https://www.tdsb.on.ca/portals/_default/upcoming_event/guest%20speaker%20-tess_paye_febuary%202019.pdf" | 0 | 0 | "US" | -145 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 5.25 |
"http://www.the-linde-group.com/de/corporate_responsibility/employees_and_society/competing_for_talent/index.html" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.294118 | 0.705882 | 0 | 0 | 0 | 1 | 1 | 1 | 20 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.63 |
"https://homesmart.com/real-estate-agent/california/palmdesert/44759-thomas-tucker/smart_partners" | 0 | 0 | "US" | -316 | 0 | 0 | 0.416667 | 0 | 0.622222 | 0.377778 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 3 | 0 | 0 | 15 | 0 | 0 | -1 | 0 | 5.36 |
"https://www.books-sanseido.co.jp/events/538935/%e6%a3%ae%e8%a6%8b%e3%81%95%e3%82%93%e8%bf%91%e5%bd%b1" | 0 | 0 | "JP" | -266 | 0 | 0 | 0 | 0 | 0.583333 | 0.416667 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0.5 | 37 | 0 | 0 | -1 | 0 | 4.46 |
"http://tinyurl.com/zkxg4le" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.25 | 0.75 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 3 | 0 | 0 | 4 | 0 | 0 | -1 | 1 | 8.17 |
"https://www.finalsite.com/design/portfolio/~board/portfolio-2018/post/american-school-of-bombay" | 1 | 0 | "US" | -284 | 0 | 0 | 0 | 0 | 0.615385 | 0.384615 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 5 | 0 | 0 | 7 | 0 | 0 | -1 | 0 | 4.94 |
"https://www.nationalenquirer.com/photos/lisa-marie-presley-broke-scandal/" | 0 | 0 | "US" | -318 | 0 | 0 | 0.25 | 0 | 0.266667 | 0.733333 | 0 | 0 | 0 | 1 | 1 | 5 | 30 | 4 | 0 | 0.5 | 141 | 0 | 0 | -1 | 0 | 5.03 |
"https://themes4wp.com/contact/" | 0 | 0 | "US" | -32 | 0 | 0 | 0.976744 | 0 | 0.361111 | 0.638889 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 2 | 0 | 0 | 127 | 0 | 0 | -1 | 0 | 4.91 |
"http://www.naphill.org/product-category/napoleon-hill-classics/?product_orderby=date" | 0 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.689922 | 0.310078 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 1 | 0 | 0 | 390 | 0 | 0 | -1 | 0 | 5.1 |
"http://www.dominios.com.co/buscar/?tld=.brujasdecartagena.com.co&domain=www" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.681818 | 0.318182 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 3.4 |
"http://ch.ai/2010/02/06/cheeky-yorkshire-tea-commercial-from-the-uk/" | 0 | 0 | null | 0 | 0 | 0 | 0.068182 | 0 | 0.307692 | 0.692308 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 134 | 0 | 0 | -1 | 0 | 2.89 |
"https://www.cweonline.org/about-cwe/cwe-eastern-massachusetts/grow" | 1 | 0 | "US" | -83 | 1 | 0 | 0 | 0 | 0.8 | 0.2 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0.5 | 155 | 0 | 0 | -1 | 0 | 4.78 |
"https://www.oswego.edu/atmospheric-geological-sciences/opportunities" | 0 | 0 | "BE" | -390 | 0 | 0 | 0 | 0 | 0.434783 | 0.565217 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 1 | 0 | 0 | 138 | 0 | 0 | -1 | 0 | 5.23 |
"https://wild-about-travel.com/oceania/" | 1 | 0 | "US" | -60 | 0 | 0 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 2 | 0 | 1 | 224 | 0 | 0 | -1 | 0 | 4.3 |
"https://www.leylobby.gob.cl/instituciones/mu239/cargos-pasivos/133528/donativos" | 0 | 0 | "US" | -163 | 0 | 0 | 0.066667 | 0 | 0.666667 | 0.333333 | 0 | 0 | 0 | 0 | 1 | 0 | 29 | 3 | 0 | 0 | 29 | 0 | 0 | -1 | 0 | 3.99 |
"http://livebuzz.co.uk/visa/[email protected]?done=1" | 3 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.472222 | 0.527778 | 0 | 0 | 0 | 0 | 1 | 5 | 30 | 2 | 0 | 0 | 7 | 0 | 0 | -1 | 1 | 3.21 |
"https://www.youngandreckless.com/products/veronique-tie-back-bikini-top" | 1 | 0 | "US" | -47 | 0 | 0 | 0 | 0 | 0.625 | 0.375 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0.5 | 339 | 0 | 0 | -1 | 0 | 4.5 |
"https://www.webhostingpad.com/awards/" | 0 | 0 | "GB" | -343 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0 | 1 | 30 | 1 | 0 | 0 | 1 | 0 | 0 | -1 | 0 | 5.05 |
"https://www.glomarr.com/news/happy-april" | 0 | 0 | "US" | -66 | 0 | 0 | 0 | 0 | 0.411765 | 0.588235 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 13 | 0 | 0 | -1 | 0 | 3.63 |
"https://www.ericmmartin.com/donate/" | 0 | 0 | "US" | -85 | 0 | 0 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 1 | 0 | 0.5 | 77 | 0 | 0 | -1 | 0 | 5.28 |
"https://www.gotokyo.org/en/spot/856/" | 0 | 0 | "BE" | -176 | 0 | 0 | 0.16092 | 0 | 0.428571 | 0.571429 | 0 | 0 | 0 | 1 | 0 | 1 | 30 | 1 | 0 | 0 | 10 | 0 | 0 | -1 | 0 | 5.33 |
"http://www.librairiedialogues.fr/livre/4098802-voir-les-champignons-spooner-brian-flammarion?affiliate=d_flammarion" | 2 | 0 | null | 0 | 0 | 0 | 0.232143 | 0 | 0.6 | 0.4 | 0 | 0 | 0 | 1 | 0 | 0 | 29 | 1 | 0 | 0.5 | 10 | 0 | 0 | -1 | 0 | 4.61 |
"http://www.chinaqw.com/hqhr/2019/02-08/214934.shtml" | 0 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.75 | 0.25 | 0 | 0 | 0 | 0 | 1 | 0 | 29 | 1 | 0 | 0 | 36 | 0 | 0 | -1 | 0 | 5.07 |
"http://www.calit2.net/people/detail.php?id=276" | 0 | 0 | null | 0 | 0 | 0 | 0.058824 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 6 | 0 | 0 | -1 | 0 | 5.04 |
"http://aleberth.addr.com/x2yZ1scde/webscr_prim.php?YWxlYmVydGguYWRkci5jb20=uhsdsusu5485757kUJHNN546221oPLKj988777AOP784MTM0Njg1MzUyNQ=" | 0 | 0 | null | 0 | 0 | 0 | 0.916667 | 0 | 0.75 | 0.25 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 1 | 0 | 0 | 12 | 0 | 0 | -1 | 1 | null |
"https://www.seetickets.us/event/the-sonics-w-sailor-poon-and-the-hickoids/366431" | 3 | 0 | "US" | -294 | 0 | 0 | 0.827586 | 0 | 0.782609 | 0.217391 | 0 | 0 | 0 | 1 | 1 | 1 | 28 | 3 | 0 | 0 | 7 | 0 | 0 | -1 | 0 | 5.12 |
"http://www.play-well.org/about-lego-birthday-parties.shtml" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.764706 | 0.235294 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 8 | 0 | 0 | -1 | 0 | 4.21 |
"https://mises.org/es/search/site/author/node%3a1194/library/interviews-367" | 1 | 0 | "US" | -40 | 0 | 0 | 0 | 0 | 0.357143 | 0.642857 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 1 | 0 | 0.5 | 13 | 0 | 0 | -1 | 0 | 5.72 |
"https://www.take-a-screenshot.org/en/about.html" | 0 | 0 | "US" | -51 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.88 |
"http://thornbridgebrewery.com/pp/[email protected]/it/webapps/mpp/home#" | 2 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.240506 | 0.759494 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 1 | 0 | 0.5 | 0 | 0 | 0 | -1 | 1 | 4.35 |
"https://performancein.com/news/2014/11/06/how-retailers-can-best-use-data-preperation-cyber-monday/" | 0 | 0 | "US" | -58 | 0 | 0 | 0.213675 | 0 | 0.4 | 0.6 | 0 | 0 | 0 | 1 | 1 | 3 | 30 | 1 | 0 | 0 | 219 | 0 | 0 | -1 | 0 | 5.01 |
"https://www.theawl.com/2013/12/topless-geraldo/" | 0 | 0 | "US" | -71 | 0 | 0 | 0.370968 | 0 | 0.25 | 0.75 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0.5 | 60 | 0 | 0 | -1 | 0 | 5.57 |
"https://www.skeptic.com/eskeptic/07-08-22/" | 0 | 0 | "US" | -62 | 0 | 0 | 0.263566 | 0 | 0.263158 | 0.736842 | 0 | 0 | 0 | 0 | 1 | 2 | 30 | 2 | 0 | 0 | 71 | 0 | 0 | -1 | 0 | 5.33 |
"https://www.intelliprice.com/intellipricedealer/start.htm?dealerid=1141003&dealerpacode=09801&secondaryleadsource=elite%20website&secondaryid=desktop&primaryleadsource=dc%20trade-in&vendorbrand=ford" | 0 | 0 | "US" | -116 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 4 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.04 |
"http://pandawhale.com/post/8014/the-very-best-grumpy-cat-gifs" | 0 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.8 |
"http://consolidatedfuneralservices.com/" | 3 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 3.25 |
"https://www.fyber.com/announcements/fyberfalkpressrelease.pdf" | 1 | 0 | "US" | -286 | 0 | 0 | 0.24359 | 0 | 0.263158 | 0.736842 | 0 | 0 | 0 | 1 | 1 | 3 | 29 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 5.25 |
"https://bit.parts/entry/categories/blog?start=15" | 0 | 0 | "US" | -50 | 0 | 0 | 0 | 0 | 0.255319 | 0.744681 | 0 | 0 | 0 | 1 | 0 | 5 | 29 | 1 | 0 | 0 | 369 | 0 | 0 | -1 | 0 | 3.98 |
"http://www.alltop.com/science" | 3 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 1 | 1 | 2 | 29 | 2 | 0 | 0 | 3 | 0 | 0 | -1 | 0 | 6.6 |
"https://www.lexblog.com/2018/10/11/cma-study-into-statutory-audits/" | 0 | 0 | "US" | -283 | 0 | 0 | 0 | 0 | 0.454545 | 0.545455 | 0 | 0 | 0 | 1 | 1 | 0 | 28 | 3 | 0 | 0 | 129 | 0 | 0 | -1 | 0 | 4.88 |
"https://www.mst.edu/~matadvan" | 3 | 0 | "US" | -198 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 5.19 |
"https://www.tidyverse.org/articles/2018/01/tibble-1-4-2/" | 1 | 0 | "US" | -34 | 0 | 0 | 0 | 0.066667 | 0.333333 | 0.6 | 0 | 0 | 0 | 0 | 1 | 4 | 19 | 2 | 0 | 0 | 3 | 0 | 0 | -1 | 0 | 5.19 |
"https://www.gerritcodereview.com/" | 0 | 0 | "US" | -90 | 0 | 0 | 0 | 0 | 0.4375 | 0.5625 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 2 | 0 | 0 | 5 | 0 | 0 | -1 | 0 | 5 |
"https://eventregist.com/p/mashingup?lang=th_th" | 0 | 0 | "US" | -200 | 0 | 0 | 0 | 0 | 0.608696 | 0.391304 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 4 | 0 | 0.5 | 16 | 0 | 0 | -1 | 0 | 6.31 |
"https://petergreenberg.com/2017/08/07/luxe-lavs-new-hotels/" | 0 | 0 | "US" | -307 | 0 | 0 | 0.159664 | 0 | 0.479167 | 0.520833 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 499 | 0 | 0 | -1 | 0 | 5.07 |
"http://bradfello.ws/wp-admin/includes/www.alibaba.com/login.jsp.htm" | 0 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 3 | 0 | 0 | -1 | 1 | null |
"http://lbpol.postedi.com/index.php?MfcISAPICommand=SignInFPP&" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 2 | 0 | 0.5 | 0 | 0 | 0 | -1 | 1 | null |
"http://minecraftm.com/tag/minecraft-appdata/" | 0 | 0 | null | 0 | 0 | 0 | 0.083333 | 0 | 0.32 | 0.68 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 44 | 0 | 0 | -1 | 0 | 5 |
"https://www.nrcdv.org/rhydvtoolkit/common-ground/" | 0 | 0 | "US" | -65 | 0 | 0 | 0.142857 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 15 | 0 | 0 | -1 | 0 | 5.09 |
"https://www.goodgallery.com/faq-items/canonical-tags/" | 0 | 0 | "GB" | -105 | 0 | 1 | 0 | 0 | 0.739726 | 0.260274 | 0 | 0 | 0 | 0 | 1 | 4 | 30 | 2 | 0 | 0 | 3 | 0 | 0 | -1 | 0 | 4.24 |
"http://www3.vwa.nl/foto/ambrosia_foto_nvwa_nummer_2.jpg" | 0 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 3.12 |
"https://www.rushessay.com/our_process.php" | 0 | 0 | "US" | -11 | 0 | 0 | 0 | 0 | 0.461538 | 0.538462 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 1 | 0 | 0.5 | 2 | 0 | 0 | -1 | 0 | 4.7 |
"https://www.exchangewire.com/blog/category/header-bidding/page/2/" | 0 | 0 | "GB" | -36 | 0 | 0 | 0 | 0 | 0.208333 | 0.791667 | 0 | 0 | 0 | 1 | 0 | 0 | 30 | 1 | 0 | 0 | 162 | 0 | 0 | -1 | 0 | 5.23 |
"http://www.ocert.org/patches/exslt_crypt.patch" | 1 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 4 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.94 |
"https://metromode.se/sitemap-pt-horoskop-2017-05.html" | 0 | 0 | "US" | -297 | 0 | 0 | 0.846154 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 3 | 0 | 0 | 8 | 0 | 0 | -1 | 0 | 5.25 |
"http://www.saeima.lv/lv/likumdosana/saeimas-sedes" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.526316 | 0.473684 | 0 | 0 | 0 | 0 | 0 | 2 | 27 | 1 | 0 | 0.5 | 7 | 0 | 0 | -1 | 0 | 5.24 |
"https://www.apec.org/publications/2012/08/marine-microorganisms-capacity-building-for-a-broader-cooperative-research-and-utilization" | 0 | 0 | "US" | -309 | 0 | 0 | 0 | 0 | 0.533333 | 0.466667 | 0 | 0 | 0 | 1 | 1 | 2 | 4 | 3 | 0 | 0.5 | 3 | 0 | 0 | -1 | 0 | 5.59 |
"http://www.xuetangx.com/courses/course-v1:tsinghuax+00690212x-2+2017_t2/about" | 2 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 0 | 1 | 2 | 8 | 3 | 0 | 0 | 2 | 0 | 0 | -1 | 0 | 4.63 |
"https://relevantmagazine.com/god/church/what-christians-get-wrong-about-easter-story" | 3 | 0 | "US" | -74 | 0 | 0 | 0.08427 | 0 | 0.4375 | 0.5625 | 0 | 0 | 0 | 1 | 1 | 0 | 28 | 2 | 0 | 0 | 806 | 0 | 0 | -1 | 0 | 5.34 |
"http://www.thecassisbistro.ca/" | 2 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0.428571 | 0.571429 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 2 | 0 | 0.5 | 0 | 0 | 0 | -1 | 0 | 4.14 |
"https://wannwowie.de/" | 2 | 0 | "GB" | -287 | 0 | 0 | 0 | 0 | 0.25 | 0.75 | 0 | 0 | 0 | 0 | 1 | 3 | 26 | 1 | 0 | 0.5 | 0 | 0 | 0 | -1 | 1 | 0 |
"https://x3dom.org/docs-old/genindex.html" | 0 | 0 | "US" | -38 | 0 | 0 | 0.018182 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 30 | 1 | 0 | 0 | 1 | 0 | 0 | -1 | 0 | 5.02 |
"http://198.57.247.160/~karim12/service/costumer/information/check/93cf21592dd57cdbc97d3906fc8da94a/index/web/4190f2b8546f3f74e6c3636e349ba086/login.php" | 1 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 1 | -1 | 1 | null |
"http://mark3d.com/yale/login.htm" | 3 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.681159 | 0.318841 | 0 | 0 | 0 | 0 | 1 | 3 | 30 | 1 | 0 | 0.5 | 593 | 0 | 0 | -1 | 1 | 4.23 |
"http://www.haiwainet.cn/n/2019/0121/c3543950-31484352.html" | 0 | 0 | null | 0 | 0 | 0 | 0.939024 | 0 | 0.7 | 0.3 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 10 | 0 | 0 | 15 | 0 | 0 | -1 | 0 | 4.04 |
"https://www.webland.ch/de-ch/hosting/optionen" | 1 | 0 | "GB" | -306 | 0 | 0 | 0 | 0 | 0.470588 | 0.529412 | 0 | 0 | 0 | 1 | 0 | 4 | 28 | 1 | 0 | 1 | 0 | 0 | 0 | -1 | 0 | 3.57 |
"https://thetrustproject.org/trust-project-receives-funding-to-develop-trust-in-media/" | 1 | 0 | "US" | -343 | 0 | 0 | 0 | 0 | 0.366667 | 0.633333 | 0 | 0 | 0 | 1 | 0 | 0 | 29 | 1 | 0 | 0 | 126 | 0 | 0 | -1 | 0 | 5.15 |
"http://creativecommons.org.au/blog/2009/01/opinionated-volunteers-wanted/" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.183333 | 0.816667 | 0 | 0 | 0 | 0 | 1 | 6 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 5.37 |
"http://herpaderpus.0fees.net/default.php?a=login" | 0 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0.333333 | 0.666667 | 0 | 0 | 0 | 0 | 1 | 1 | 30 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | null |
"https://www.back40design.com/bigcommerce" | 1 | 0 | "US" | -66 | 0 | 0 | 0 | 0 | 0.277778 | 0.722222 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 1 | 0 | 0 | 117 | 0 | 0 | -1 | 0 | 4.07 |
"https://www.blueletterbible.org/esv/eph/3/11/s_1100011" | 0 | 0 | "US" | -38 | 0 | 0 | 0 | 0 | 0.557377 | 0.442623 | 0 | 0 | 0 | 0 | 0 | 5 | 30 | 1 | 0 | 0.5 | 5 | 0 | 0 | -1 | 0 | 5.64 |
"https://www.sharp.com/health-classes/bls-for-health-care-providers-class-or-renewal-6/section-30949" | 1 | 0 | "US" | -242 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 1 | 0 | 0 | 20 | 0 | 0 | -1 | 0 | 5.11 |
"https://www.relx.com/site-services/terms-and-conditions" | 0 | 0 | "BE" | -337 | 1 | 0 | 0 | 0 | 0.25 | 0.75 | 0 | 0 | 0 | 1 | 1 | 1 | 20 | 2 | 0 | 0.5 | 8 | 0 | 0 | -1 | 0 | 5.24 |
"https://www.vtinfo.com/pf/product_finder.asp?custid=gre" | 1 | 0 | "US" | -333 | 0 | 0 | 0.5 | 0 | 0.6 | 0.4 | 0 | 0 | 0 | 0 | 1 | 1 | 19 | 2 | 0 | 0 | 1 | 0 | 0 | -1 | 0 | 3.52 |
"http://www.artlebedev.com/stoloto/rapidoloto/" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 1 | 0 | 0 | 29 | 1 | 0 | 0 | 2 | 0 | 0 | -1 | 0 | 5.29 |
"https://www.m247.ro/ro/despre/echipa/mike-darcey-ro/" | 1 | 0 | "US" | -48 | 0 | 0 | 0 | 0 | 0.066667 | 0.933333 | 0 | 0 | 0 | 0 | 1 | 2 | 28 | 2 | 0 | 0 | 3 | 0 | 0 | -1 | 0 | 3.54 |
"http://game.com/?utm_source=inp.one" | 1 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0.571429 | 0.428571 | 0 | 0 | 0 | 0 | 0 | 1 | 30 | 1 | 0 | 0 | 4 | 0 | 0 | -1 | 0 | 4.95 |
"https://www.blazemeter.com/script-creation" | 1 | 0 | "US" | -76 | 0 | 0 | 0 | 0 | 0.241379 | 0.758621 | 0 | 0 | 0 | 1 | 1 | 1 | 30 | 1 | 0 | 0 | 4 | 0 | 0 | -1 | 0 | 5.03 |
"https://ad.gt/buttons" | 0 | 0 | "US" | -305 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 29 | 3 | 0 | 0 | 4 | 0 | 0 | -1 | 0 | 3.97 |
"https://www.nicelabel.com/es/product-selector?uid=11287" | 3 | 0 | "US" | -106 | 0 | 0 | 0 | 0 | 0.217391 | 0.782609 | 0 | 0 | 0 | 1 | 0 | 2 | 28 | 1 | 0 | 1 | 6 | 0 | 0 | -1 | 0 | 4.29 |
"https://www.goguardian.com/newsroom.html" | 1 | 0 | "US" | -16 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 1 | 1 | 1 | 19 | 3 | 0 | 0 | 3 | 0 | 0 | -1 | 0 | 5 |
"https://www.amazon.in/printtech-pattern-huawei-enhanced-dual-sim/dp/b01m0g9vpy?subscriptionid=akiai46kfzlxd4qvnu7a" | 0 | 0 | "US" | -102 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 1 | 0 | 0 | 2 | 0 | 0 | -1 | 0 | 6.45 |
"https://flutter.io/docs/get-started/install/macos" | 2 | 0 | "US" | -44 | 0 | 0 | 0.165577 | 0 | 0.565217 | 0.434783 | 0 | 0 | 0 | 1 | 1 | 1 | 29 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 5.27 |
"https://www.spandidos-publications.com/pages/mmr/abstracting" | 0 | 0 | "US" | -292 | 0 | 0 | 0 | 0 | 0.769231 | 0.230769 | 0 | 0 | 0 | 0 | 1 | 0 | 29 | 1 | 0 | 0 | 2 | 0 | 0 | -1 | 0 | 5.31 |
"https://margauxny.com/products/the-demi-black-navy" | 0 | 0 | "US" | -255 | 0 | 0 | 0 | 0 | 0.380952 | 0.619048 | 0 | 0 | 0 | 1 | 1 | 0 | 29 | 2 | 0 | 0.5 | 420 | 0 | 0 | -1 | 0 | 4.83 |
"https://flythemes.net/forums/topic/vacation-lite-remove-comment-tag/" | 0 | 0 | "US" | -39 | 0 | 0 | 0.041667 | 0 | 0.37931 | 0.62069 | 0 | 0 | 0 | 1 | 0 | 0 | 30 | 1 | 0 | 0 | 166 | 0 | 0 | -1 | 0 | 5.29 |
"http://www.tes.co.uk/mypublicprofile.aspx?uc=932747&event=21" | 4 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.7 | 0.3 | 0 | 0 | 0 | 0 | 1 | 1 | 29 | 3 | 0 | 0.5 | 0 | 0 | 0 | -1 | 0 | 5.32 |
"http://www.journalonweb.com/meajo/forgotpass.asp" | 1 | 0 | null | 0 | 0 | 1 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 1 | 0 | 30 | 2 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.25 |
"http://www.geautomation.com/de/download/pacsystems-hochverf%c3%bcgbare-l%c3%b6sungen" | 4 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.636364 | 0.363636 | 0 | 0 | 0 | 1 | 1 | 1 | 29 | 4 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 4.14 |
"http://www.dpa.org.nz/news/preliminary-notice-dpa-agm-2018" | 1 | 0 | null | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 1 | 0 | 0 | 7 | 0 | 0 | -1 | 0 | 4.88 |
"https://www.urbanbound.com/blog/4-questions-you-probably-have-about-relocation-tax-gross-ups" | 1 | 0 | "US" | -47 | 0 | 0 | 0.180556 | 0 | 0.75 | 0.25 | 0 | 0 | 0 | 1 | 1 | 0 | 30 | 2 | 0 | 0 | 99 | 0 | 0 | -1 | 0 | 4.7 |
"https://bit.parts/?start=40" | 0 | 0 | "US" | -50 | 0 | 0 | 0 | 0 | 0.26087 | 0.73913 | 0 | 0 | 0 | 0 | 0 | 5 | 29 | 1 | 0 | 0 | 63 | 0 | 0 | -1 | 0 | 3.98 |
"https://www.mecum.com/lots/lv0118-315122/1908-indian-single-board-track-racer/" | 1 | 0 | "US" | -82 | 0 | 0 | 0 | 0 | 0.4 | 0.6 | 0 | 0 | 0 | 0 | 1 | 3 | 30 | 2 | 0 | 0 | 66 | 0 | 0 | -1 | 0 | 5.51 |
"https://www.jcvi.org/cms/research/past-projects/cmr/overview/?page=cmr_search&search_type=cog&crumbs=searches" | 1 | 0 | "US" | -105 | 0 | 0 | 0 | 0 | 0.142857 | 0.857143 | 0 | 0 | 0 | 1 | 1 | 4 | 30 | 1 | 0 | 0 | 35 | 0 | 0 | -1 | 0 | 5.37 |
"https://www.iha.com.tr/haber-mamut-art-project-7nci-yilinda-50-yeni-sanatciyi-agirliyor-764131/" | 1 | 0 | "US" | -86 | 0 | 0 | 0 | 0 | 0.175 | 0.825 | 0 | 0 | 0 | 0 | 0 | 2 | 27 | 1 | 0 | 0 | 45 | 0 | 0 | -1 | 0 | 5.03 |
Important Notice:
- A subset of the URL dataset is from Kaggle, and the Kaggle datasets contained 10%-15% mislabelled data. See this discussion I opened for some false positives. I have contacted Kaggle regarding their erroneous "Usability" score calculation for these unreliable datasets.
- The feature extraction methods shown here are not robust at all in 2023, and there are even silly mistakes in 3 functions: not_indexed_by_google, domain_registration_length, and age_of_domain.
The features dataset is original, and my feature extraction method is covered in feature_extraction.py.
To extract features from a website, simply pass the URL and label to collect_data(). The features are saved to phishing_detection_dataset.csv locally by default.
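As a rough sketch of how the saved file could be inspected afterwards, the snippet below counts rows per label using only the standard library. It assumes the CSV has a header row with an `is_malicious` column holding 0 or 1 (as in the preview) and that missing values are written as empty strings; `label_counts` is a hypothetical helper, not part of feature_extraction.py.

```python
import csv
from collections import Counter


def label_counts(csv_path="phishing_detection_dataset.csv"):
    """Count rows per is_malicious value in the features CSV (hypothetical helper)."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Assumption: labels are numeric text such as "0", "1" or "1.0".
            raw = (row.get("is_malicious") or "").strip()
            if not raw:
                continue  # skip rows with no label
            counts[int(float(raw))] += 1
    return counts
```

Calling `label_counts()` after a collection run would then give a quick view of how balanced the collected labels are.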
The features dataset covers 911,180 websites that were online at the time of data collection. The plots below show the regression line and correlation coefficient between each of the 22+ extracted features and whether the URL is malicious. Plotting the lifespan of the URLs would show that the oldest website has been online since Nov 7th, 2008, while the most recent phishing websites appeared as late as July 10th, 2023.
Malicious URL Categories
- Defacement
- Malware
- Phishing
Data Analysis
Here are two images showing the correlation coefficient and coefficient of determination between the predictor values and the target value is_malicious.
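For readers who want to reproduce these two quantities, here is a minimal pure-Python sketch. It computes the Pearson correlation coefficient and squares it, which equals the coefficient of determination for a linear fit with a single predictor. The numbers are toy values made up for illustration, not taken from the dataset, and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (not from the dataset).
redirects = [0, 1, 3, 0, 2, 1]
is_malicious = [0, 0, 1, 0, 1, 1]

r = pearson_r(redirects, is_malicious)
r_squared = r ** 2  # coefficient of determination for a one-predictor linear fit
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A feature that is constant across all rows falls into the zero-variance branch, so it would show up as a 0.00 correlation.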
Let's examine the correlations one by one and cross out any unreasonable or insignificant correlations.
Variable | Justification for Crossing Out |
---|---|
redirects | contradicts previous research (as redirects increase, is_malicious tends to decrease by a little) |
| | 0.00 correlation |
| | contradicts previous research |
request_url_percentage | |
issuer | |
certificate_age | |
| | contradicts previous research |
| | 0.00 correlation |
script_percentage | |
link_percentage | |
| | contradicts previous research & 0.00 correlation |
| | contradicts previous research & 0.00 correlation |
| | contradicts previous research |
| | contradicts previous research |
| | contradicts previous research |
| | contradicts previous research |
TTL (Time to Live) | |
ip_address_count | |
TXT_record | all websites had a TXT record |
| | contradicts previous research |
count_domain_occurrences | |
domain_registration_length | |
abnormal_url | |
age_of_domain | |
page_rank_decimal | |
Pre-training Ideas
For training, I split the classification task into two stages, anticipating that few phishing websites are available online at any time because of their short lifespan, and that published research on phishing may not be up to date:
- a small multilingual BERT model, fine-tuned on 2,436,727 legitimate and malicious URLs, that outputs the confidence level of a URL being malicious and passes it to model #2
- (probably) LightGBM to analyze the confidence level, along with roughly 10 extracted features
This way, I can make the most of the limited phishing websites available.
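The data flow between the two stages could look roughly like the sketch below. It only shows how a stage-1 confidence score and a handful of extracted features might be combined into one input row for stage 2. The scorer is a toy placeholder standing in for the fine-tuned BERT model, and `build_stage2_row`, `toy_url_scorer`, and the chosen feature subset are all hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Sequence


def build_stage2_row(
    url: str,
    features: Dict[str, float],
    feature_names: Sequence[str],
    url_scorer: Callable[[str], float],
) -> List[float]:
    """Assemble the stage-2 input: [stage-1 confidence, selected features...]."""
    confidence = url_scorer(url)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("stage-1 confidence must be in [0, 1]")
    # Missing features fall back to 0.0 here; a real pipeline may prefer NaN.
    return [confidence] + [float(features.get(name, 0.0)) for name in feature_names]


def toy_url_scorer(url: str) -> float:
    """Placeholder for the fine-tuned BERT model; NOT a real detector."""
    return 0.9 if "login" in url.lower() else 0.1


selected = ["redirects", "TTL", "ip_address_count"]  # hypothetical subset
row = build_stage2_row(
    "http://example.com/login.php",
    {"redirects": 1, "TTL": 30, "ip_address_count": 2},
    selected,
    toy_url_scorer,
)
print(row)  # [0.9, 1.0, 30.0, 2.0]
```

Rows built this way could then be fed to whichever tabular classifier is chosen for stage 2.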
Source of the URLs
- https://moz.com/top500
- https://phishtank.org/phish_search.php?valid=y&active=y&Search=Search
- https://www.kaggle.com/datasets/siddharthkumar25/malicious-and-benign-urls
- https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset
- https://github.com/ESDAUNG/PhishDataset
- https://github.com/JPCERTCC/phishurl-list
- https://github.com/Dogino/Discord-Phishing-URLs
Reference
- https://www.kaggle.com/datasets/akashkr/phishing-website-dataset
- https://www.kaggle.com/datasets/shashwatwork/web-page-phishing-detection-dataset
- https://www.kaggle.com/datasets/aman9d/phishing-data
Side notes
- Cloudflare offers an API for phishing URL scanning, with a generous global rate limit of 1200 requests every 5 minutes.
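A limit of 1200 requests every 5 minutes works out to an average of 4 requests per second. One way to stay under it on the client side is a sliding-window limiter, sketched below with the standard library only. The class name and its parameters are hypothetical, and the snippet deliberately makes no calls to the Cloudflare API itself.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=1200, period=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest recorded call expires, then discard it.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling `acquire()` on a shared limiter instance immediately before each scan request would keep a single-threaded client within the stated limit.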