Top Domains by Extracted Triples for Extractor html-mf-species


This page lists the top domains that use the Microformats species format (extractor html-mf-species) in the December 2014 extraction of the Web Data Commons project. Domains are ordered by the number of triples extracted from their pages in the crawl corpus, from highest to lowest.


  1. wikipedia.org (538,555 triples)
  2. blekko.com (76,624 triples)
  3. preen.com (8,184 triples)
  4. thefullwiki.org (4,967 triples)
  5. oiseaux.net (4,035 triples)
  6. blogspot.com (3,866 triples)
  7. wiktionary.org (3,610 triples)
  8. mashpedia.com (1,717 triples)
  9. schools-wikipedia.org (1,449 triples)
  10. wikidoc.org (1,255 triples)
  11. wikia.com (1,121 triples)
  12. eol.org (1,053 triples)
  13. sensagent.com (892 triples)
  14. digplanet.com (874 triples)
  15. bbc.co.uk (810 triples)
  16. birdsguides.com (295 triples)
  17. marefa.org (247 triples)
  18. wordpress.com (230 triples)
  19. doleaf.com (220 triples)
  20. tfode.com (201 triples)
  21. etceter.com (178 triples)
  22. xingyimax.com (174 triples)
  23. wikimedia.org (159 triples)
  24. theplantencyclopedia.org (158 triples)
  25. sfstate.us (108 triples)
  26. snaturou2000.sk (100 triples)
  27. tanijaya.com (98 triples)
  28. goo.ne.jp (96 triples)
  29. hidemyass.com (92 triples)
  30. antwiki.org (92 triples)
  31. esacademic.com (88 triples)
  32. mmorpg.com (88 triples)
  33. blogfa.com (85 triples)
  34. eoearth.org (80 triples)
  35. quickiwiki.com (79 triples)
  36. findaplant.co.nz (73 triples)
  37. cafemom.com (66 triples)
  38. karikuy.org (63 triples)
  39. blogspot.com.au (60 triples)
  40. pictures-of-cats.org (49 triples)
  41. xfam.org (46 triples)
  42. answers.com (44 triples)
  43. wn.com (39 triples)
  44. 7seas.ca (38 triples)
  45. encydia.com (38 triples)
  46. territorioscuola.com (37 triples)
  47. webs.com (37 triples)
  48. pigsonthewing.org.uk (35 triples)
  49. academic.ru (34 triples)
  50. rewardinthecognitiveniche.us (33 triples)
  51. skaphandrus.com (33 triples)
  52. index.hr (32 triples)
  53. care2.com (28 triples)
  54. tanagraltd.com (27 triples)
  55. inkedanimal.com (23 triples)
  56. adayout2.com (23 triples)
  57. westwoodpavillion.com (23 triples)
  58. iiwiki.com (23 triples)
  59. cuscus.cc (23 triples)
  60. oldearth.org (22 triples)
  61. jp-aquietcorner.com (22 triples)
  62. jsppharma.com (20 triples)
  63. xklsv.org (17 triples)
  64. yolasite.com (17 triples)
  65. polewali.com (17 triples)
  66. ning.com (17 triples)
  67. drchanshealinginstitute.com (17 triples)
  68. science20.com (16 triples)
  69. bettavillage.com (16 triples)
  70. travelmerida.com (15 triples)
  71. wow.com (13 triples)
  72. debate.org (11 triples)
  73. classicistranieri.com (10 triples)
  74. herokuapp.com (10 triples)
  75. veterinariosvs.org (8 triples)
  76. obathepatitis.info (7 triples)
  77. blogspot.com.ar (7 triples)
  78. mex.tl (6 triples)
  79. archive.org (5 triples)
  80. kodoom.com (5 triples)
  81. blogcu.com (4 triples)
  82. microformats.org (3 triples)
  83. wikibooks.org (2 triples)
  84. blogspot.mx (2 triples)
  85. blogspot.hu (2 triples)
  86. tatoott1009.com (2 triples)
  87. scmp.com (2 triples)
  88. fullblog.com.ar (1 triple)
  89. top-me.com (1 triple)
  90. culturaempresarialganadera.org (1 triple)
  91. blogspot.com.es (1 triple)
  92. lapunk.hu (1 triple)
  93. obolog.es (1 triple)
  94. gportal.hu (1 triple)
  95. blogspot.co.uk (1 triple)
  96. tattooinspiration.com (1 triple)
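
For readers who want to work with this ranking programmatically, the following is a minimal sketch in Python. It assumes the entries are available as plain text in the "rank. domain (N triples)" form shown above. The names `parse_entries` and `is_sorted_by_triples` are hypothetical helpers introduced here for illustration and are not part of any Web Data Commons tooling. The sample uses only the first three entries, so the printed shares are relative to that excerpt, not to the full list.

```python
import re

# Matches lines such as "  1. wikipedia.org (538,555 triples)".
# Accepts both "triple" and "triples" and thousands separators in the count.
ENTRY = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+\(([\d,]+) triples?\)\s*$")


def parse_entries(text):
    """Return a list of (rank, domain, triple_count) tuples (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            rank, domain, count = match.groups()
            entries.append((int(rank), domain, int(count.replace(",", ""))))
    return entries


def is_sorted_by_triples(entries):
    """Check that triple counts never increase from one rank to the next."""
    counts = [count for _, _, count in entries]
    return all(a >= b for a, b in zip(counts, counts[1:]))


# Excerpt of the list above, used as sample input.
sample = """\
  1. wikipedia.org (538,555 triples)
  2. blekko.com (76,624 triples)
  3. preen.com (8,184 triples)
"""

entries = parse_entries(sample)
total = sum(count for _, _, count in entries)

for rank, domain, count in entries:
    # Share is computed over the excerpt only.
    print(f"{rank:>3}. {domain:<20} {count:>9,} {count / total:6.1%}")
```

Passing the complete list instead of the excerpt would give each domain's share of all triples reported on this page, and `is_sorted_by_triples` can be used to confirm the descending order.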