:keyboard: Wordlists, Dictionaries and Other Data Sets for Writing Software Security Test Cases


| Folder Name | Description of Contents |
|:------------|-------------------------|
| acronyms-defined-dict | dictionary of acronyms defined from technical jargon |
| adjectives-abuse-list | list of adjectives that RobinAbuseBot constructs insults with, via https://github.com/llamasoft/RobinAbuseBot/blob/master/RobinAbuseBot.user.js |
| adjective-words-list | list of various English adjectives |
| alfred-hitchcock-movies | all films produced by Alfred Hitchcock, from http://infolab.stanford.edu/pub/movies/Hitch.html |
| bbs-subjects-list | list of forum topics from an electronic bulletin board system |
| book-titles-list | loosely formatted list of book titles |
| buildings-word-list | list of words related to buildings, from http://domainsbot.com/Content/Data/terms/buildings.txt |
| business-noun-words | list of words that can be categorized as business-related nouns |
| common-english-words | brief listing of the most commonly used words in English (the top 30) |
| curse-words-list | Warning! list of vulgar (i.e. “curse”) words |
| english-connective-words | list of “connective” English words; these are words that can often be dropped from simple search queries |
| english-top-1000 | 1,000 most used words in the English language |
| english-top-1500 | 1,500 most used words in the English language, taken from htpwdScan |
| english-words-various | various English words taken from some old ZIP files |
| espionage-techniques-list | alphabetized list of espionage techniques |
| etc-anonymizer-names | copy of Splunk’s etc/anonymizer/names.txt |
| female-names-list | alphabetized and capitalized list of female names |
| first-names-list | list of first names often used by people |
| first20hours-google-20k | 20,000 words parsed from Google, from https://github.com/first20hours/google-10000-english/blob/master/20k.txt |
| fortune-global500-list | Fortune Global 500 companies list with rank, company name, and revenue, from http://fortune.com/global500/ |
| geographic-stop-words | geographical stop words, i.e. words that are ignored during natural language processing |
| indefinite-nouns-words | list of indefinite nouns; in other words, nouns that are not “proper” nouns and therefore do not need to be capitalized, from https://gist.githubusercontent.com/gardner/25d36eea91523d5a30d3e5197c6cc2b3/raw/a42ac049336b388674ecd1f1f37dd2f0cbd02ae7/nouns.txt |
| international-address-list | various samples of worldwide postal addresses |
| ieee-journal-names | list of journals published by the IEEE |
| infosec-glossary-terms | glossary of information security terminology copied from RFC 4949: https://tools.ietf.org/html/rfc4949 |
| jargon-common-words | common information technology jargon words |
| jargon-common-bases | common base words in information technology jargon |
| keyword-ideas-generator | list constructed from the words shown by keywordideasgenerator.com |
| last-names-list | list of last names often used by people in the Americas |
| linux-words-dict | words taken from /usr/share/dict/words on a Linux install |
| longest-english-words | alphabetized list of English words that are longer than twenty letters |
| mrrobot-season3-subtitles | subtitles for season three of the Mr. Robot television series, from https://www.podnapisi.net/subtitles/search/mr-robot-2015/SOM?seasons=3 |
| multi-lingual-vernacular | popular vernacular from common languages, along with an associated numeric rating for each |
| not-found-translations | “Not Found” translations on iBiblio |
| nouns-abuse-list.txt | list of nouns that RobinAbuseBot constructs insults with, via https://github.com/llamasoft/RobinAbuseBot/blob/master/RobinAbuseBot.user.js |
| obama-nobel-speech | Barack Obama’s 2009 Nobel Peace Prize acceptance speech |
| objects-name-list | listing that contains names of various objects, from http://domainsbot.com/Content/Data/terms/objects.txt |
| occupations-frequency-list | list of occupations sorted by numeric score (higher means more popular), from http://sunlight.s3.amazonaws.com/all_occupations.txt |
| one-hundred-thousand | the numbers 1-97935, one on each line |
| phrack-acronyms-metalshopprivate | Phrack |
| reliable-passgen-wordlist | wordlist.txt from BURP |
| rogets-thesaurus-ebook | Roget’s Thesaurus EBook from Project Gutenberg |
| sdbf-count-1edit | single character edits file packaged with Smart DNS Brute Forcer |
| sdbf-count-1w | individual word frequency counts packaged with Smart DNS Brute Forcer |
| sdbf-count-2l | double letter sequence frequency counts packaged with Smart DNS Brute Forcer |
| sdbf-count-2w | double word sequence frequency counts packaged with Smart DNS Brute Forcer |
| sdbf-count-3l | triple letter sequence frequency counts packaged with Smart DNS Brute Forcer |
| sdbf-count-big | big word frequency counts packaged with Smart DNS Brute Forcer |
| search-stop-words | commonly used words that a search engine will be programmed to ignore |
| secureblackbox-client-list | brief list of corporations with household names, from https://www.secureblackbox.com/company/clients.aspx |
| security-words-dictionary | dictionary of some real, but mostly made-up, security words created manually by yours truly |
| sfbay-companies-list | list of companies based in the San Francisco Bay Area |
| sierrasoftworks-bender-quotes | list of Bender quotes, via https://raw.githubusercontent.com/SierraSoftworks/bender/master/configs/quotes.json |
| spike-proxy-allwords | dictionary distributed with ImmunitySec SPIKE Proxy dpkg |
| technical-manual-words | various forms of technical root words likely to be found in a manual, from http://scrapmaker.com/data/wordlists/technology/TechnicalManualWords(1495).txt |
| tweet-word-ngrams | binary sequence and words parsed from tweets, with cardinality |
| unicode-words-list | list of strings, some containing Unicode characters, via https://wing.comp.nus.edu.sg/~forecite/services/keyphrase/lib/x/Porter/output.txt |
| usenet-name-strings | strings parsed from USENET names |
| usgovt-manual-acronyms | U.S. Government Manual Commonly Used Agency Acronyms |
| various-vocabulary-sorted | various English words in lowercase and sorted |
| word-cluster-ngrams | word clusters parsed from the text in millions of tweets |
| worker-death-cases | sentences sorted by increasing length, each of which details a unique case of worker death |
| worker-death-summaries2017 | Occupational Health and Safety Administration Archive Reports of Fatalities and Catastrophes |
| yo-momma-jokes | over one thousand very crude “yo momma!” jokes |
| zoo-animal-list | an alphabetically sorted list of names for zoo-kept animals |
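
As a rough illustration of how one of these data sets might feed a test case, the sketch below loads a wordlist and turns each entry into a test input. It assumes the chosen folder holds a plain-text file with one entry per line, which may not be true of every data set listed above. The helper names `load_wordlist` and `build_test_inputs`, and the `template` parameter, are hypothetical and not part of this repository.

```python
from pathlib import Path


def load_wordlist(path, encoding="utf-8"):
    """Return the non-empty, whitespace-stripped lines of a text file.

    Assumes one entry per line; undecodable bytes are replaced rather
    than raising, since some lists contain unusual characters.
    """
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_test_inputs(words, template="{}"):
    """Substitute each word into a template to produce test inputs."""
    return [template.format(word) for word in words]
```

A call such as `build_test_inputs(load_wordlist("some-folder/words.txt"), "user={}")` would then yield one candidate input per entry; the path shown is a placeholder, not an actual file in the repository.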