| The RCSB Protein Data Bank (PDB) |
| RCSB Protein Data Bank |
| http://www.rcsb.org/pdb/ |
| Archive of experimentally-determined, biological macromolecule 3-D structures from the Brookhaven National Laboratory. |
| The PDB archive contains information about experimentally-determined structures of proteins,
nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB PDB curates and
annotates PDB data according to agreed upon standards. The RCSB PDB also provides a variety
of tools and resources. Users can perform simple and advanced searches based on annotations
relating to sequence, structure and function. These molecules are visualized, downloaded, and
analyzed by users who range from students to specialized scientists. |
rcsb.org - domain rank 164379 (65052 in US)
|
|
| Dataset generator |
| www.datgen.com |
| http://www.datgen.com/ |
| Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms. |
| (SLD : datgen.com) |
|
| DELVE - Data for Evaluating Learning in Valid Experiments |
| DELVE - Data for Evaluating Learning in Valid Experiments |
| http://www.cs.utoronto.ca/~delve/ |
| Data for Evaluating Learning in Valid Experiments: A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets. |
utoronto.ca - domain rank 6027 (55 in CA)
|
|
| Computers/Artificial_Intelligence/Machine_Learning/Datasets |
| UCI Machine Learning Repository |
| UCI Machine Learning Repository |
| http://www.ics.uci.edu/~mlearn/MLRepository.html |
| A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. |
uci.edu - domain rank 7094 (2643 in US)
|
|
| TREC Data |
| Text REtrieval Conference (TREC) Data |
| http://trec.nist.gov/data.html |
| Text datasets used in information retrieval and learning in text domains. |
nist.gov - domain rank 13552 (5032 in US)
|
|
| National Space Science Data Center |
| Welcome to the NSSDC! |
| http://nssdc.gsfc.nasa.gov/ |
| Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software. |
| National Space Science Data Center (NSSDC) Home Page |
| NASA, NSSDC, data, space physics, astronomy, astrophysics, planetary science, planet, planets, moon, lunar science, solar wind, CD-ROM, CD-ROMs, solar physics |
nasa.gov - domain rank 853 (371 in US)
|
|
| The StatLib Datasets Archive |
| StatLib---Datasets Archive |
| http://lib.stat.cmu.edu/datasets/ |
| A repository of datasets used in statistics and machine learning. |
cmu.edu - domain rank 6080 (2241 in US)
|
|
| NIST Special Database 4. |
| NIST Error Page |
| http://www.nist.gov/srd/nistsd4.htm |
| This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs. |
nist.gov - domain rank 13552 (5032 in US)
|
|
| Face recognition dataset |
| A NEURAL NETWORK FACE RECOGNITION ASSIGNMENT |
| http://www.cs.cmu.edu/afs/cs.cmu.edu/user/avrim/www/ML94/face_homework.html |
| A dataset of face images for face recognition algorithms. |
cmu.edu - domain rank 6080 (2241 in US)
|
|
| Time Series Data Library |
| Index of /tsdl |
| http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/ |
| A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject. |
monash.edu.au - domain rank 10868 (55 in AU)
|
|
| Penn Treebank Project |
| Penn Treebank Project |
| http://www.cis.upenn.edu/~treebank/ |
| A corpus of parsed sentences. Used by many researchers for training data-driven parsing algorithms. |
upenn.edu - domain rank 2867 (1111 in US)
|
|
| HS3D - Homo Sapiens Splice Sites Dataset |
| HS3D - Homo Sapiens Splice Sites Dataset |
| http://www.sci.unisannio.it/docenti/rampone/ |
| HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo Sapiens Exon, Intron and Splice regions extracted from GenBank primate sequences Rel.123. The aim of this data set is to give standardized material to train and to assess the prediction accuracy of computational approaches for gene identification and characterization. |
unisannio.it - domain rank 292095 (5076 in IT)
|
|
| Learning Relational Concepts from Sensor Data of a Mobile Robot |
| TU Dortmund -- Computer Science VIII |
| http://www-ai.cs.uni-dortmund.de/FORSCHUNG/PROJEKTE/BLEARN2/data-sets.html |
| A set of data sets, where each data set is represented in first order logic. Maintained at the University of Dortmund, Germany. |
uni-dortmund.de - domain rank 115080 (5984 in DE)
|
|
| Web->KB dataset |
| World Wide Knowledge Base (Web->KB) project |
| http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/ |
| Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract symbolic knowledge from the World Wide Web. |
cmu.edu - domain rank 6080 (2241 in US)
|
|
| WordSimilarity-353 Test Collection |
| The WordSimilarity-353 Test Collection |
| http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html |
| Contains 353 English word pairs along with human-assigned similarity judgements. |
| Word similarity test collection |
| Word Similarity, Computational Linguistics,
Natural Language Processing, NLP,
Natural Language Understanding, Natural Language Analysis,
Information Retrieval, IR, Artificial Intelligence, AI,
Machine Learning, Corpus Linguistics, Algorithm Design |
| (SLD : ac.il) |
|
| RISE: Repository of Information Sources used in information Extraction tasks. |
| RISE: Repository of information sources used in information extraction tasks (learning extraction rules / extraction patterns). |
| http://www.isi.edu/info-agents/RISE/ |
| Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns). |
| Information Extraction, information extraction, IE, Wrapper Induction, wrapper induction, WG, WI, extraction rules, extraction patterns, extraction rule, extraction pattern, wrapper generation, repository, repositories, information source, information sources, information, extraction, wrapper, wrappers, induction, rule, pattern, rules, patterns, source, sources |
isi.edu - domain rank 105903 (40240 in US)
|
|
| Reuters-21578 Text Categorization Corpus |
| Reuters-21578 Text Categorization Test Collection |
| http://www.daviddlewis.com/resources/testcollections/reuters21578/ |
| A classic benchmark for text categorization algorithms. |
| (SLD : daviddlewis.com) |
|
| Bilkent University Function Approximation Repository |
| Bilkent University Function Approximation Repository |
| http://funapp.cs.bilkent.edu.tr/DataSets/ |
| Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by the machine learning and statistics community. |
| (SLD : edu.tr) |
|
| TechTC - Technion Repository of Text Categorization Datasets |
| TechTC - Technion Repository of Text Categorization Datasets |
| http://techtc.cs.technion.ac.il |
| Provides a large number of diverse test collections for use in text categorization research. |
| Text categorization test collections. |
| Computational Linguistics,
Natural Language Processing, NLP,
Natural Language Understanding, Natural Language Analysis,
Natural Language Generation, Information Retrieval, IR,
Text Categorization, Artificial Intelligence, AI,
Machine Learning, Corpus Linguistics, Algorithm Design,
Text Mining, Text Data Mining |
| (SLD : ac.il) |
|
|