For our work on machine learning for the annotation of web services, we gathered WSDL files from SALCentral and XMethods and organized them in a hierarchy.
The web services are classified hierarchically, and the directory structure serves as the label: for example, a WSDL file in the communication\mail directory is classified as a "mail" web service, where "mail" is a subclass of "communication".
The labeled instances were crawled from the SALCentral website; the unlabeled instances (in the directory "unlabelled") come from the XMethods website.
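Because the label is encoded only in the directory path, a loader has to derive it from each file's location. The following is a minimal sketch, not part of the collection itself. It assumes that every .wsdl file sits inside its class directory (or its ancestors) below a common root and that "unlabelled" is a top-level directory. The function name load_instances is hypothetical.

```python
from pathlib import Path

# Assumption: the unlabeled XMethods files live in this top-level directory.
UNLABELLED_DIR = "unlabelled"


def load_instances(root):
    """Split the .wsdl files below `root` into labeled and unlabeled instances.

    labeled:   list of (wsdl_path, label_path), where label_path is the tuple of
               directory names from the root, e.g. ("communication", "mail").
    unlabeled: list of wsdl_path for files under the "unlabelled" directory.
    Files placed directly in the root carry no label and are skipped.
    """
    root = Path(root)
    labeled, unlabeled = [], []
    for wsdl in sorted(root.rglob("*.wsdl")):
        # Directory components between the root and the file form the label path.
        parts = wsdl.relative_to(root).parent.parts
        if parts and parts[0] == UNLABELLED_DIR:
            unlabeled.append(wsdl)
        elif parts:
            labeled.append((wsdl, parts))
    return labeled, unlabeled
```

For a file in communication\mail, label_path[-1] gives the class ("mail") and label_path[:-1] its superclasses (("communication",)). Using pathlib's parts avoids hard-coding the backslash separator, so the same code works on Windows and Unix.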
Each .wsdl file is accompanied by a .txt file with the following structure:
Note that the SALCentral classification is not very useful (which is why we wanted our own...).
The file names follow the pattern serviceNN.OriginalClassification.[txt|wsdl], where OriginalClassification is the label assigned by SALCentral. The XMethods website does not categorize its web services, so line 2 of the .txt file for each unlabeled instance is always "XMethods".
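The naming scheme can be parsed with a regular expression. This is a sketch under stated assumptions. It assumes that NN is one or more digits and that the original classification is everything between the first dot after the number and the final extension. It also assumes, as the note on XMethods implies, that line 2 of each .txt file holds the source classification. The names parse_filename and original_classification are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumed pattern: "service", a number, the SALCentral label, then the extension.
FILENAME_RE = re.compile(r"^service(\d+)\.(.+)\.(txt|wsdl)$")


def parse_filename(name):
    """Return (number, original_classification, extension) for a file name.

    The number is kept as a string so that leading zeros are preserved.
    """
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return m.groups()


def original_classification(txt_path):
    """Return line 2 of an accompanying .txt file, or None if it is missing."""
    lines = Path(txt_path).read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[1].strip() if len(lines) > 1 else None
```

For example, parse_filename("service12.Business.wsdl") yields ("12", "Business", "wsdl"). The greedy `(.+)` followed by the anchored extension means a label that itself contains dots is still captured whole.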
The classes are highly imbalanced, and the labels are unfortunately not noise-free.