CiteSeerX Data
CiteSeerX data and metadata are available for use by others. Available data includes CiteSeerX metadata, databases, data sets, and the text of PDF files.
Currently, data is available only through rsync transfers and downloads from Amazon S3; please contact us directly for access details. Data released by CiteSeerX is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
CiteSeerX is compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a standard proposed by the Open Archives Initiative to facilitate content dissemination. For data not mentioned here, please contact us through feedback.
To browse or download records programmatically from the CiteSeerX OAI collection, please use the harvest URL:
http://citeseerx.ist.psu.edu/oai2
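As a rough illustration of what a programmatic request to the harvest URL looks like, the sketch below builds OAI-PMH request URLs in Python. The verbs `Identify` and `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH standard; the helper name `oai_request_url` is hypothetical and chosen only for this example. The code only constructs URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Harvest URL as given on this page.
BASE_URL = "http://citeseerx.ist.psu.edu/oai2"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL (hypothetical helper for illustration)."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Ask the repository to describe itself.
identify_url = oai_request_url("Identify")

# Ask for records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
list_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

The resulting URLs can be fetched with any HTTP client, or pasted into a browser to inspect the XML response.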
The archive may also be browsed through an OAI Repository Explorer, either by using the CiteSeerX archive identifier or by entering the harvest URL directly.
Currently, there are difficulties with the OAI service. If you have an immediate need for our data, please contact us for Amazon access.
These toolkits can be used for OAI metadata harvesting:
- OAI-Harvester - Perl
- OAIHarvester2 - Java
- .NET OAI Harvester - .NET (DLL)
- UIUC OAI - UIUC OAI Metadata Harvesting Project.
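For small jobs where a full toolkit is more than is needed, the core of a harvester is parsing a `ListRecords` response and checking for a `resumptionToken`, which OAI-PMH uses to signal that more records remain. The following is a minimal sketch using only the Python standard library. The sample XML is invented for illustration and is not an actual CiteSeerX response; the function name `parse_list_records` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core standards.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Invented response fragment, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        title = record.findtext(".//" + DC_NS + "title")
        records.append((identifier, title))
    # A missing or empty token means the list is complete.
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, (token or None)


records, token = parse_list_records(SAMPLE)
print(records)
print(token)
```

In a full harvest, a non-empty token would be sent back in a follow-up `ListRecords` request (as the `resumptionToken` argument, without `metadataPrefix`) until no token is returned.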