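Any23 can be used as a command-line tool or as a Web Service accessed through a RESTful API.
Of the modules that make up Any23, this page refers to any23-core and any23-service.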
The command-line tools are provided by the any23-core module.
Once Any23 has been correctly installed, you can use it as a command-line tool through the shell scripts in the "any23-core/bin" directory. Scripts are provided for Unix (Linux/OSX).
The main script is "any23tools" which provides analysis, documentation, testing and debugging utilities.
Running ./any23tools without options shows the default configuration properties and the usage options. The resource (URL or local file) is the only mandatory argument. You can also specify the input format, the output format and other advanced options.
any23-core/bin$ ./any23tools
[...configuration data...]
Usage: ToolRunner <utility> [options...]
where <utility> one of:
  Eval                     Utility for processing output log.
  ExtractorDocumentation   Utility for obtaining documentation about metadata extractors.
  MicrodataParser          Commandline Tool for extracting Microdata from file/HTTP source.
  PluginVerifier           Utility for plugin management verification.
  Rover                    Any23 Command Line Tool.
  Version                  Prints out the current library version and configuration information.
  VocabPrinter             Prints out the RDF Schema of the vocabularies used by Any23.
The any23tools script detects the utilities available on the any23-core classpath and lets you run them.
The main utilities are described below.
Rover is the main extraction tool. It extracts metadata from local and remote (HTTP) resources, and lets you specify a custom list of extractors, the desired output format, and other flags that suppress noise or generate advanced reports.
any23-core/bin$ ./any23tools Rover
[...configuration data...]
usage: {<url>|<file>} [-e <arg>] [-f <arg>] [-l <arg>] [-n] [-o <arg>]
       [-p] [-s] [-t] [-v]
 -e <arg>                   comma-separated list of extractors, e.g.
                            rdf-xml,rdf-turtle
 -f,--Output format <arg>   [turtle (default), ntriples, rdfxml, quad,
                            uris]
 -l,--log <arg>             logging, please specify a file
 -n,--nesting               disable production of nesting triples
 -o,--output <arg>          ouput file (defaults to stdout)
 -p,--pedantic              validates and fixes HTML content detecting
                            commons issues
 -s,--stats                 print out statistics of Any23
 -t,--notrivial             filter trivial statements
 -v,--verbose               show progress and debug information
Extract metadata from an HTML page:
any23-core/bin$ ./any23tools Rover http://yourdomain/yourfile
Extract metadata from a local resource:
any23-core/bin$ ./any23tools Rover myfoaf.rdf
To specify the output format, use the option "-f" or "--format" (the default output format is Turtle):
any23-core/bin$ ./any23tools Rover -f quad myfoaf.rdf
Filtering trivial statements
By default, Any23 extracts HTML/head meta information, such as links to CSS stylesheets, the author, or the software used to create the HTML. If you are only interested in the structured content of the HTML/body tag, the "-t" command-line argument activates a filter that drops these trivial statements.
any23-core/bin$ ./any23tools Rover -t -f quad myfoaf.rdf
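When several resources need processing, the options shown above can be combined in a small wrapper. The following is a minimal sketch and not part of Any23: the function name rover_batch and the ANY23TOOLS and DRY_RUN variables are assumptions introduced for illustration, and only flags listed in the Rover usage text (-t, -f, -o) are used. With DRY_RUN set, each command is printed instead of executed.

```shell
#!/bin/sh
# Illustrative wrapper (not shipped with Any23): run Rover once per input,
# filtering trivial statements and writing one output file per input.
# ANY23TOOLS and DRY_RUN are hypothetical variables introduced for this sketch.
ANY23TOOLS="${ANY23TOOLS:-./any23tools}"

rover_batch() {
    # $1: output format, as accepted by Rover's -f option (e.g. ntriples)
    # $2...: URLs or local files to process
    format="$1"
    shift
    for input in "$@"; do
        # Name the output file after the last path segment of the input.
        out="$(basename "$input").$format"
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: print the command that would be executed.
            echo "$ANY23TOOLS Rover -t -f $format -o $out $input"
        else
            "$ANY23TOOLS" Rover -t -f "$format" -o "$out" "$input"
        fi
    done
}
```

For example, after sourcing the snippet from the any23-core/bin directory, running rover_batch ntriples myfoaf.rdf http://yourdomain/yourfile with DRY_RUN=1 set prints the two Rover commands without contacting any resource.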
The ExtractorDocumentation utility returns human-readable information about the registered extractors.
any23-core/bin$ ./any23tools ExtractorDocumentation
[...configuration data...]
Usage:
  ExtractorDocumentation -list
      shows the names of all available extractors
  ExtractorDocumentation -i extractor-name
      shows example input for the given extractor
  ExtractorDocumentation -o extractor-name
      shows example output for the given extractor
  ExtractorDocumentation -all
      shows a report about all available extractors
List all the available extractors:
any23-core/bin$ ./any23tools ExtractorDocumentation -list
[...configuration data...]
csv
html-head-icbm
html-head-links
html-head-title
html-mf-adr
html-mf-geo
html-mf-hcalendar
html-mf-hcard
html-mf-hlisting
html-mf-hrecipe
html-mf-hresume
html-mf-hreview
html-mf-license
html-mf-species
html-mf-xfn
html-microdata
html-rdfa
html-script-turtle
rdf-nq
rdf-nt
rdf-turtle
rdf-xml
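To inspect individual extractors, the -i and -o options from the usage text can be called in turn for each name. The following is a minimal sketch and not part of Any23: the function name extractor_examples and the ANY23TOOLS and DRY_RUN variables are assumptions introduced for illustration. With DRY_RUN set, each command is printed instead of executed.

```shell
#!/bin/sh
# Illustrative helper (not shipped with Any23): show the example input and
# example output that ExtractorDocumentation reports for each named extractor.
# ANY23TOOLS and DRY_RUN are hypothetical variables introduced for this sketch.
ANY23TOOLS="${ANY23TOOLS:-./any23tools}"

extractor_examples() {
    # $@: extractor names, as printed by "ExtractorDocumentation -list"
    for name in "$@"; do
        # -i shows example input, -o shows example output (see usage text).
        for flag in -i -o; do
            if [ -n "${DRY_RUN:-}" ]; then
                # Dry run: print the command that would be executed.
                echo "$ANY23TOOLS ExtractorDocumentation $flag $name"
            else
                "$ANY23TOOLS" ExtractorDocumentation "$flag" "$name"
            fi
        done
    done
}
```

For example, extractor_examples html-rdfa html-microdata would request the example input and output of both extractors, using names taken from the list above.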
The MicrodataParser tool applies only the MicrodataExtractor to a specific input source and returns the extracted data in the JSON format defined in the JSON section of the Microdata specification.
bin/any23tools MicrodataParser
Usage: {http://path/to/resource.html|file:/path/to/local.file}
Any23 provides a Web Service that can be used to extract RDF from Web documents. Any23 services can be accessed through a RESTful API.
Running the server
The server command-line tool is defined in the any23-service module. To start the server, run the "any23server" script from the command line:
any23-service/bin$ ./any23server
Once the server is running, open its web interface in a browser. A live demo version of the service is also available. You can also start the server from Java by running the Any23 Servlet class. Maven can be used to create a WAR file for deployment into an existing servlet container such as Apache Tomcat.
See our Developers guide for more details.