Verification of schema.org annotations
Introduction
The general verification of schema.org annotations is the process of checking whether a given schema.org annotation in JSON-LD format complies with the following specifications:
JSON
JSON-LD
Schema.org
Additionally, restrictions are added or relaxed depending on real-world practices and usage (e.g. schema.org's pragmatic view on conformance, Google's Structured Data Testing Tool).
The output of the verification process is an error report in a structured format (JSON-LD) with a specific data model (GeneralValidationReport). Any abnormalities are expressed through corresponding error entries. The documentation file "BasicValidation.md" provides the list of JSON- and JSON-LD-specific errors. The error list for schema.org follows.
Error List for General Verification of schema.org annotations
Error codes for the general verification of schema.org annotations start with 3.
| ErrorCode | Name | Severity | Description |
|-----------|------|----------|-------------|
| 300 | Generic schema.org Verification error | Any | Can be used as super-type for any error regarding the general validation |
| 301 | Non-conform @context | Error | Used @context must use/include schema.org |
| 302 | Non-conform @type | Error | Used @type is non-conform to schema.org |
| 303 | Non-conform property | Error | Used property is non-conform to schema.org |
| 304 | Wrongly formatted action property | Error | Used action property (input-/output-) has a value that is not a string |
| 305 | Non-conform domain | Error | Used property has a domain that is not allowed to use according to schema.org |
| 306 | Non-conform range | Error | Used property has a range that is not allowed to use according to schema.org |
| 307 | Unexpected string | Warning | Used property has a string as value, although it is not allowed according to schema.org |
| 308 | Wrongly formatted enumeration | Warning | Used property has an enumeration value that is non-conform to schema.org (must be a URL stated as enumeration value) |
| 309 | Empty entity | Warning | Used property has a range that is an entity with no properties |
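The codes above could be held in a small lookup structure when building report entries. The following is only a minimal sketch: the entry field names (`errorCode`, `name`, `severity`, `path`), the `make_entry` helper, and the fallback to code 300 for unknown codes are illustrative assumptions, not the actual GeneralValidationReport data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorType:
    code: int
    name: str
    severity: str  # "Error", "Warning", or "Any"


# Codes, names, and severities copied from the error list.
CATALOG = {e.code: e for e in [
    ErrorType(300, "Generic schema.org Verification error", "Any"),
    ErrorType(301, "Non-conform @context", "Error"),
    ErrorType(302, "Non-conform @type", "Error"),
    ErrorType(303, "Non-conform property", "Error"),
    ErrorType(304, "Wrongly formatted action property", "Error"),
    ErrorType(305, "Non-conform domain", "Error"),
    ErrorType(306, "Non-conform range", "Error"),
    ErrorType(307, "Unexpected string", "Warning"),
    ErrorType(308, "Wrongly formatted enumeration", "Warning"),
    ErrorType(309, "Empty entity", "Warning"),
]}


def make_entry(code, path, default_severity="Error"):
    """Build a hypothetical report entry (field names are assumptions)."""
    # Assumption: unknown codes fall back to the generic super-type 300.
    err = CATALOG.get(code, CATALOG[300])
    # Assumption: "Any" is resolved to a concrete severity by the caller.
    severity = default_severity if err.severity == "Any" else err.severity
    return {"errorCode": err.code, "name": err.name,
            "severity": severity, "path": path}
```

For example, `make_entry(309, "$.address")` would yield an entry with severity "Warning".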
Misc
Context
The JSON-LD @context allows the same semantics to be expressed in many different syntactic ways (see the JSON-LD specification). Our algorithm, however, expects and allows only a certain subset of @context forms, based on the common practices of schema.org annotations. Independent of the syntactic variation used, the most important takeaway is to use the right namespace for schema.org.
Schema.org namespace
The following variants of the http://schema.org namespace are possible (according to examples from schema.org and Google):
However, there is a set of recommended variants that we agreed on, based on an open discussion on the sdo-check issues page:
Context variants
The following variations are allowed by our verification algorithm, where the first one is the recommended one.
1. Single default @context
Only one vocabulary is used, and it is defined as a simple string value. This is the most commonly seen notation variant.
2. Single default @context using @vocab
Like the first variant, but using the "@vocab" keyword to define the standard vocabulary.
3. Single-Termed Context
A single term that defines the schema.org vocabulary. It is used as shorthand for the namespace part of the absolute URI, so that only the vocabulary term identifier follows it, e.g. "schema:Person" instead of "http://schema.org/Person".
4. Multiple Vocabularies, default using @vocab
The "@vocab" keyword defines the standard vocabulary (used for terms without a vocabulary indicator), and additional vocabularies are defined with specific vocabulary indicators. The standard @vocab should be schema.org.
5. Multi-Termed Context
Multiple vocabularies that are defined by specific terms.
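The five variants can be told apart by the shape of the @context value. The sketch below is illustrative only, not the actual verification algorithm. It assumes that the accepted namespace spellings are the http/https forms with or without a trailing slash, that term definitions are plain strings, and that "should be schema.org" for @vocab is treated as a requirement. The names `is_sdo` and `classify_context` are hypothetical.

```python
import re

# Assumption: accepted namespace spellings (http/https, optional trailing slash).
_SDO = re.compile(r"https?://schema\.org/?")


def is_sdo(value):
    """True if value is one of the assumed schema.org namespace spellings."""
    return isinstance(value, str) and _SDO.fullmatch(value) is not None


def classify_context(context):
    """Return the variant number (1-5), or None (would map to error 301)."""
    if is_sdo(context):
        return 1  # single default @context given as a string
    if not isinstance(context, dict):
        return None
    vocab = context.get("@vocab")
    # Terms are all non-keyword keys, e.g. {"schema": "https://schema.org/"}.
    terms = {k: v for k, v in context.items() if not k.startswith("@")}
    if not all(isinstance(v, str) for v in terms.values()):
        return None  # expanded term definitions are out of scope here
    if vocab is not None:
        if not is_sdo(vocab):
            return None  # assumption: @vocab must be schema.org
        return 2 if not terms else 4
    if not any(is_sdo(v) for v in terms.values()):
        return None  # no term maps to schema.org
    return 3 if len(terms) == 1 else 5


examples = [
    "https://schema.org/",                                         # variant 1
    {"@vocab": "https://schema.org/"},                             # variant 2
    {"schema": "https://schema.org/"},                             # variant 3
    {"@vocab": "https://schema.org/",
     "dc": "http://purl.org/dc/terms/"},                           # variant 4
    {"schema": "https://schema.org/",
     "dc": "http://purl.org/dc/terms/"},                           # variant 5
]
variants = [classify_context(c) for c in examples]
```

Each dict in `examples` corresponds to the value of "@context" in a JSON-LD document. The Dublin Core namespace is used only as a stand-in for an additional vocabulary.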
@graph
A @graph keyword that "wraps" the annotation is allowed, but every node in this graph is treated as a tree (with inner nodes) rather than as one of several loose nodes (entities in the graph connected by URIs).
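As a rough illustration of this tree-based view, the sketch below extracts the top-level nodes of an annotation and walks each one through its nested entities only, without resolving references between nodes. The helper names `root_nodes` and `walk` and the path notation are hypothetical.

```python
def root_nodes(annotation):
    """Return the nodes to verify; each is treated as a self-contained tree."""
    graph = annotation.get("@graph")
    if graph is None:
        return [annotation]
    if isinstance(graph, dict):
        graph = [graph]
    return [node for node in graph if isinstance(node, dict)]


def walk(node, path="$"):
    """Yield (path, entity) for a node and all entities nested inside it.

    References to other nodes by URI are deliberately not followed.
    """
    yield path, node
    for key, value in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @type and @id
        if isinstance(value, list):
            children = [(f"{path}.{key}[{i}]", v) for i, v in enumerate(value)]
        else:
            children = [(f"{path}.{key}", value)]
        for child_path, child in children:
            if isinstance(child, dict):
                yield from walk(child, child_path)


doc = {
    "@context": "https://schema.org/",
    "@graph": [
        {"@type": "Person", "name": "A",
         "address": {"@type": "PostalAddress", "addressLocality": "X"}},
        {"@type": "Organization", "name": "B"},
    ],
}
paths = [p for i, root in enumerate(root_nodes(doc))
         for p, _ in walk(root, f"$[{i}]")]
```

Here `paths` lists the two graph nodes plus the nested address entity of the first node.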
Typed values
Typed values are literals expressed by an object that has an @type for the data type and an @value for the actual value. This is very useful for defining a specific data type, but unfortunately it is not embraced in practice and not supported by Google (checked 10.2019). Therefore, our algorithm does not allow this syntax. (Example: the first link is a typed value; the second link shows how it is used in practice.)
Our algorithm could show a warning for typed values.
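Detecting typed values for such a warning could look roughly like the sketch below. The function name `find_typed_values` and the returned (path, data type) tuples are hypothetical, and no error code is attached because the error list does not define one for this case.

```python
def find_typed_values(node, path="$"):
    """Collect (path, data type) for every typed value nested in node."""
    found = []
    for key, value in node.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords are not properties
        values = value if isinstance(value, list) else [value]
        for v in values:
            if not isinstance(v, dict):
                continue
            if "@value" in v:
                # A typed value carries both @type and @value.
                if "@type" in v:
                    found.append((f"{path}.{key}", v["@type"]))
            else:
                # A regular nested entity: keep searching inside it.
                found.extend(find_typed_values(v, f"{path}.{key}"))
    return found


event = {
    "@type": "Event",
    "startDate": {"@type": "Date", "@value": "2019-10-01"},  # typed value
    "endDate": "2019-10-02",                                 # plain literal
    "location": {"@type": "Place", "name": "X"},             # nested entity
}
typed = find_typed_values(event)
```

Only `startDate` is reported; the plain string and the nested Place entity are left alone.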
Enumerations
The enumeration model of schema.org is very unclear. What is allowed or recommended with regard to enumerations still has to be discussed and figured out. Basically, enumeration values could be accepted as entities (e.g. CreditCard) or as URIs (e.g. Monday).
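Since the rules are still open, the following is only a sketch of how the two forms could be distinguished. It assumes that URI values are written as full schema.org URLs (compact forms such as "schema:Monday" are not handled) and that any object with an @type counts as the entity form. `enumeration_form` is a hypothetical helper.

```python
import re

# Assumption: a URI-style enumeration value is a full schema.org term URL.
_SDO_TERM = re.compile(r"https?://schema\.org/[A-Za-z][A-Za-z0-9]*")


def enumeration_form(value):
    """Return "uri", "entity", or None (None would map to warning 308)."""
    if isinstance(value, str):
        return "uri" if _SDO_TERM.fullmatch(value) else None
    if isinstance(value, dict) and "@type" in value:
        return "entity"
    return None


forms = [
    enumeration_form("https://schema.org/Monday"),  # URI form
    enumeration_form({"@type": "CreditCard"}),      # entity form
    enumeration_form("Monday"),                     # bare string, not a URL
]
```

A bare string such as "Monday" matches neither form, which corresponds to the "must be a URL stated as enumeration value" wording of error 308.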
Actions
schema.org documentation about actions: https://schema.org/docs/actions.html
Action entities may have "action"-specific input and output properties. The value of such a property may be a String or a PropertyValueSpecification-typed object.
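A check for these properties might be sketched as follows. It assumes the "-input" and "-output" property suffixes described in the schema.org actions documentation, inspects only the direct properties of the given entity, and follows the statement above by accepting PropertyValueSpecification-typed objects in addition to strings. The function name and the entry fields are hypothetical.

```python
def check_action_io(entity):
    """Return hypothetical 304 entries for badly formatted -input/-output values."""
    problems = []
    for key, value in entity.items():
        if not (key.endswith("-input") or key.endswith("-output")):
            continue  # not an action-specific input/output property
        values = value if isinstance(value, list) else [value]
        for v in values:
            is_string = isinstance(v, str)
            is_spec = (isinstance(v, dict)
                       and v.get("@type") == "PropertyValueSpecification")
            if not (is_string or is_spec):
                problems.append({"errorCode": 304, "property": key})
    return problems


action = {
    "@type": "SearchAction",
    "target": "https://example.org/search?q={q}",
    "query-input": "required name=q",                        # string: accepted
    "result-output": 42,                                     # number: reported
    "object-input": {"@type": "PropertyValueSpecification",
                     "valueRequired": True},                 # object: accepted
}
issues = check_action_io(action)
```

In this example only `result-output` is reported, because its value is neither a string nor a PropertyValueSpecification object.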
Discussion
Further discussion about the verification of schema.org vocabulary can be found in the sdo-check repository, which is managed by us.