Verification of schema.org annotations
General verification of schema.org annotations is the process of checking whether a given schema.org annotation in JSON-LD format complies with the following specifications:
JSON
JSON-LD
Schema.org
Additionally, restrictions are added or softened depending on real-world practice and usage (e.g. schema.org's pragmatic view on conformance, Google's Structured Data Testing Tool).
The output of the verification process is an error report in a structured format (JSON) with a specific data model. Any abnormality is expressed through a corresponding error entry. The file BasicValidation.md provides the list of JSON and JSON-LD specific errors. The error list for schema.org is provided in the following.
Error Codes for the General Verification of schema.org annotations start with 3
The JSON-LD @context allows the same semantics to be defined in many different syntactic ways (see the JSON-LD spec). Our algorithm, however, expects and allows only a certain subset of @context forms, based on the common practices of schema.org annotations. Independent of the syntactic variation, the most important takeaway is to use the right namespace for schema.org.
The following http://schema.org variants are possible for the namespace (according to examples from schema.org and Google):
However, there is a set of recommended variants that we agreed on, based on an open discussion on the sdo-check issues page:
The following variations are allowed by our verification algorithm, where the first one is the recommended one.
1. Single default @context
Only one vocabulary is used, and it is defined as a simple string value. This is the most commonly seen notation variant.
2. Single default @context using @vocab
Like the first variant, but using the "@vocab" keyword to define the standard vocabulary.
3. Single-Termed Context
A single term that defines the schema.org vocabulary. The term is used as shorthand for the namespace part of the absolute URI, i.e. everything before the vocabulary term identifier: "schema:Person" instead of "http://schema.org/Person".
4. Multiple Vocabularies, default using @vocab
The "@vocab" keyword defines the standard vocabulary (terms without vocabulary indicator), and additional vocabularies are defined with specific vocabulary indicators. The standard @vocab should be schema.org
5. Multi-Termed Context
Multiple vocabularies that are defined by specific terms.
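The five variants above can be sketched as Python dictionaries that mirror the JSON-LD structure. This is only an illustration: the namespace string is taken from the "http://schema.org/Person" example above, while the `ex` prefix, the example.org URL, and the `classify_context` helper are hypothetical and are not the sdo-check implementation.

```python
# Assumption: namespace form taken from the "http://schema.org/Person" example.
SDO = "http://schema.org/"

# One hypothetical @context value per variant described above.
CONTEXT_VARIANTS = {
    1: SDO,                                           # single default @context
    2: {"@vocab": SDO},                               # single default @context using @vocab
    3: {"schema": SDO},                               # single-termed context
    4: {"@vocab": SDO, "ex": "http://example.org/"},  # multiple vocabularies, default via @vocab
    5: {"schema": SDO, "ex": "http://example.org/"},  # multi-termed context
}


def classify_context(context):
    """Return the variant number (1-5) of a @context value, or None if it matches none."""
    if isinstance(context, str):
        return 1 if context == SDO else None
    if not isinstance(context, dict) or SDO not in context.values():
        return None  # schema.org is not used at all
    terms = [key for key in context if not key.startswith("@")]
    if context.get("@vocab") == SDO:
        return 4 if terms else 2
    if "@vocab" in context:
        return None  # the default vocabulary is something other than schema.org
    return 3 if len(terms) == 1 else 5
```

A context that does not mention schema.org at all, such as `{"ex": "http://example.org/"}`, yields `None`, which corresponds to the situation reported as a non-conform @context.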
A @graph keyword that "wraps" the annotation is allowed, but every node in this graph is treated as a tree (with inner nodes) rather than as a set of loosely connected nodes (entities in the graph connected by URIs).
Typed values are literals expressed as an object with @type for the data type and @value for the actual value. This is very useful for stating a specific data type, but it is unfortunately not embraced in practice and not supported by Google (checked 10.2019). Therefore, our algorithm does not allow this syntax. (Example: the first link is a typed value; the second link is how it is used in practice.)
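As a rough sketch of this tree view, the following unwraps an optional @graph and then walks each top-level node through its nested objects only. The helper names and the sample annotation are hypothetical; the sketch deliberately does not follow @id references between nodes.

```python
def root_nodes(annotation):
    """Return the top-level nodes of an annotation, unwrapping an optional @graph."""
    graph = annotation.get("@graph")
    if graph is None:
        return [annotation]
    return graph if isinstance(graph, list) else [graph]


def walk_tree(node, depth=0):
    """Yield (depth, node) for a node and every object nested below it."""
    yield depth, node
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords such as @type and @id are not traversed
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                yield from walk_tree(item, depth + 1)


# Hypothetical annotation with two independent trees inside @graph.
doc = {
    "@context": "http://schema.org/",
    "@graph": [
        {
            "@type": "Person",
            "name": "Jane",
            "address": {"@type": "PostalAddress", "addressLocality": "Innsbruck"},
        },
        {"@type": "Organization", "name": "Example Org"},
    ],
}
```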
Our algorithm could show a warning for typed values.
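A typed value can be recognized purely by its shape. The following is a minimal sketch of such a detection, assuming the annotation has already been parsed into Python dictionaries; the function names and the sample event are hypothetical.

```python
def is_typed_value(value):
    """True if value is a JSON-LD typed value: an object carrying both @type and @value."""
    return isinstance(value, dict) and "@type" in value and "@value" in value


def find_typed_values(node, path=""):
    """Return the property paths inside a node whose values are typed values."""
    hits = []
    for key, value in node.items():
        if key.startswith("@"):
            continue
        here = f"{path}/{key}"
        for item in value if isinstance(value, list) else [value]:
            if is_typed_value(item):
                hits.append(here)
            elif isinstance(item, dict):
                hits.extend(find_typed_values(item, here))
    return hits


# Hypothetical annotation: startDate is a typed value, name is a plain string.
event = {
    "@type": "Event",
    "name": "Talk",
    "startDate": {"@type": "Date", "@value": "2019-10-01"},
}
```

For the sample above, `find_typed_values(event)` returns `["/startDate"]`; the form used in practice would simply be `"startDate": "2019-10-01"`.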
The enumeration model of schema.org is very unclear. It still has to be discussed and figured out what is allowed or recommended with regard to enumerations. Basically, they could be accepted as entities (e.g. CreditCard) or as URIs (e.g. Monday).
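The two forms mentioned above can be told apart by shape alone. The sketch below is an assumption about how such a distinction could be made, not the agreed behaviour; the helper name and the sample annotations are hypothetical.

```python
from urllib.parse import urlparse


def enumeration_form(value):
    """Classify an enumeration value as 'entity', 'uri', or 'other'."""
    if isinstance(value, dict):
        return "entity" if "@type" in value else "other"
    if isinstance(value, str):
        parsed = urlparse(value)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "uri"
    return "other"


# Enumeration value given as a URI.
opening_hours = {
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": "http://schema.org/Monday",
}

# Enumeration value given as an entity.
offer = {
    "@type": "Offer",
    "acceptedPaymentMethod": {"@type": "CreditCard"},
}
```

A bare string such as `"Monday"` falls into the `"other"` bucket in this sketch.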
schema.org documentation about actions: https://schema.org/docs/actions.html
Action entities may have "action"-specific input and output properties (property names with an -input or -output suffix). Their value may be a string or a PropertyValueSpecification-typed object.
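A check of these two accepted value shapes might look as follows. This is a sketch under assumptions: it matches properties by their `-input` and `-output` suffix, it expects the unprefixed type name `PropertyValueSpecification` (as with a default-vocabulary context), and both helper names are hypothetical.

```python
def is_accepted_action_value(value):
    """True if value is a string or a PropertyValueSpecification-typed object."""
    if isinstance(value, str):
        return True
    if isinstance(value, dict):
        types = value.get("@type", [])
        types = types if isinstance(types, list) else [types]
        return "PropertyValueSpecification" in types
    return False


def wrongly_formatted_action_properties(action):
    """Return the names of -input/-output properties whose value has an unaccepted shape."""
    return [
        name
        for name, value in action.items()
        if name.endswith(("-input", "-output")) and not is_accepted_action_value(value)
    ]


# Hypothetical action using the string shorthand for its input property.
search_action = {
    "@type": "SearchAction",
    "target": "http://example.org/search?q={search_term_string}",
    "query-input": "required name=search_term_string",
}
```

The same property could also be written as an object, for example `{"@type": "PropertyValueSpecification", "valueRequired": True, "valueName": "search_term_string"}`, which this sketch accepts as well.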
Further discussion about the verification of schema.org vocabulary can be found in the sdo-check repository, which is managed by us.
| ErrorCode | Name | Severity | Description |
| --- | --- | --- | --- |
| 300 | Generic schema.org Verification error | Any | Can be used as super-type for any error regarding the general validation |
| 301 | Non-conform @context | Error | Used @context must use/include schema.org |
| 302 | Non-conform @type | Error | Used @type is non-conform to schema.org |
| 303 | Non-conform property | Error | Used property is non-conform to schema.org |
| 304 | Wrongly formatted action property | Error | Used action property (input-/output-) has a value that is not a string |
| 305 | Non-conform domain | Error | Used property has a domain that is not allowed to use according to schema.org |
| 306 | Non-conform range | Error | Used property has a range that is not allowed to use according to schema.org |
| 307 | Unexpected string | Warning | Used property has a string as value, although it is not allowed according to schema.org |
| 308 | Wrongly formatted enumeration | Warning | Used property has an enumeration value that is non-conform to schema.org (must be a URL stated as enumeration value) |
| 309 | Empty entity | Warning | Used property has a range that is an entity with no properties |
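
To illustrate how entries from this list could end up in a JSON error report, here is a minimal sketch. The actual data model of the report is not specified in this section, so the field names (`errorCode`, `name`, `severity`, `description`), the `ErrorEntry` class, and the `make_entry` helper are all assumptions derived from the column headers of the list.

```python
import json
from dataclasses import asdict, dataclass

# Code -> (name, severity), copied from the error list above.
SDO_ERRORS = {
    300: ("Generic schema.org Verification error", "Any"),
    301: ("Non-conform @context", "Error"),
    302: ("Non-conform @type", "Error"),
    303: ("Non-conform property", "Error"),
    304: ("Wrongly formatted action property", "Error"),
    305: ("Non-conform domain", "Error"),
    306: ("Non-conform range", "Error"),
    307: ("Unexpected string", "Warning"),
    308: ("Wrongly formatted enumeration", "Warning"),
    309: ("Empty entity", "Warning"),
}


@dataclass
class ErrorEntry:
    """Hypothetical shape of one report entry; not the documented data model."""
    errorCode: int
    name: str
    severity: str
    description: str


def make_entry(code, description):
    """Build an entry for a known schema.org error code."""
    name, severity = SDO_ERRORS[code]
    return ErrorEntry(code, name, severity, description)


# Serialize a one-entry report as JSON.
entries = [make_entry(301, "Used @context must use/include schema.org")]
report = json.dumps([asdict(entry) for entry in entries], indent=2)
```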