While creating an Avro schema to extract the CirrusSearch data out of Elasticsearch, we identified several inconsistencies that should probably be fixed at the CirrusSearch level:
- defaultsort should be null when not set instead of being false
- coordinates should always be floats (not ints)
- file_text should be a string, not an empty array or false
- labels should be null or an empty map instead of an empty array
- descriptions should be null or an empty map instead of an empty array
- descriptions should always be a map<string, array<string>> and not a map<string, string> on wikidata
AC:
- CirrusSearch produces a document that can be validated against the provided Avro schema
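
For illustration only, the inconsistencies listed above could be normalized along these lines. This is a rough Python sketch, not the CirrusSearch implementation: the function names `normalize_doc` and `_ints_to_floats` are hypothetical, the choice of null for a missing file_text is an assumption (the task only says it should be a string), and every integer found under coordinates is converted to a float because the exact shape of that field is not described here.

```python
import copy


def _ints_to_floats(value):
    """Recursively turn ints into floats, leaving booleans and other types alone."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return value
    if isinstance(value, int):
        return float(value)
    if isinstance(value, list):
        return [_ints_to_floats(v) for v in value]
    if isinstance(value, dict):
        return {k: _ints_to_floats(v) for k, v in value.items()}
    return value


def normalize_doc(doc):
    """Hypothetical helper: return a copy of the document with the listed fixes applied."""
    out = copy.deepcopy(doc)

    # defaultsort: false -> null
    if out.get("defaultsort") is False:
        out["defaultsort"] = None

    # coordinates: ints -> floats (assumption: convert every int found in the field)
    if "coordinates" in out:
        out["coordinates"] = _ints_to_floats(out["coordinates"])

    # file_text: empty array or false -> null (assumption: null means "no text")
    if out.get("file_text") == [] or out.get("file_text") is False:
        out["file_text"] = None

    # labels / descriptions: empty array -> empty map
    for field in ("labels", "descriptions"):
        if out.get(field) == []:
            out[field] = {}

    # descriptions: map<string, string> -> map<string, array<string>>
    descriptions = out.get("descriptions")
    if isinstance(descriptions, dict):
        out["descriptions"] = {
            lang: value if isinstance(value, list) else [value]
            for lang, value in descriptions.items()
        }

    return out
```

A document fixed this way should then be checkable against the schema with whichever Avro validator is in use.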