Apache Avro is a data serialization framework. It's popular in systems built on Kafka, largely because it enforces a rather strict schema.
Where most people struggle is writing test cases that validate whether a given JSON message actually conforms to a given Avro schema. There is no built-in API in Avro for that, and it is a bit complicated. While it is theoretically possible to generate Java classes from an Avro schema, those classes are not POJOs: they are part of the Avro IDL (Interface Definition Language) tooling, and they extend the SpecificRecordBase class. It is reminiscent of the WSDL files of the good old days, where we used to generate some very cumbersome structures that conformed to the convoluted schemas of whatever we were trying to ship in the envelope.
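That said, one ad-hoc workaround (not a first-class validation API) is to attempt to decode the JSON with a GenericDatumReader and a JsonDecoder, and treat a decoding failure as a schema violation. A minimal sketch, assuming the Avro library is on the classpath (the class name JsonSchemaCheck and method conformsTo are made up for illustration):

```java
import java.io.IOException;

import org.apache.avro.AvroTypeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Sketch: a JSON string "conforms" to a schema if Avro's JsonDecoder can decode it.
public class JsonSchemaCheck {

    public static boolean conformsTo(Schema schema, String json) {
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        try {
            // jsonDecoder fails on malformed JSON; read throws AvroTypeException
            // when the JSON structure does not match the schema
            reader.read(null, DecoderFactory.get().jsonDecoder(schema, json));
            return true;
        } catch (IOException | AvroTypeException e) {
            return false;
        }
    }
}
```

This is closer to "does it decode" than to full validation (for example, it is lenient about extra fields in some Avro versions), but it is often good enough for a test case.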
In essence, if you have a simple JSON message that you want to read into a Java POJO that is not a SpecificRecordBase, that's not happening! Conversely, if you have a POJO with the same structure as the Avro schema and you try to write it into an Avro message following the usual examples, you are in for a shock!
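For orientation, a schema like the book.avsc referenced below might look roughly like this (the field names here are assumptions; the actual schema in the demo project may differ):

```json
{
  "type": "record",
  "name": "Book",
  "namespace": "com.example.avrodemo",
  "fields": [
    {"name": "title", "type": "string"},
    {"name": "author", "type": "string"},
    {"name": "pages", "type": "int"}
  ]
}
```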
The code below fails spectacularly:
@Test
void pojoWithSpecificDatumWriterThrowsException() throws IOException, URISyntaxException {
    // given
    Schema bookSchema =
            new SchemaParser().parse(AvroDemoTest.class.getResourceAsStream("/expected/book.avsc")).mainSchema();
    Book book = Instancio.create(Book.class);
    Path bookAvroFileLocation = Path.of("build", "book.avro");
    // when
    DatumWriter<Book> datumWriter = new SpecificDatumWriter<>();
    @SuppressWarnings("resource")
    DataFileWriter<Book> dataFileWriter = new DataFileWriter<>(datumWriter);
    dataFileWriter.create(bookSchema, bookAvroFileLocation.toFile());
    AppendWriteException result = assertThrows(AppendWriteException.class, () -> dataFileWriter.append(book));
    // then
    assertTrue(result.getMessage().startsWith("java.lang.ClassCastException: value Book"));
}
And the error is so cryptic! What it is basically trying to tell you is that the Book class should extend SpecificRecordBase.
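For context, Book here is an ordinary POJO with no Avro ancestry, which is exactly what SpecificDatumWriter chokes on. A minimal sketch of such a class (field names are assumptions, chosen to mirror a plausible book schema):

```java
import java.util.Objects;

// Hypothetical Book POJO: plain fields, a no-arg constructor, and equals/hashCode.
// Avro's reflection-based reader needs the no-arg constructor to instantiate it.
public class Book {
    private String title;
    private String author;
    private int pages;

    public Book() { }  // required by reflection-based deserialization

    public Book(String title, String author, int pages) {
        this.title = title;
        this.author = author;
        this.pages = pages;
    }

    public String getTitle() { return title; }
    public String getAuthor() { return author; }
    public int getPages() { return pages; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book other = (Book) o;
        return pages == other.pages
                && Objects.equals(title, other.title)
                && Objects.equals(author, other.author);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author, pages);
    }
}
```

A value-based equals/hashCode matters here: the round-trip test below compares the deserialized list against the original one.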
So, what's the way out?
Simple: instead of the SpecificDatumWriter, use the ReflectDatumWriter.
This is the complete example:
@Test
void validatePojoAgainstSchema() throws IOException, URISyntaxException {
    // given
    Schema bookSchema =
            new SchemaParser().parse(AvroDemoTest.class.getResourceAsStream("/expected/book.avsc")).mainSchema();
    List<Book> booksInput = Instancio.createList(Book.class);
    Path bookAvroFileLocation = Path.of("build", "book.avro");
    // when
    DatumWriter<Book> datumWriter = new ReflectDatumWriter<>();
    DataFileWriter<Book> dataFileWriter = new DataFileWriter<>(datumWriter);
    dataFileWriter.create(bookSchema, bookAvroFileLocation.toFile());
    booksInput.forEach(book -> {
        try {
            dataFileWriter.append(book);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    });
    dataFileWriter.close();
    // then
    DatumReader<Book> datumReader = new ReflectDatumReader<>(Book.class);
    DataFileReader<Book> dataFileReader = new DataFileReader<>(bookAvroFileLocation.toFile(), datumReader);
    List<Book> booksOutput = new ArrayList<>();
    while (dataFileReader.hasNext()) {
        booksOutput.add(dataFileReader.next());
    }
    dataFileReader.close();
    assertEquals(booksInput, booksOutput);
}
These are the steps:
1. Parse the Avro schema with SchemaParser.
2. Create a list of Book POJOs (generated here with Instancio).
3. Write them to an Avro file through a ReflectDatumWriter wrapped in a DataFileWriter; if a POJO did not match the schema, the append would fail.
4. Read them back through a ReflectDatumReader wrapped in a DataFileReader.
5. Assert that the deserialized POJOs equal the originals.
The complete sources and a working project can be found in the avro-demo project.