Metadata
Metadata, or data about data, is an interesting topic in data science and information retrieval. The idea originated long before computers came into existence. Let’s start with how libraries used metadata. Libraries kept a catalog of their books, and alongside each entry they recorded more information about the book, such as the author, publisher, and year of publication. This additional information sits over and above the content of the book itself, hence it is referred to as metadata.
Now we will turn to how metadata is used on the web. There are a few ways in which metadata can be stored inside a webpage, and it can be broadly classified into two kinds. The first is general information, such as the title, description, and keywords, which is just text describing the web page. It is typically placed on the HTML page using the meta tag. Apart from specialized use cases like charset, which has its own semantics, a general meta tag has a name attribute and a content attribute. The second kind is structured data, which is covered further below.
For example:
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
These meta tags gave additional information to machines such as editors, browsers, and search engines. But they had no real structure, or the structure was only loosely defined by those tools.
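To make the name and content pairing concrete, here is a minimal sketch of how a tool might read these tags, using Python's standard html.parser module. The class name MetaTagCollector and the sample page, including its description text, are made up for illustration; real crawlers and browsers are far more involved.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Special forms such as <meta charset="utf-8"> carry no
        # name/content pair, so they are skipped here.
        if "name" in attributes and "content" in attributes:
            self.meta[attributes["name"]] = attributes["content"]


# Illustrative page fragment; the description text is an assumption.
page = """
<head>
  <meta charset="utf-8">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="description" content="A hypothetical page description">
</head>
"""

collector = MetaTagCollector()
collector.feed(page)

# The content value is plain text; splitting on commas is a convention
# the consuming tool has to decide on, not something HTML enforces.
keywords = collector.meta["keywords"].split(",")
print(keywords)
```

Note that the parser only hands back strings. Any meaning beyond that, such as treating the keywords as a comma-separated list, is decided by the consuming tool, which is exactly the loose structure described above.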
Then came structured data. Its purpose is to add value not for the viewer of the page but for search engines and other semantic processors. It has been pushed forward primarily by Google. You can read more about structured data and the various supported formats on this Google page.
In short, Google supports three formats: JSON-LD, Microdata, and RDFa.
Google’s recommended format is JSON-LD, which is a JavaScript notation embedded in a <script> tag in the page head or body. The markup is not interleaved with the user-visible text, which makes nested data items easier to express, such as the Country of a PostalAddress of a MusicVenue of an Event. Also, Google can read JSON-LD data when it is dynamically injected into the page’s contents, such as by JavaScript code or embedded widgets in your content management system.
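As a rough illustration of the nesting mentioned above, the sketch below builds an Event whose location is a MusicVenue with a PostalAddress and a Country, then wraps it in a script tag. The type and property names come from schema.org, but every value (event name, venue name, country) is a placeholder invented for this example.

```python
import json

# Nested schema.org items: Event -> MusicVenue -> PostalAddress -> Country.
# All names and values below are placeholders, not real data.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Concert",
    "location": {
        "@type": "MusicVenue",
        "name": "Example Hall",
        "address": {
            "@type": "PostalAddress",
            "addressCountry": {"@type": "Country", "name": "Example Country"},
        },
    },
}

# JSON-LD lives in its own script tag, separate from the visible markup.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(event, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Because the whole description sits in one block, each level of nesting is just another JSON object, rather than attributes scattered across the visible HTML.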
Many entity types are defined on schema.org, such as Article. When they are declared on a page, search engines can understand the page better. The rating widget you see on the right, which allows you to rate this article, is the visible component of this structure. But Google cannot read the fancy stars we have drawn to figure out the ratings. Hence the same data is also represented in JSON-LD format on this page, so that Google can read the ratings given by our users.
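To show what reading the ratings could look like from the machine's side, here is a small sketch that pulls a JSON-LD block out of a page and reads the rating from it, ignoring the drawn stars entirely. The JsonLdExtractor class, the headline, and the rating numbers are all assumptions for illustration; this is not the actual markup of this page, nor how Google's crawler is implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


# Illustrative page: the stars are for people, the JSON-LD is for machines.
# The headline and rating numbers are made-up values.
page = """
<div class="stars">★★★★☆</div>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Hypothetical article title",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": 4.2,
    "ratingCount": 17
  }
}
</script>
"""

extractor = JsonLdExtractor()
extractor.feed(page)
rating = extractor.items[0]["aggregateRating"]
print(rating["ratingValue"], rating["ratingCount"])
```

The star characters never enter the picture: the extractor only looks at the script block, which is why the rating has to be duplicated there.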