Metadata, or data about data, is an interesting topic in data science and information retrieval.  Metadata originated long before computers came into existence.  Let's start with how libraries used metadata.  Libraries kept a catalog of their books.  Alongside each entry they also recorded more information about the book, such as the author, publication, and year.  This additional information is over and above the content of the book itself.  Hence it is referred to as metadata.

Library catalog metadata

Now let's turn to how metadata is used on the web.  There are a few ways to store metadata inside a web page.  Metadata of a web page can be classified into two kinds.  The first is general information, such as title, description, and keywords, which is simply text describing the page.  It is typically placed in the HTML page using the meta tag.  Apart from specialized uses such as charset, which has its own semantics, a general meta tag has a name attribute and a content attribute.

For example:

<meta name="keywords" content="HTML,CSS,XML,JavaScript">

These meta tags were used to give additional information to machines such as editors, browsers, and search engines.  But they had no formal structure, or the structure was only loosely defined by those tools.
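
To make the name/content pairing concrete, here is a minimal sketch of how a tool might collect these pairs from a page, using Python's standard html.parser module.  The MetaTagCollector class and the sample page are made up for illustration.  Real crawlers and browsers are far more involved.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Special forms such as <meta charset="utf-8"> have no
        # name/content pair, so they are skipped here.
        if name is not None and content is not None:
            self.meta[name.lower()] = content


# Made-up sample page, for illustration only.
SAMPLE_PAGE = """
<html>
  <head>
    <meta charset="utf-8">
    <meta name="description" content="A short note on metadata">
    <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  </head>
  <body><p>Hello</p></body>
</html>
"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)

# The keywords value is plain text; splitting on commas is a convention,
# not something the meta tag itself enforces.
keywords = collector.meta["keywords"].split(",")
print(collector.meta)
print(keywords)
```

Note that the parser only hands back strings.  Deciding what "keywords" or "description" mean is left entirely to the consuming tool, which is the loose structure described above.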

Then came structured data.  Its purpose is to add value not for the viewer of the page but for search engines and other semantic processors.  It has been pushed forward primarily by Google.  You can read more about structured data and the various supported formats on this Google page.

In short, Google supports three formats:

  1. JSON-LD
  2. Microdata
  3. RDFa

Google's recommended format is JSON-LD, a JavaScript notation embedded in a <script> tag in the page head or body. The markup is not interleaved with the user-visible text, which makes nested data items easier to express, such as the Country of a PostalAddress of a MusicVenue of an Event. Also, Google can read JSON-LD data when it is dynamically injected into the page’s contents, such as by JavaScript code or embedded widgets in your content management system.
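
The nesting mentioned above can be sketched as a plain Python dictionary that is serialized into a <script> tag.  This is only an illustration: the event, venue, and country names are invented, and to_jsonld_script is a hypothetical helper, not part of any library.

```python
import json

# Made-up example data: the Country of a PostalAddress of a MusicVenue of an Event.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Concert",
    "location": {
        "@type": "MusicVenue",
        "name": "Example Hall",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Example City",
            "addressCountry": {"@type": "Country", "name": "Exampleland"},
        },
    },
}


def to_jsonld_script(data):
    """Wrap a dict in a JSON-LD <script> element (hypothetical helper)."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = to_jsonld_script(event)
print(snippet)
```

Because the whole structure lives in one block, each level of nesting is just another dictionary, with no need to wrap the visible text of the page in extra attributes.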

Many entity types are defined on schema.org, such as Article.  You can declare them in the page so that search engines understand it better.  The rating widget you see on the right, which lets you rate this article, is the visible component of this structure.  But Google cannot read the fancy stars we have drawn to work out the ratings.  Hence the same data is also represented in JSON-LD format on this page, so that Google can read the ratings given by our users.
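
As a rough idea of what such rating data could look like, the sketch below builds an Article with an AggregateRating from a list of user votes.  The actual markup used on this page is not reproduced here.  The function name, the headline, and the votes are all assumptions made for illustration.

```python
import json


def article_with_rating(headline, ratings, best=5):
    """Build schema.org Article data with an AggregateRating (illustrative only)."""
    if not ratings:
        raise ValueError("at least one rating is needed")
    # Average of all votes, rounded to one decimal place.
    average = round(sum(ratings) / len(ratings), 1)
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": average,
            "ratingCount": len(ratings),
            "bestRating": best,
        },
    }


# Made-up votes standing in for what the star widget would collect.
user_ratings = [5, 4, 4, 3]
markup = article_with_rating("Metadata on the web", user_ratings)
print(json.dumps(markup, indent=2))
```

The stars drawn on screen and this block would carry the same numbers.  One is meant for the reader, the other for the search engine.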

 
