Design Decisions

Templating

Why put the logic to generate indexing data in TBBs?

Other search integrations I have seen put much of the logic for building the indexing data in the Deployer (as a deployer or storage extension). SI4T puts all of this logic in TBBs, for two main reasons:

Easy to extend - most Tridion developers are familiar with developing TBBs in .NET, but not many are comfortable writing Content Delivery extensions in Java. As the logic for generating index data is likely to vary from project to project, we want it to be as easy as possible to extend and customize; in fact, it is as easy as creating a TBB which builds some XML and puts it in the package.
Easy to configure - using template metadata and TBB parameters, we can easily configure default and template-specific field processing logic, which the templates apply to vary the index data generated. Providing this level of flexibility in a Content Delivery extension would require heavy use of configuration files, and updating those would be more complex than simply adding or editing template metadata or TBB parameters.
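
To make the configuration point concrete, here is a minimal sketch of how default field processing rules might be merged with template-specific overrides and then turned into index XML. It is written in Java for illustration only (real SI4T TBBs are .NET), and everything in it is an assumption: the `FieldConfig` class, the field names, the rule that a null override switches a field off, and the `<indexdata>`/`<field>` XML shape are hypothetical, not SI4T's actual code or schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not SI4T's real (.NET) TBB code or index XML schema.
class FieldConfig {

    /**
     * Start from the project-wide defaults (schema field -> index field), then
     * apply template-specific overrides. A null override value switches the
     * field off for this template; any other value remaps it.
     */
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, String> templateOverrides) {
        Map<String, String> effective = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> e : templateOverrides.entrySet()) {
            if (e.getValue() == null) {
                effective.remove(e.getKey());
            } else {
                effective.put(e.getKey(), e.getValue());
            }
        }
        return effective;
    }

    /** Build an index-data XML fragment from the fields the config selects. */
    static String buildIndexXml(Map<String, String> fieldValues, Map<String, String> config) {
        StringBuilder xml = new StringBuilder("<indexdata>");
        for (Map.Entry<String, String> e : config.entrySet()) {
            String value = fieldValues.get(e.getKey());
            if (value == null) {
                continue; // field not present on this component
            }
            xml.append("<field name=\"").append(escape(e.getValue())).append("\">")
               .append(escape(value)).append("</field>");
        }
        return xml.append("</indexdata>").toString();
    }

    /** Minimal XML escaping for text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("title", "title");
        defaults.put("summary", "content");
        defaults.put("body", "content");

        // Template-specific settings: remap the summary, drop the full body
        Map<String, String> templateSettings = new LinkedHashMap<>();
        templateSettings.put("summary", "description");
        templateSettings.put("body", null);

        Map<String, String> values = new LinkedHashMap<>();
        values.put("title", "Q3 results");
        values.put("summary", "Profits & growth");
        values.put("body", "Full article text");

        System.out.println(buildIndexXml(values, resolve(defaults, templateSettings)));
        // <indexdata><field name="title">Q3 results</field><field name="description">Profits &amp; growth</field></indexdata>
    }
}
```

The defaults live in one place, while each template only states how it differs, which is the kind of per-template variation that would otherwise need a growing set of Content Delivery configuration files.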

Why split the indexing logic into two TBBs?

This lets you create your own TBBs to generate the indexing data XML while reusing the TBB that adds it to the rendered output.
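
A minimal Java sketch of why the split helps: the generator step can be swapped for any project-specific implementation, while the injector step stays the same because it only reads a known item from the package. The package is modelled as a plain `Map`, and the item name `IndexData`, the comment markers and all class names are hypothetical, chosen for illustration rather than taken from SI4T.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the package is modelled as a Map of named items, and the
// item name and comment markers below are illustrative, not SI4T's actual ones.

/** Step 2: the reusable part, which copies the XML from the package into the output. */
class IndexDataInjector {
    static final String PACKAGE_ITEM = "IndexData";
    static final String START = "<!-- INDEX-DATA-START ";
    static final String END = " INDEX-DATA-END -->";

    static String inject(Map<String, String> pkg, String renderedOutput) {
        String xml = pkg.get(PACKAGE_ITEM);
        if (xml == null) {
            return renderedOutput; // no generator ran, so nothing to index
        }
        return renderedOutput + START + xml + END;
    }

    public static void main(String[] args) {
        Map<String, String> pkg = new HashMap<>();
        IndexDataGenerator generator = new TitleOnlyGenerator(); // swap in any generator
        pkg.put(PACKAGE_ITEM, generator.generate(Map.of("title", "Q3 results")));
        System.out.println(inject(pkg, "<div>Q3 results</div>"));
    }
}

/** Step 1: any TBB that produces index-data XML, standard or project-specific. */
interface IndexDataGenerator {
    String generate(Map<String, String> fields);
}

/** A project-specific generator that indexes only the title. */
class TitleOnlyGenerator implements IndexDataGenerator {
    public String generate(Map<String, String> fields) {
        String title = fields.getOrDefault("title", "")
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<indexdata><field name=\"title\">" + title + "</field></indexdata>";
    }
}
```

The only contract between the two steps is the package item name, so a custom generator never has to know how the index data is embedded in the output.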

Why use template metadata/TBB parameters to configure indexing behaviour?

The decision on what to index is typically made at the template level, rather than at the schema or component level. Imagine you have a landing page containing summary component presentations of articles, which link through to the full article pages. If someone searches for a term that matches text in a given article, we most likely want to show only that article page in the search results, not the landing page. We would therefore set the summary CT not to be indexed, or perhaps limit the fields indexed for the summary CT to just the title and summary.
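
The landing page example can be sketched as a per-template policy that decides which fields of each component presentation end up in a page's search document. This is a minimal Java illustration under assumed names: the CT names "Article Full" and "Article Summary", the field names and the `TemplateIndexPolicy` class are hypothetical, and the policy is hard-coded here where SI4T would read it from template metadata.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: CT names and field names are illustrative only.
class TemplateIndexPolicy {

    record ComponentPresentation(String template, Map<String, String> fields) {}

    // Which fields each CT contributes to the page's search document.
    // A CT with no entry (or an empty list) is not indexed at all.
    static final Map<String, List<String>> INDEXED_FIELDS = Map.of(
            "Article Full", List.of("title", "summary", "body"),
            "Article Summary", List.of("title", "summary"));

    static String indexText(List<ComponentPresentation> page) {
        StringBuilder text = new StringBuilder();
        for (ComponentPresentation cp : page) {
            List<String> allowed = INDEXED_FIELDS.getOrDefault(cp.template(), List.of());
            for (String name : allowed) {
                String value = cp.fields().get(name);
                if (value != null) {
                    text.append(value).append(' ');
                }
            }
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, String> article = Map.of(
                "title", "Q3 results",
                "summary", "Profits rose",
                "body", "The merger closed in June");

        String articlePage = indexText(List.of(new ComponentPresentation("Article Full", article)));
        String landingPage = indexText(List.of(new ComponentPresentation("Article Summary", article)));

        // A search for "merger" should hit the article page but not the landing page
        System.out.println(articlePage.contains("merger")); // true
        System.out.println(landingPage.contains("merger")); // false
    }
}
```

The same component yields different index text depending only on the CT it is rendered with, which is why the template is the natural place for this decision.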

Content Delivery

Why use storage extensions and not deployer extensions?

When content is published, we want to extract the indexing data from it and deploy the remaining content as normal. A storage extension gives us direct access to the objects containing the content to be stored, along with the opportunity to modify them easily. With a deployer extension, you need to manipulate the deployment package manually, which takes more effort. Furthermore, deployer extensions have no Binary UnDeploy step, so there is no way to tell when a binary is unpublished. With a storage extension, you can hook into the moment a binary is removed, which matters if you are, for example, indexing PDF documents.
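
The storage-level approach can be sketched as a decorator around the store: on `store` it pulls the index data out of the content and passes the remainder through unchanged, and on `remove` it de-indexes the item, which is the hook a deployer extension lacks for binaries. These interfaces are hypothetical stand-ins written in Java for illustration; they are NOT the Tridion Content Delivery storage API, and the comment markers are assumed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these interfaces are NOT the Tridion Content Delivery
// storage API; they only illustrate where a storage-level hook sits.
class IndexingStore implements ContentStore {
    static final String START = "<!-- INDEX-DATA-START "; // illustrative markers
    static final String END = " INDEX-DATA-END -->";

    private final ContentStore inner;
    private final SearchIndex index;

    IndexingStore(ContentStore inner, SearchIndex index) {
        this.inner = inner;
        this.index = index;
    }

    /** Extract the index data, then store the remaining content as normal. */
    @Override
    public void store(String id, String content) {
        int s = content.indexOf(START);
        int e = s < 0 ? -1 : content.indexOf(END, s);
        if (e >= 0) {
            index.docs.put(id, content.substring(s + START.length(), e));
            content = content.substring(0, s) + content.substring(e + END.length());
        }
        inner.store(id, content);
    }

    /** Called on every removal (pages and binaries alike here), so unpublished items leave the index. */
    @Override
    public void remove(String id) {
        index.docs.remove(id);
        inner.remove(id);
    }

    public static void main(String[] args) {
        MapStore disk = new MapStore();
        SearchIndex index = new SearchIndex();
        ContentStore store = new IndexingStore(disk, index);

        store.store("article.html", "<p>Hi</p>" + START + "<indexdata/>" + END);
        System.out.println(disk.items.get("article.html")); // <p>Hi</p>
        System.out.println(index.docs.get("article.html")); // <indexdata/>

        store.remove("article.html");
        System.out.println(index.docs.isEmpty());           // true
    }
}

interface ContentStore {
    void store(String id, String content);
    void remove(String id);
}

/** In-memory stand-in for the real storage layer. */
class MapStore implements ContentStore {
    final Map<String, String> items = new HashMap<>();
    public void store(String id, String content) { items.put(id, content); }
    public void remove(String id) { items.remove(id); }
}

/** In-memory stand-in for the search index (id -> index data). */
class SearchIndex {
    final Map<String, String> docs = new HashMap<>();
}
```

Because the decorator sees each object just before it is written and again when it is deleted, it can both strip the index data and keep the search index in step with unpublishing, without touching the deployment package.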
