Data in WELD
I use the term data rather than content because I am referring not only to content, the content lifecycle, and content metadata, but also to information that is essential for enterprise operations. This is not always content in the marketing sense. It could be database tables, NoSQL key-value pairs, or any other information that must be created, modified, and deprecated as part of the content lifecycle. Enterprise systems perform operations on this data throughout its lifecycle, and if a record of all of that is kept, it has a profound effect on the localization process.
The Content Lifecycle
Creation
Choosing what content is essential for your enterprise and your customers is a critical task. If you plan on selling or expanding internationally, an even more important decision is what should be localized. You must think about the creation process as one that multiplies the content you create, modify, manage, and deprecate, since every localized item adds another copy for each locale. This of course affects the size of your data storage, the size of your workforce, the work that they do, and the time you spend maintaining data.
This is also the source of most of your localized content. A strong connection between tech writers, copywriters, creatives, and the localization teams is essential. If the writers don’t understand how important consistency is in this process, you will be chasing source issues throughout the localization process.
Modification
Modifying data involves five distinct but intertwined tasks:
1. Create content
2. Update content
3. Customize localized content
4. Deprecate content
5. Track the changes and their repercussions
A decision on the content will need to be made for each locale. And of course all the decisions and processes involved in this work need to be documented.
Update
When an enterprise’s policies, processes, or T&Cs change, updates need to be made. These changes may not be applicable, desirable, or legal in every locale, so each modification must be evaluated and a decision made about what will be localized. Then the work must be planned, executed, and tracked.
Customize
Customizations happen when material is updated differently for each locale. For example, rules around customer data, credit, or delivery may mandate differences in a locale that are not in the source. Again, all of this should be tracked to avoid confusion or unwanted changes to the content. Preferably it should be addressed at the code level rather than the content level. Depending on the content itself to manage colors, pricing, currency, and date/time issues is a fool’s errand. Use ICU (International Components for Unicode) and integrate this as part of the internationalization process.
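As a rough illustration of handling this in code rather than in content, the sketch below keeps the translatable string free of any price formatting and applies the locale's conventions at render time. The rule table, the function names, and the two sample locales are assumptions made for this example. The hand-written rules are only a stand-in for the locale data ICU provides; a real system would call ICU rather than maintain such a table.

```python
from decimal import Decimal

# Hypothetical, hand-maintained rules for two locales, standing in for
# the locale data that ICU would normally supply.
LOCALE_RULES = {
    "en-US": {"group": ",", "decimal": ".", "pattern": "{symbol}{number}"},
    "de-DE": {"group": ".", "decimal": ",", "pattern": "{number}\u00a0{symbol}"},
}
CURRENCY_SYMBOLS = {"USD": "$", "EUR": "€"}


def format_price(amount: Decimal, currency: str, locale: str) -> str:
    """Format a price using the (assumed) rules for the given locale."""
    rules = LOCALE_RULES[locale]
    # Render with US-style separators, then map both separators in one pass.
    us_style = f"{amount:,.2f}"
    swap = str.maketrans({",": rules["group"], ".": rules["decimal"]})
    number = us_style.translate(swap)
    return rules["pattern"].format(symbol=CURRENCY_SYMBOLS[currency], number=number)


# The translatable content carries only a placeholder, never a formatted price.
TEMPLATES = {"en-US": "Total: {price}", "de-DE": "Gesamt: {price}"}


def render_total(amount: Decimal, currency: str, locale: str) -> str:
    """Combine the localized template with the locale-formatted price."""
    return TEMPLATES[locale].format(price=format_price(amount, currency, locale))
```

With this split, translators only ever see the template with its placeholder, and a change to how a locale formats currency never touches the content.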
Deprecation
After a certain period the content is no longer useful and must be deprecated.
But behind this content lies a lot of metadata that must be parsed in order to scale localization to the Whole Enterprise. The metadata and the processes used to create, store, vend, and capture content are just as important to the enterprise as what the customer sees on the website. And of course there is a lot of SEO value in longer-lived content, which others have had time to link to, increasing its value.
Tracking
Tracking can be a purely manual process (this breaks down very quickly at scale, so choose a more robust method) or it can be automated. Depending on how you create and use content, tracking may already be built into your process. A CMS usually provides elements and attributes, if it supports structured authoring, or key/value pairs to track content that is exported or round-tripped for localization. There is usually a version control mechanism or some other way of tracking the data, but if you are depending on a 200+ character key without context, you should make sure there is a way for a user to understand where the content appears and how it is used.
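One way to guard against context-free keys is to store the context next to each key and flag entries that lack it before they go out for localization. The sketch below assumes a hypothetical StringEntry record; the field names and sample keys are illustrative and not taken from any particular CMS.

```python
from dataclasses import dataclass


@dataclass
class StringEntry:
    """A localizable string plus the context a translator needs (hypothetical schema)."""
    key: str
    source_text: str
    location: str = ""    # where the string appears, e.g. a page or screen name
    usage_note: str = ""  # how the string is used, e.g. "button label"


def entries_missing_context(entries: list[StringEntry]) -> list[str]:
    """Return the keys of entries that have no location or no usage note."""
    return [
        e.key
        for e in entries
        if not e.location.strip() or not e.usage_note.strip()
    ]


# Made-up sample data: the second entry has no context at all.
entries = [
    StringEntry("checkout.button.submit", "Place order", "Checkout page", "Primary button label"),
    StringEntry("err_0042", "Try again"),
]
flagged = entries_missing_context(entries)
```

A check like this can run as part of the export step, so strings without context are caught before a translator has to guess.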
Data
The data that goes along with the content might more appropriately be called metadata. It is data about the data, carried either with the content itself or as an ancillary data set that holds the content’s audit trail. Metadata can often include: author, translator, translation process, vendor, tools, MT engine, post-edit process, comments, deprecation timeline, subject, domain, etc.
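A minimal sketch of such a record follows, assuming a hypothetical schema built from some of the fields listed above, with the audit trail kept as an append-only list of events. The class name, field names, and sample values are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ContentMetadata:
    """Metadata carried alongside one piece of localized content (hypothetical schema)."""
    content_id: str
    locale: str
    author: str
    translator: Optional[str] = None
    vendor: Optional[str] = None
    mt_engine: Optional[str] = None      # None when no machine translation was used
    post_edited: bool = False
    domain: str = ""
    deprecate_on: Optional[date] = None  # planned deprecation date, if any
    audit_trail: list = field(default_factory=list)

    def record(self, when: date, who: str, action: str) -> None:
        """Append one event to the audit trail; earlier events are never rewritten."""
        self.audit_trail.append((when.isoformat(), who, action))


# Made-up sample values for illustration.
meta = ContentMetadata("faq-returns", "fr-CA", author="jdoe", mt_engine="engine-x")
meta.record(date(2024, 3, 1), "jdoe", "source created")
meta.record(date(2024, 3, 4), "vendor-a", "post-edit completed")
meta.post_edited = True
```

Keeping the trail append-only means the record can answer who changed what and when, even after the content itself has been updated or deprecated.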
Version control
Version control often refers to codebase changes in software, but in the context of a CMS or a structured authoring environment, the version control system will allow you to roll back changes and track updates efficiently.
Source locale and restricted vocabularies
Having a source locale is great for simplifying the localization and update process. You may even create a restricted vocabulary (the Simplified English used by Boeing, for instance) to help streamline the localization process and create consistency. But over time every business comes to realize its service or product really isn’t the same in every locale. Localization is a prism, and business, language, product, and everything else change across markets over time. Gift cards may not be used in the same way, and shipping, taxes, currencies, pricing, etc. are all different. Thus there will be market- or country-specific customizations that need to be tracked and reviewed at intervals.
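And the localization team will have to help the business understand the implications of depending on locales for business logic. For example, assuming an AZERTY keyboard when the locale is Canadian French but a QWERTY keyboard when it is Canadian English is a business strategy that will quickly break down, since French-speaking Canadians generally use QWERTY-based layouts rather than the AZERTY layout common in France.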
CMS data
Where is the content housed? Is it part of a structured authoring environment that uses the same content across mobile, web, and print? Or is it content created specifically for one use? What transformations occur on the content? What is the encoding of the CMS data or the backend databases?
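The encoding question in particular can be checked mechanically. The sketch below assumes exported field values are available as raw bytes keyed by a hypothetical field identifier, and flags values that are not valid UTF-8 so they can be fixed before localization.

```python
def find_non_utf8(fields: dict[str, bytes]) -> list[str]:
    """Return the identifiers of fields whose bytes do not decode as UTF-8."""
    bad = []
    for field_id, raw in fields.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(field_id)
    return bad


# Made-up sample export: the second value was saved as Latin-1.
export = {
    "title": "Café menu".encode("utf-8"),
    "subtitle": "Café menu".encode("latin-1"),
}
suspect = find_non_utf8(export)
```

Decoding cleanly does not prove the text is correct, but a failure is a reliable sign of a mixed or mislabeled encoding.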
Transformation Data
What was done to the data and images to get them to fit the layout? Where are the original images? Do the images use transparencies for text overlays, or will new images have to be created for localization? Is there alt text for accessibility? Is that data sent for localization? How is the data sent?
Publication data
When and where was it used? Will the content cycle, or stay on the site forever? Was it part of a campaign? Is it seasonal? Is it a custom campaign? When does the content expire?
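If these answers are stored as data, finding content that is due for deprecation becomes a simple query. The sketch below assumes a hypothetical Publication record with an optional expiry date; the names and sample values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Publication:
    """Where and when a piece of content was used (hypothetical schema)."""
    content_id: str
    channel: str                       # e.g. "web" or "email"
    published_on: date
    campaign: Optional[str] = None     # None when not tied to a campaign
    expires_on: Optional[date] = None  # None when the content stays up indefinitely


def due_for_deprecation(pubs: list[Publication], today: date) -> list[str]:
    """Return the IDs of content whose expiry date has been reached."""
    return [
        p.content_id
        for p in pubs
        if p.expires_on is not None and p.expires_on <= today
    ]


# Made-up sample data: one seasonal banner and one evergreen policy page.
pubs = [
    Publication("holiday-banner", "web", date(2024, 11, 15), "winter-sale", date(2025, 1, 5)),
    Publication("returns-policy", "web", date(2023, 6, 1)),
]
expired = due_for_deprecation(pubs, today=date(2025, 2, 1))
```

Because the expiry date is optional, evergreen content is never flagged, while seasonal or campaign content surfaces as soon as its date passes.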
The Content Mandates the metadata and the process
In a lot of cases, if you know how the content will be used, you will know what metadata, processes, and types of content you need.
Ecommerce- Just the facts Ma’am
Ecommerce data is not always completely translated and is rarely fluent or complete, but it does live for a long time. It is the description of the service or product that allows the customer to make an informed buying decision. Though some companies depend on storytelling for the marketing portion of this content, the majority is factual rather than marketing copy.
Help data is usually knowledge base or factual data. Accuracy is important, but many companies choose to machine translate (MT) the content and provide both the English and the localized text. They may hire localization agencies to improve the copy, they may ask for customer contributions as open source projects do, or they may not provide any localization of the content. Each decision comes with its own risks, rewards, and requirements.
Social- The shoe, the gourd, and last night’s dinner
Social content is like Monty Python or a bazaar. Everyone has a story to tell, a product to sell, and an image they want you to believe, like, or meme. There is great value in the data and the connections for advertisers and marketers, but maybe not for the participants. At its core, the value of social content is ephemeral. Someone who can’t get that photo of public drunkenness from 1985 taken down from a site might disagree, but generally the content is individual, time-sensitive, and rarely essential for a large user base. This content is often machine translated or not localized at all.
The gist of this post is that data and metadata are essential to understanding your content’s value and lifecycle, and that content is essential to the value proposition for your business. If you can’t articulate the value, the process, or the decisions you make, you cannot effectively manage, invest in, or deprecate the data you create. That is a real problem in one language, but in 45 or 195 locales it is a threat to the sustainability of your business. And yet without localization as the engine for international growth, your company is relegated to being a regional player in an industry it may have created.