This article introduces IBM's ECM development strategy: what kind of product ECM is, what technical capabilities it provides, and what practical experience it can offer for users' business. At its 2012 Strategy Launch, IBM Software Group put forward the idea of smart "soft" power and said it expects this approach to help enterprises grow and innovate through software technology. As part of the "soft" power IBM proposes, the new ECM products will be available globally on May 31: ICA (content analytics), ICC (content collection) and ICN (document, imaging and social content management).
ICA (IBM Content Analytics) - Content Analytics
The explosive growth of unstructured data has scattered valuable information across isolated islands. This unstructured information includes office documents, video and audio, HTML pages, e-mail, text, reports and so on, and it is often stored in IT systems such as corporate databases, file systems, websites and portals. Susan Chen, head of development at the ECM Lab in Southern California, believes that "more than 80% of the data used in business operations is unstructured, and it is growing twice as fast as structured data. With 200 billion emails, plus images, office documents, audio and video files and so on, we are looking at huge amounts of data. If we can make effective use of this data, we can detect problems early, improve customer service, reduce operating costs and unlock new revenue opportunities."
On May 31, World No Tobacco Day, Susan Chen presented an example at the IBM ECM Users Conference of using unstructured data to assess the risks of smoking. She said, "We did a POC (proof of concept) for a healthcare customer in which we used ICA to analyze 5,000 patient records. One goal was to identify patients' smoking habits in order to determine their risk of a heart attack. In the structured data, 35% of the records showed smoking indicators; in the text data, 81% showed smoking indicators, with high accuracy. So if you analyze only the structured data, you may miss some important insights." The ICA that Susan Chen refers to is one of the "protagonists" of the IBM ECM solution: IBM's content analytics product, whose full name is IBM Content Analytics.
"The data objects of enterprise content analytics are evolving, and the shift from transactional, structured data to interactive, unstructured data has become a trend," Susan Chen said. ICA enables enterprise-level search and text analysis across heterogeneous data sources. As the author understands it, ICA feeds the data it has captured into a UIMA pipeline (an architecture for assembling components that analyze unstructured content), where annotators written to UIMA's open standards perform the content analysis. The resulting data is added to the index, and users can then selectively mine it through ICA's analytics interface and draw conclusions that support accurate business decisions. What does the interface for ICA search results look like? As shown in the figure below, these eight examples are representative views of ICA analysis results.
ICA Search and Analysis Features Overview
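The flow described above (captured documents pass through annotators, and the annotations are added to an index that users can then mine) can be sketched in a few lines of Python. This is only an illustration of the idea, not the UIMA API or ICA's implementation: the names `Document`, `keyword_annotator` and `run_pipeline`, the keyword-matching logic and the sample documents are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)  # (type, value) pairs


def keyword_annotator(annotation_type, keywords):
    """Build a toy annotator that tags a document when a keyword occurs in it."""
    def annotate(doc):
        lowered = doc.text.lower()
        for kw in keywords:
            if kw in lowered:
                doc.annotations.append((annotation_type, kw))
    return annotate


def run_pipeline(docs, annotators):
    """Pass each document through every annotator, then index the annotations."""
    index = defaultdict(set)  # (type, value) -> ids of documents carrying it
    for doc in docs:
        for annotate in annotators:
            annotate(doc)
        for key in doc.annotations:
            index[key].add(doc.doc_id)
    return index


# Hypothetical sample input, not taken from the article.
docs = [
    Document("d1", "Customer complained about a late delivery."),
    Document("d2", "Delivery was on time, great service."),
]
annotators = [
    keyword_annotator("topic", ["delivery"]),
    keyword_annotator("sentiment", ["complained", "great"]),
]
index = run_pipeline(docs, annotators)
print(sorted(index[("topic", "delivery")]))
```

Because the index is keyed by annotation rather than by raw text, a user can ask for every document carrying a given annotation without rescanning the documents, which is roughly the role the analytics interface plays in the description above.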
ICA provides three system configurations to meet different data needs. "A small amount of data can be handled on a single workstation, for example for a POC. In a production system, one or more servers can be deployed, and BigInsights can be used for analysis when the data volume is large," Susan Chen said. "A normal deployment supports 50 to 100 million data files, and the third version of ICA provides a big-data-oriented configuration. In the integrated ICA and BigInsights architecture, the design focuses on the parts of the system that consume the most computing resources: document preprocessing, content analysis, indexing and global analysis are distributed across an array of cheaper machines using Hadoop and the MapReduce model. A group of inexpensive machines amounts to a computing cloud."
Seamless integration with BigInsights for high scalability
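To make the MapReduce model mentioned above more concrete, here is a minimal single-process sketch of how an indexing job splits into map, shuffle and reduce steps. It does not use Hadoop or BigInsights, and the function names and the two-document corpus are hypothetical; on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_document(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(text.lower().split())]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_term(term, doc_ids):
    """Reduce step: build the sorted posting list for one term."""
    return term, sorted(doc_ids)


def build_index(corpus):
    """Run map over every document, shuffle the pairs, then reduce per term."""
    mapped = chain.from_iterable(map_document(d, t) for d, t in corpus.items())
    return dict(reduce_term(k, v) for k, v in shuffle(mapped).items())


# Hypothetical two-document corpus.
corpus = {"d1": "invoice overdue", "d2": "invoice paid"}
index = build_index(corpus)
print(index["invoice"])
```

Each `map_document` call depends only on its own document and each `reduce_term` call only on its own term, which is what allows this kind of work to be spread across a group of inexpensive machines.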
The figure below highlights the more than 30 heterogeneous data sources that ICA supports. According to Susan Chen, these data sources cover more than 150 different formats. It is worth mentioning that, in addition to IBM's own products, the supported sources include Oracle 11g, Microsoft SQL Server and the Sybase series; content management products from EMC, CA Technologies and other vendors can also serve as data sources that ICA collects from. How is this achieved? The author had the opportunity to interview Mr. William Lobig, Project Director of ECM Development for IBM Software Group Industry Solutions, and received some answers.
More than 30 heterogeneous data sources supported by ICA
ICC (IBM Content Collector) - Content Collection
Taking SharePoint, Microsoft's enterprise collaboration portal, as an example data source: what technology does IBM's content management solution use to capture data from it? This involves ICC (IBM Content Collector), another IBM ECM product, William Lobig said. "The ICC product in the ECM solution provides a connector that resides in the ECM solution as a module. Such a connector can connect to different data sources. As for how to capture data in SharePoint specifically: if you want to bring it into IBM's ECM, you can take advantage of a pointer-like function, so that the content repository calls SharePoint when the content is needed. This is done on demand. It is IBM's own solution, except that we use Microsoft's public APIs." The picture below is a summary of the latest version, ICC 3.0.
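The "pointer-like function" described above can be pictured as a reference object that stores only a locator and asks a connector for the content when it is actually needed. The sketch below is an assumption about the general pattern, not ICC's implementation, and it calls no SharePoint API: `Connector`, `InMemorySource`, `ContentPointer` and the sample locator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


class Connector(Protocol):
    """Anything that can return the bytes behind a locator."""
    def fetch(self, locator: str) -> bytes: ...


class InMemorySource:
    """Stand-in for a remote source such as SharePoint (no real API is called)."""
    def __init__(self, items):
        self._items = items
        self.fetch_count = 0  # counts how often the source was actually called

    def fetch(self, locator):
        self.fetch_count += 1
        return self._items[locator]


@dataclass
class ContentPointer:
    """Holds a reference to remote content instead of a copy of it."""
    connector: Connector
    locator: str

    def content(self):
        # The source is contacted only here, when the content is requested.
        return self.connector.fetch(self.locator)


# Hypothetical locator and payload.
source = InMemorySource({"sites/team/report.docx": b"quarterly report"})
pointer = ContentPointer(connector=source, locator="sites/team/report.docx")
calls_before = source.fetch_count  # creating the pointer triggers no fetch
data = pointer.content()           # the fetch happens on demand
print(calls_before, source.fetch_count, data)
```

Keeping the connector behind a small interface matches the modular connector idea in the interview: a different data source would need only another class with the same `fetch` method.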