AI Application Development in Practice

Extending the handwritten digit recognition application to recognize and calculate simple handwritten mathematical expressions

Key points of knowledge:
    • Understanding the MNIST dataset
    • Learning how to extend a dataset
    • Implementing the handwriting calculator
Brief introduction

This article presents a case study of an AI application that recognizes and calculates simple handwritten mathematical expressions. It builds on the basic application from the previous article, "Getting Started with the handwriting recognition application". This article demonstrates basic data collation and extension for AI models, as well as how to use the characteristics of handwriting input to simplify character segmentation. It also demonstrates how to use Visual Studio Tools for AI to perform batch inference, so that inference can be accelerated by the parallel computation of the underlying AI framework. In addition, the main code logic of the application is analyzed and explained.

Background

In "Getting Started with handwriting recognition," We introduced AI applications that recognize a single handwritten letter, based on a mnist dataset, and in our several trials, the application performed well and was able to accurately recognize handwritten digital graphics as corresponding numbers. So, does the app recognize more kinds of handwritten characters, or even multiple characters at the same time? There are a number of such situations, such as the common mathematical expressions in life (Shape 1+2x3 ). Such a complex situation is more common and more practical. In contrast, if one recognition can only be a handwritten number, the application will have a large limitation.

First, we can try the most basic special case of this situation: writing two digits at a time. Start the app built in the handwritten digit recognition blog and write two digits at once in the existing app to see how recognition works. (To make writing and showing results easier, we adjust the stroke width in the previous example from 40 to 20; in our experience this change has no significant effect on recognizing individual digits.)

The figure above shows the result of one such experiment. After many experiments, we see that the existing application recognizes two digits poorly.

As shown, the result displayed in the upper-right corner of the application window accurately reflects the model's inference on our handwriting input (i.e. result.First().First().ToString() ), but this result is not what we expected: we wrote "42" in the plot area on the left.

In fact, the explanation of this phenomenon is already embedded in our previous blog posts. In the introductory section of "Getting Started with the handwriting recognition app," we gave a general introduction to the MNIST dataset used to train the model. In the end, the crux of the matter is that the model at the core of our AI application simply has no ability to recognize multiple digits: the training data in the MNIST dataset, the source of the model, covers only single handwritten digits. Also, in the input processing part of the application, we do no extra processing of the handwritten graphics.

The combined result of these two points is that, when multiple digits are written, we are actually "forcing" the AI model to make an inference beyond the range it was built for. This is a kind of misuse of an AI model, and the results are naturally unsatisfying.

So, to enhance the usability of the application, can we improve it so that it handles common mathematical expressions? This requires the application both to recognize symbols as well as digits, and to recognize multiple characters appearing at the same time. First, for multiple digits, it is natural to think that since the MNIST model already recognizes a single digit well, we just need to separate the digits and let the MNIST model identify them one by one. For recognizing other mathematical symbols, we can try to extend the recognition range of the MNIST model, that is, extend the MNIST dataset. The combination of the two is a very workable solution. In this way, we have introduced two new sub-problems: "extending the MNIST dataset" and "splitting multiple handwritten characters".

Combining the issues and potential solutions above, this article takes the problem of "recognizing and calculating simple mathematical expressions" as its guide to extend the existing handwritten digit recognition application.

Our goal is to overcome the existing limitation of recognizing only individual digits, so that the new application can recognize digits, the four arithmetic operators, and parentheses as elements that form simple mathematical expressions, and can calculate the expressions it recognizes. Through this, we hope to finally obtain a more realistic artificial intelligence application.

The final application results are as follows:

Attention

"Identifying multiple characters that may appear" and "recognizing multiple characters appearing at the same time" is completely different, please note the difference.

Sub-problem: Extending the MNIST dataset

Preparing the data: data format

To let our new model support characters other than digits, a simple approach is to extend the MNIST dataset and try to reuse the existing model training algorithm (a convolutional neural network). In the data preprocessing section of "Getting Started with the handwriting recognition app", we gained some insight into the data format and specifications used by the MNIST dataset. To reuse existing resources as much as possible, we should keep the extended data close to the original MNIST data.

The MNIST example used in the samples-for-ai sample library downloads the MNIST dataset from http://yann.lecun.com/exdb/mnist/ and uses it as training data at the start of a run. After we run the mnist.py script and finish training, we can see four files with the .gz extension in the samples-for-ai\examples\tensorflow\MNIST\input directory. These four files are the MNIST dataset downloaded from the Internet: the bitmaps of handwritten digits and their labels. However, these files are compressed, and the training program decompresses them after the download completes. The training program keeps the extracted data only in memory and does not write it back to disk, so we cannot find files holding the raw bitmap data in the input directory.

Tips

We can still unzip them using a tool that supports this compression format and inspect their contents with a binary viewer.

From the http://yann.lecun.com/exdb/mnist/ page, we can learn the file formats of the bitmap files and label files of the MNIST dataset. Most importantly, each bitmap used for training is 28x28, single-channel grayscale; the foreground (strokes) has value 255 (white) and the background has value 0 (black). From the previous blog we learned that the MNIST dataset stores the bitmaps inverted: displayed directly, they are the opposite of the black-on-white writing we see in the interface.
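For example, the downloaded compressed image file can be parsed directly with a few lines of Python. This is only a minimal sketch based on the IDX format described on that page; the file name and function name are illustrative:

import gzip
import struct
import numpy as np

def read_mnist_images(path):
    # IDX image format: big-endian magic number, image count, rows, cols, then raw pixels.
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051  # the magic number of IDX image files
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

images = read_mnist_images("train-images-idx3-ubyte.gz")  # shape: (60000, 28, 28)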

Combining the description on that page with the data preprocessing logic in mnist.py, we understand that the final input data format required by the convolutional neural network is as follows:

Image data:
    • A four-dimensional array.
    • The first dimension is the total number of input images.
    • The second and third dimensions are the height and width of the input bitmap, both 28.
    • The fourth dimension is the number of color channels of the input bitmap; MNIST uses only grayscale images, so it is 1.
    • Each element is a 32-bit floating-point number between -0.5 and 0.5, where 0.5 represents a foreground (stroke) pixel at full intensity and -0.5 represents a background pixel.

Label data:
    • A one-dimensional array.
    • Its size is the total number of input images.
    • Each element is a 64-bit integer taking a value from 0 to 9, representing the corresponding handwritten digit.

Based on the above input format, we can determine the direction for extending the training data. Note that these formats are what the data finally fed to the convolutional neural network must satisfy, not requirements on the new data we are about to collect. While this means our newly collected data does not have to meet these conditions exactly, the input format still plays an important guiding role in data collection.
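For concreteness, the final network input can be pictured as two NumPy arrays of the following shapes. This is only an illustrative sketch; the variable names are not from the original script:

import numpy as np

num_images = 70000  # total number of input images
# Values lie in [-0.5, 0.5]: 0.5 is a full-intensity stroke pixel, -0.5 is background.
images = np.zeros((num_images, 28, 28, 1), dtype=np.float32)
# One 64-bit integer label per image: 0-9 (0-15 once the dataset is extended).
labels = np.zeros((num_images,), dtype=np.int64)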

Collect and format data

Data can be collected in many ways. For the needs of this article, we can search for existing datasets on the web, develop small applications to collect handwriting on touch screens or even mobile phones, or scan handwritten documents and extract operators by image segmentation. After collecting the raw data, we can scale, distort, and add noise to it to expand and enhance our dataset for wider adaptability, as sketched below.
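Here is one assumed augmentation approach using Pillow; it is a sketch, not the article's actual collection code:

import random
from PIL import Image

def augment(sample):
    """Randomly perturb one grayscale sample (black strokes on a white background)."""
    # Keep rotations small: too large an angle blurs the difference between + and x.
    angle = random.uniform(-10, 10)
    dx, dy = random.randint(-2, 2), random.randint(-2, 2)
    sample = sample.rotate(angle, fillcolor=255)  # fill exposed corners with white
    return sample.transform(sample.size, Image.AFFINE, (1, 0, dx, 0, 1, dy), fillcolor=255)

sample = Image.new("L", (28, 28), 255)  # a blank stand-in for a collected image
augmented = augment(sample)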

After we have collected enough new images (considering that the original MNIST dataset has 70,000 images in total, about 40,000 collected images is reasonable, although the number is not absolute), we also need to format them for eventual use as neural network input.

The processing required for the bitmaps is straightforward. Following the handling of handwritten graphics captured in the app's interface in the previous handwritten digit recognition blog, we convert the collected images (which may have RGB channels) into 28x28-pixel, single-channel grayscale images whose foreground (strokes) has value 0 (black) and whose background has value 255 (white). Examples of the required bitmaps are shown below:
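In code, the conversion itself might look like the following minimal sketch (assuming Pillow; the function name is illustrative):

from PIL import Image

def format_collected_image(path):
    """Convert a collected image (possibly RGB) to the required format:
    28x28 pixels, single channel, black (0) strokes on a white (255) background."""
    img = Image.open(path).convert("L")  # collapse any color channels to grayscale
    return img.resize((28, 28))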

More important here is the handling of the image labels. In the original MNIST dataset we saw that the integers 0-9 were used to label the corresponding images, a very natural practice. Because we are dealing with a multi-classification problem, one prerequisite for solving this kind of problem is that we provide a one-to-one label for each category. It is easy to think of labeling new graphic categories such as the plus, minus, and multiplication signs with 10, 11, 12, and so on. This works.

We cannot help thinking that continuing to use already-occupied natural numbers to label the new categories, although feasible, makes the correspondence chaotic: the link between 10 and the plus sign, or 11 and the minus sign, is not as natural as that between the integers 0-9 and their images. As developers, we might instead think of using the ASCII values of the operator characters as labels (the plus sign would correspond to 43, the minus sign to 45). This labeling method is actually very hard to use, particularly because the MNIST training program in this article is based on the TensorFlow framework, which requires that the integer value of a label be less than the total number of categories. With the premise of a one-to-one correspondence between labels and categories guaranteed, we keep the labels 0-9 and then add labels for our newly collected graphic categories. We must clearly define the correspondence between labels and categories so that we can correctly handle the model's input and output.

We use 10-15 to represent the plus sign, minus sign, multiplication sign, division sign, left parenthesis, and right parenthesis respectively. And, for the convenience of training, we require the bitmaps for these six mathematical symbols to be placed in six folders named add, minus, mul, div, lp, and rp, all under the same directory, as shown below:
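For later reference, this correspondence can be written out explicitly. The snippet below is illustrative; these names do not come from the original scripts:

# Labels 0-9 are the digits themselves; 10-15 are the six extended symbols.
SYMBOL_FOR_LABEL = {10: "+", 11: "-", 12: "*", 13: "/", 14: "(", 15: ")"}
FOLDER_FOR_LABEL = {10: "add", 11: "minus", 12: "mul", 13: "div", 14: "lp", 15: "rp"}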

Training the model

To support our six new mathematical symbols, we need to modify the original MNIST model training script (that is, the mnist.py used previously).

The Python scripts required to train the model can be found here:

Github.com/ms-uap/edu/tree/master/ai301/self-built_mnist_extenstion

In this repository:

    • Under the ./tensorflow_model/ path is mnist_extension.py, the training script that supports the extended MNIST dataset. This script requires an additional command-line argument, --extension_dir, to specify the directory containing the bitmaps of our six extended mathematical symbols;
    • Under the ./extended_mnist_calculator/MNIST.App directory is the main code of this application, which we discuss below.

Above, we required the newly collected data to be formatted as single-channel bitmaps with color value 0 (black) as the foreground and 255 (white) as the background. Our modified training script reads these bitmaps and inverts their colors to achieve the same effect as the original MNIST data (and to stay consistent with the input processing part of our application).
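The inversion itself is a one-line operation. Here is a sketch of what the training script does when loading one extension bitmap; the function name is illustrative:

import numpy as np
from PIL import Image

def load_extension_bitmap(path):
    """Load one collected 28x28 bitmap and invert it so strokes become 255 on a
    0 background, matching the original MNIST convention."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=np.uint8)
    return 255 - pixels  # invert: black-on-white becomes white-on-black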
Assuming the six folders add, minus, and so on are stored under D:\extension_images, we can execute the following from the command line in the /training directory of the cloned repository:

python mnist_extension.py --extension_dir D:\extension_images

to start training on the extended dataset containing the six mathematical symbols. After importing the original MNIST data, the training script also reads the six new categories of data from the D:\extension_images directory, then mixes the old and new data for training. A possible training result looks like this:

Little Tips

Mixing the old and new data is very useful here, because during training the current script uses only part of the data at a time for each iterative optimization and model parameter update. Without mixing, the new data would not be used until late in training, hurting the resulting model.
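One simple way to mix is a joint shuffle; a minimal sketch, assuming the arrays described in the data-format section above:

import numpy as np

# Stand-ins for the original MNIST arrays and the newly collected symbol arrays.
mnist_images = np.zeros((70000, 28, 28, 1), dtype=np.float32)
mnist_labels = np.zeros((70000,), dtype=np.int64)
extension_images = np.zeros((40000, 28, 28, 1), dtype=np.float32)
extension_labels = np.full((40000,), 10, dtype=np.int64)

images = np.concatenate([mnist_images, extension_images])
labels = np.concatenate([mnist_labels, extension_labels])

# Shuffle images and labels with the same permutation so pairs stay aligned.
perm = np.random.permutation(len(images))
images, labels = images[perm], labels[perm]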

Our MNIST model training is based on a convolutional neural network, and apart from reading the extended symbol bitmaps, the script in the previous section does not modify the structure of the convolutional neural network used to train the original MNIST model. We know that the structure of a system determines its function, so can the network structure we designed for the original MNIST data support the extended dataset? The simplest way to answer this question is to run a training session and observe the model's performance.

After experimenting, we judge the new model by its error rates (mainly the validation error, which in this case reflects the model's error rate on the entire validation set after every 100 mini-batches of training, and the test error, which reflects its error rate on the entire test set after training ends) and find that the new model performs well, enough to support our next application.

Sub-problem: Splitting multiple handwritten characters

As mentioned above, in order to recognize multiple simultaneous characters, we must also solve another sub-problem: splitting the characters that appear at the same time.

We note that the application described in this article has a distinctive feature: the graphics ultimately used as input are written by the user on the spot, rather than imported from static picture files. In other words, we have all the dynamic information generated while the strokes are produced, such as stroke order and stroke overlap. And we expect the strokes to be written roughly horizontally. With this information in mind, we can design a basic segmentation rule: strokes whose projections onto the horizontal axis overlap are considered to belong to the same character.

The relationship between strokes and projections in the horizontal direction is as follows:

So, when writing, the characters need to be kept as separate as possible. Of course, to tolerate casual overlaps, we can also set a threshold for how much of a stroke's projection must overlap relative to its length, such as at least 10%.

After adding the tolerance threshold for overlap, the results of stroke segmentation can be seen below. Strokes considered to belong to the same character after splitting are drawn in the same color, and strokes not belonging to the same character are distinguished by different colors. Above the characters, we use a series of horizontal translucent color blocks to represent each stroke's effective overlap region in the horizontal direction and the overlap relationships between characters.

By applying such rules, we can segment multiple strokes simply and efficiently, and use the batch inference functionality provided by Visual Studio Tools for AI to infer all the segmented graphics at once.

Building and understanding the complete application

Similar to "Getting Started with handwriting recognition," We're going to do this in this article by cloning the application code of the subject on GitHub and then referencing the model.

After getting the Git repository mentioned in the Training Model section above, we can open the MnistDemo.sln solution in the ./extended_mnist_calculator directory with Visual Studio and add the AI Tools inference model project to the solution as before. Slightly different from the previous blog, in order to distinguish our new model we name the new model project ExtendedModel (also the default namespace name) and name the new model wrapper class MnistExtension. And this time, in the model project creation wizard, we select the new model we trained above.

The new inference model project and model wrapper class are configured as follows:

Understanding the code

Input processing

In the new app's code, the biggest difference from the code covered in the handwritten digit recognition blog is the input handling. Previously, we simply adjusted the image format of the square plot area and used it as the MNIST model's input. In this article, we must first split the strokes, and then convert each group of strokes into a single input in the format the MNIST model requires.

The interface events the new application must respond to are the same as before: mouse down, mouse move, and mouse up. The changes to the mouse-down and mouse-move handlers are straightforward; we just need to record the new strokes in these handlers.

Recording strokes as they are produced

First, we add a field of type List<Point> to the form class that records the points the mouse passes through between each press and release; joined in sequence, these points form a stroke. In the mouse-down event we clear all previously recorded points so as to record the points of this new stroke, and in the mouse-up event we convert the points into the stroke data structure StrokeRecord (defined below). Similarly, we add a field of type List<StrokeRecord> to the form class to record all the strokes written so far.

private List<Point> strokePoints = new List<Point>();
private List<StrokeRecord> allStrokes = new List<StrokeRecord>();

Add the following statement to the writeArea_MouseDown method to clear the previously recorded mouse-move points:

strokePoints.Clear();

And in the writeArea_MouseMove method, record the point at which the mouse moves:

strokePoints.Add(e.Location);

In the writeArea_MouseUp method, all the points generated between mouse press and release are converted into the stroke data structure. Because no points are recorded if the mouse did not move before being released, we first use strokePoints.Any() to check whether any points were recorded. Here is the code that converts the recorded points:

var thisStrokeRecord = new StrokeRecord(strokePoints);
allStrokes.Add(thisStrokeRecord);

The StrokeRecord structure, including its constructor, is defined as follows:

/// <summary>
/// The data structure used to record information about each stroke in the writing history.
/// </summary>
class StrokeRecord
{
    public StrokeRecord(List<Point> strokePoints)
    {
        // Copy all points to avoid the list being modified externally.
        Points = new List<Point>(strokePoints);

        HorizontalStart = Points.Min(pt => pt.X);
        HorizontalEnd = Points.Max(pt => pt.X);
        HorizontalLength = HorizontalEnd - HorizontalStart;

        OverlayMaxStart = HorizontalStart + (int)(HorizontalLength * (1 - ProjectionOverlayRatioThreshold));
        OverlayMinEnd = HorizontalStart + (int)(HorizontalLength * ProjectionOverlayRatioThreshold);
    }

    /// <summary>
    /// The points that make up this stroke.
    /// </summary>
    public List<Point> Points { get; }

    /// <summary>
    /// The starting coordinate of this stroke in the horizontal direction.
    /// </summary>
    public int HorizontalStart { get; }

    /// <summary>
    /// The ending coordinate of this stroke in the horizontal direction.
    /// </summary>
    public int HorizontalEnd { get; }

    /// <summary>
    /// The length of this stroke in the horizontal direction.
    /// </summary>
    public int HorizontalLength { get; }

    /// <summary>
    /// Another stroke must pass these threshold points before it is considered to overlap this stroke.
    /// </summary>
    public int OverlayMaxStart { get; }

    public int OverlayMinEnd { get; }

    private bool CheckPosition(StrokeRecord other)
    {
        return (other.HorizontalStart < OverlayMaxStart) || (OverlayMinEnd < other.HorizontalEnd);
    }

    /// <summary>
    /// Check whether another stroke overlaps this stroke.
    /// </summary>
    /// <param name="other"></param>
    public bool OverlayWith(StrokeRecord other)
    {
        return this.CheckPosition(other) || other.CheckPosition(this);
    }
}
Split strokes

After adding the newly generated stroke to the list of all strokes, we have every stroke the current user has written. Next, we group the strokes.

The implementation of the "fast" segmentation described above is very simple. We sort all strokes by their leftmost horizontal coordinate, from small to large, and scan from the far left. If a stroke has not yet been grouped, we assign it a new unique group number, then check which strokes to its right have horizontal projections that effectively overlap the current stroke (as described above, with a 10% threshold), and assign the overlapping strokes to the same group. This continues until all strokes have been scanned.

allStrokes = allStrokes.OrderBy(s => s.HorizontalStart).ToList();
int[] strokeGroupIds = new int[allStrokes.Count];
int nextGroupId = 1;
for (int i = 0; i < allStrokes.Count; i++)
{
    // To avoid too many strokes being chained together horizontally, we take a simple approach:
    // when strokes 1 and 2 overlap, we do not go on to check whether stroke 2 overlaps strokes further right.
    if (strokeGroupIds[i] != 0)
    {
        continue;
    }

    strokeGroupIds[i] = nextGroupId;
    nextGroupId++;

    var s1 = allStrokes[i];
    for (int j = 1; i + j < allStrokes.Count; j++)
    {
        var s2 = allStrokes[i + j];
        if (s2.HorizontalStart < s1.OverlayMaxStart) // First check the boundary condition (10% threshold)
        {
            if (strokeGroupIds[i + j] == 0)
            {
                if (s1.OverlayWith(s2)) // Full overlap check, taking the threshold into account
                {
                    strokeGroupIds[i + j] = strokeGroupIds[i];
                }
            }
        }
        else
        {
            break;
        }
    }
}

We can then group the strokes by their corresponding group numbers:

List<IGrouping<int, StrokeRecord>> groups = allStrokes
    .Zip(strokeGroupIds, Tuple.Create)
    .GroupBy(tuple => tuple.Item2, tuple => tuple.Item1) // Item2 is the group number, Item1 is the StrokeRecord
    .ToList();

Little Tips

To make the stroke segmentation easier to understand, the application interface has a "show stroke grouping" switch. When it is ticked, newly written strokes are colored so that strokes in the same group share a color, as described above.

Generate a single image for each group

When the splitting is complete, we get an array groups, each element of which is a group containing the group number and all the strokes in that group. Each group corresponds to one character. If a group contains multiple strokes, these strokes are all parts of that character (think of the plus and multiplication signs, which both require two strokes to write). Note that the order of the elements in groups is important, because we must preserve the order of the characters in the recognized expression so that it can be evaluated correctly.

We visit each element of groups sequentially in a loop, with the loop variable named group:

foreach (IGrouping<int, StrokeRecord> group in groups)

The type of the loop variable group is IGrouping<int, StrokeRecord>; it represents one group, including the group's number (an integer) and the elements in the group (each a StrokeRecord). Since the IGrouping<TKey, TElement> generic interface is also an iterable IEnumerable<TElement>, we can use the group variable directly as an object of type IEnumerable<StrokeRecord>.

Then we need to determine the area occupied by this group (that is, by all the strokes in it). Here we care most about the leftmost and rightmost coordinates in the horizontal direction (the horizontal axis runs from left to right).

With these two coordinates we can determine the length of the group's projection in the horizontal direction. We compute this length so that, when we generate a single image for each group, we can center the group's graphic in it. Although we first create a large square bitmap (whose edge length is the height of the plot area), the segmented shape no longer has a natural position within that square area. The following code computes these positions, along with the horizontal offset required to center the group:

var groupedStrokes = group.ToList(); // IGrouping<TKey, TElement> is essentially also an iterable IEnumerable<TElement>

// Determine the extent of all strokes in this group.
int grpHorizontalStart = groupedStrokes.Min(s => s.HorizontalStart);
int grpHorizontalEnd = groupedStrokes.Max(s => s.HorizontalEnd);
int grpHorizontalLength = grpHorizontalEnd - grpHorizontalStart;

int canvasEdgeLen = writeArea.Height;
Bitmap canvas = new Bitmap(canvasEdgeLen, canvasEdgeLen);
Graphics canvasGraphics = Graphics.FromImage(canvas);
canvasGraphics.Clear(Color.White);

// Because we extracted each stroke, we can no longer use the rectangular plot area directly as input.
// Here we center groups narrower than writeArea.Height within the canvas.
int halfOffsetX = Math.Max(canvasEdgeLen - grpHorizontalLength, 0) / 2;

We then draw the strokes of the current group onto the newly created bitmap (using the canvasGraphics object):

foreach (var stroke in groupedStrokes)
{
    Point startPoint = stroke.Points[0];
    foreach (var point in stroke.Points.Skip(1))
    {
        var from = startPoint;
        var to = point;

        // Each group was recorded within the rectangular plot area, so on the single bitmap we
        // first subtract the offset grpHorizontalStart relative to the plot area, then add halfOffsetX to center.
        from.X = from.X - grpHorizontalStart + halfOffsetX;
        to.X = to.X - grpHorizontalStart + halfOffsetX;

        canvasGraphics.DrawLine(penStyle, from, to);

        startPoint = point;
    }
}
Batch Inference

In the new application we need to recognize multiple characters at a time. Previously we recognized one character at a time, calling the model's inference method model.Infer(...) once per character.

But now we have a whole batch of data, which gives us the opportunity to leverage the parallel processing power of the underlying AI framework to speed up inference and avoid the hassle of manual multithreading. Here we use the batch inference feature provided by Visual Studio Tools for AI to infer all the data at once and obtain all the results.

First, before the loop that creates a bitmap for each group, we create a dynamic array to store all the input data:

var batchInferInput = new List<IEnumerable<float>>();

Inside the loop, after each group has been processed, we temporarily store the pixel data corresponding to that group in the dynamic array batchInferInput:

// 1. Scale the segmented stroke image down to 28 x 28, consistent with the training data format.
Bitmap clonedBmp = new Bitmap(canvas, ImageSize, ImageSize);

var image = new List<float>(ImageSize * ImageSize);
for (var x = 0; x < ImageSize; x++)
{
    for (var y = 0; y < ImageSize; y++)
    {
        var color = clonedBmp.GetPixel(y, x);
        image.Add((float)(0.5 - (color.R + color.G + color.B) / (3.0 * 255)));
    }
}

// Save the matrix corresponding to this group of strokes for batch inference.
batchInferInput.Add(image);

As you can see, our handling of each group is identical to the earlier processing of the pixels of the whole square plot area. The only difference is that in the previous application's code, the array of type List<IEnumerable<float>> (the variable batchInferInput above) had only one element: the pixel data of a single bitmap. In this article, the array may have many elements, each being the data of one bitmap. After batch inference on such a set of bitmap data, the result (the inferResult variable) is an enumerable, which we call the "first-level enumeration". Each element of the first-level enumeration is itself enumerable, which we call a "second-level enumeration".

Each element of the first-level enumeration corresponds to the inference result of one bitmap. The first-level enumeration corresponds to the input array of the batch inference, and the total number of its results equals the length of the input array. As for the second-level enumeration: because each inference result is just a single integer, the second-level enumeration always has exactly one element, which we can take out with .First(). We can now see that in the previous application's code, inferResult.First().First() took out the only result, whereas here we must consider the two-dimensional structure of the batch inference results.

The inference code is as follows:

// 2. Perform batch inference.
//    batchInferInput is a list; each of its elements is the input of one inference.
IEnumerable<IEnumerable<long>> inferResult = model.Infer(batchInferInput);

//    The inference result is an enumerable object; each of its elements represents the result of
//    one inference in the batch. We take out each single result with .First() and format them.
outputText.Text = string.Join("", inferResult.Select(singleResult => singleResult.First().ToString()));
Evaluating expressions

At this point, our recognition of multiple handwritten characters is complete, and we have a string that a computer program can easily process, representing the user's handwriting. Next we calculate the mathematical expression written in that string.

Thanks to the data preparation and model training above, the mathematical expressions we need to evaluate have a relatively simple format, involving only the digits 0-9, the four arithmetic operators, and parentheses. Evaluating such an expression is a very classic problem. Because these expressions have clear, well-defined grammar rules, the most direct approach is to parse them according to that grammar, build a syntax tree, and then evaluate it. Or, because the problem is classic, we can look for an existing component to solve it.

This article directly uses the Compute method provided by the System.Data.DataTable class to evaluate the expression. This method fully supports the expression syntax that appears in this case.

Because the logical boundary of expression evaluation is very clear, we introduce a separate method to get the final result:

string EvaluateAndFormatExpression(List<int> recognizedLabels)

The EvaluateAndFormatExpression method accepts a sequence of labels, in which we still use the integers 10-15 to represent the mathematical symbols. In this method we map the character labels twice: once into the form fed to the evaluator, and once into the form displayed in the user interface. The return value of EvaluateAndFormatExpression looks like "(3+2)÷2=2.5", with all symbols in traditional mathematical notation. The implementation of this method is as follows:

private string EvaluateAndFormatExpression(List<int> recognizedLabels)
{
    string[] operatorsToEval = { "+", "-", "*", "/", "(", ")" };
    string[] operatorsToDisplay = { "+", "-", "×", "÷", "(", ")" };

    string toEval = string.Join("", recognizedLabels.Select(label =>
    {
        if (0 <= label && label <= 9)
        {
            return label.ToString();
        }

        return operatorsToEval[label - 10];
    }));

    var evalResult = new DataTable().Compute(toEval, null);
    if (evalResult is DBNull)
    {
        return "Error";
    }
    else
    {
        string toDisplay = string.Join("", recognizedLabels.Select(label =>
        {
            if (0 <= label && label <= 9)
            {
                return label.ToString();
            }

            return operatorsToDisplay[label - 10];
        }));

        return $"{toDisplay}={evalResult}";
    }
}

It is also worth noting that, depending on the expression evaluation scheme, we may need to adjust the correspondence between characters in the expression. For example, while we want to display the division sign as the more readable "÷" in the user interface, the evaluation scheme we adopt may not support that character and may only support the C# division operator /. We therefore need to map the recognized results appropriately before feeding them to the expression evaluator.

Frequently asked questions

The new model recognizes parentheses and the digit 1 poorly

This happens very easily, because in handwriting the left and right parentheses and the digit 1 are easy to confuse. The problem is sometimes reflected in the extended data. Observing the original MNIST dataset (see the data visualization above), many samples of the digit 1 have shapes and bends similar to parentheses. If we do not make a clear distinction when preparing the extended data, and we use a convolutional neural network that is insensitive to such tiny differences, characters with similar shapes will be misrecognized.

Similar problems can occur between the plus and multiplication signs, because their shapes are essentially identical and differ only in angle. If the two symbols in our collected data are each rotated by some amount, so that angle no longer distinguishes them clearly, the model's ability to recognize them will also suffer.

Remaining issues

After this round of extension, our new application has some genuinely useful features and initially meets real-world application needs. From this case, we can also draw some insight into how to integrate artificial intelligence with traditional techniques to solve problems better. Of course, the new application is still not powerful or robust enough. In this regard, we note some issues that remain to be resolved:

    • The stroke segmentation algorithm is relatively simple and rough. How can the segmentation be improved to handle overlapping characters, connected strokes, noise, and other situations that may arise?
    • As a calculator application, the new application described in this article has few features. How would you add new mathematical features, such as square roots, fractions, or more mathematical symbols?
