An Analysis of iOS 11: Core ML

Source: Internet
Author: User

Apple introduced NSLinguisticTagger in iOS 5 to analyze natural language. iOS 8 brought Metal, which provides low-level access to the device's GPU. Last year, Apple added Basic Neural Network Subroutines (BNNS) to the Accelerate framework, allowing developers to build neural networks for inference (not training).

This year, Apple has given us Core ML and Vision, letting iOS developers climb a few rungs further up the AI ladder.

    • Core ML makes it easier to use trained models in your app.
    • Vision gives us easy access to Apple's models for detecting faces, facial landmarks, text, rectangles, barcodes, and objects.

You can also wrap any image-analysis Core ML model in a Vision model. Because both frameworks are built on Metal, they run efficiently on the device, so there is no need to send the user's data to a server.

I. What is Core ML?

Many people have heard of machine learning, but outside the profession few study how it is actually implemented. The arrival of Core ML greatly lowers the entry barrier for iOS developers into this field, letting them build more intelligent products at minimal cost.

An important part of machine learning is using massive amounts of data to train a model for a specific problem, so that it can make accurate predictions when it encounters new data. For example, a model trained on the features of a large number of objects can predict the species or category of a new object it is shown; a model that has studied a large number of Go games can, when facing an unfamiliar board position, tell which moves have a higher probability of winning.

When training is complete, it produces a data model for that particular problem: feed the model new data about the problem and it returns a prediction. What Core ML actually does is use a pre-trained model (trained model) to make predictions natively, through components such as the MLNeuralNetworkEngine, and return the result. Because the prediction happens locally, it does not depend on the network and cuts processing time.
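To make this flow concrete, here is a minimal Objective-C sketch using the generic MLModel API; the model name "MyClassifier" and the feature names "input" and "output" are placeholders for whatever a real .mlmodel actually declares, not anything from the demo project.

#import <CoreML/CoreML.h>

// Minimal sketch: load a compiled model and run one on-device prediction.
static void RunSinglePrediction(void) {
    NSError *error = nil;
    NSURL *modelURL = [[NSBundle mainBundle] URLForResource:@"MyClassifier"
                                              withExtension:@"mlmodelc"];
    MLModel *model = [MLModel modelWithContentsOfURL:modelURL error:&error];
    if (!model) { NSLog(@"model load failed: %@", error); return; }

    // Wrap the new data in an MLFeatureProvider; a dictionary-backed one is enough here.
    MLDictionaryFeatureProvider *input =
        [[MLDictionaryFeatureProvider alloc] initWithDictionary:@{ @"input" : @(42.0) }
                                                          error:&error];

    // The prediction is computed locally; no network round trip is involved.
    id<MLFeatureProvider> output = [model predictionFromFeatures:input error:&error];
    NSLog(@"prediction: %@", [output featureValueForName:@"output"]);
}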

The interaction between an application and Core ML looks broadly like this:

As the figure shows, the real intelligence lies in the pre-trained model (trained model), which determines the final result. Apple offers a number of models already converted to the Core ML format, and its conversion tool can also turn models produced by other common machine learning tools into Core ML models. The tool currently supports only some common model formats; if you need to convert a model in a more specialized format, you have to refer to the tool's code and convert your model data into Apple's defined model format yourself.

Apple's Core ML framework currently supports feedforward neural networks, convolutional neural networks, recurrent neural networks, tree ensembles such as random forests and boosted trees, support vector machines, linear and logistic regression, feature engineering, and pipeline models.

II. Related technologies involved in Core ML

Core ML is a foundational machine learning framework; Vision and GameplayKit do their related processing on top of it. To improve computing performance, Apple takes full advantage of the hardware to maximize Core ML's performance while reducing memory consumption and power consumption.

1. Metal

Metal is a highly optimized framework for programming the GPU on iPhone and iPad. Compared with OpenGL ES, its biggest benefit is a significant reduction in overhead. OpenGL copies buffers and textures to prevent the GPU from reading data while it is being modified, and these copy operations are very time-consuming. To improve efficiency, Metal trades some of that safety for performance: it does not copy resources, so the developer is responsible for keeping the data consistent and for synchronizing access between the CPU and the GPU. This is something to watch out for when programming with Metal.

Another benefit of Metal is that GPU state is built ahead of time, which avoids redundant validation and compilation. In OpenGL, you set GPU state piece by piece, and the new state must be validated before each draw call; in the worst case, OpenGL recompiles the shader to reflect the new state. Metal takes a different approach: during renderer initialization, a set of states is baked (bake) into a pre-estimated render pass object. The render pass object can be used with several different resources, but its other state is constant. A render pass in Metal needs no further validation, which minimizes API overhead and greatly increases the number of draw calls per frame.
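As a rough illustration of this "bake once, reuse every frame" idea (shown with a compute pipeline rather than a render pass, for brevity), something like the sketch below builds all state up front; the shader function name "my_kernel" is a placeholder.

#import <Metal/Metal.h>

// Build the pipeline once, at initialization time. All state validation and
// shader compilation happens here instead of before every draw/dispatch call.
static id<MTLComputePipelineState> MakePipeline(id<MTLDevice> device) {
    NSError *error = nil;
    id<MTLLibrary> library = [device newDefaultLibrary];
    id<MTLFunction> function = [library newFunctionWithName:@"my_kernel"];
    return [device newComputePipelineStateWithFunction:function error:&error];
}

// Per frame, only encoding work remains; the prevalidated pipeline is reused as-is.
static void EncodeOneFrame(id<MTLCommandQueue> queue, id<MTLComputePipelineState> pipeline) {
    id<MTLCommandBuffer> commandBuffer = [queue commandBuffer];
    id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
    [encoder setComputePipelineState:pipeline];
    [encoder dispatchThreadgroups:MTLSizeMake(8, 8, 1) threadsPerThreadgroup:MTLSizeMake(8, 8, 1)];
    [encoder endEncoding];
    [commandBuffer commit];
}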

2. Neural network

Deep learning (DL) is currently very hot: every field and application hopes to apply it and wear the "intelligent" crown, and not only the internet and AI industries; deep learning is driving big changes in every major area of life. To understand deep learning, you first need to understand artificial neural networks (ANNs). Their design is inspired entirely by the way biological neurons transmit information, and neural networks have grown into an interdisciplinary field that has regained attention alongside the progress made in deep learning.

A convolutional neural network (CNN or ConvNet) is a kind of artificial neural network that has become a research hotspot in speech analysis and image recognition. Its weight-sharing structure makes it more similar to a biological neural network, reduces the complexity of the network model, and reduces the number of weights. The advantage is most obvious when the input is a multidimensional image: the image can be fed into the network directly, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. CNNs are the main force in the field of deep neural networks; they have learned to classify images, and their recognition accuracy has surpassed that of humans.

3. Metal Performance Shaders

Metal Performance Shaders (MPS) is Apple's toolkit for doing deep learning on iOS through Metal. It mainly provides MPSImage to store data and manage memory, and implements the commonly used convolutional neural network layers such as convolution, pooling, fully connected, and ReLU.

If Core ML does not support your model, or if you want full control over the input and output of every layer, you have to use MPS, and the model parameters trained on the server need to be converted into the format MPS expects. In a typical CNN, the only layers that carry trained parameters are the convolution, fully connected, and normalization layers; in other words, once those three kinds of parameters are converted into the format MPS needs, the network can be run with MPS.
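As a small example of what an MPS layer looks like in code, the sketch below encodes a single ReLU layer; the image sizes are arbitrary, and a real network would chain convolution, pooling, and fully connected kernels built from the converted weights.

#import <MetalPerformanceShaders/MetalPerformanceShaders.h>

// Rough sketch: encode one ReLU layer with MPS. A full CNN would allocate one
// MPSImage per intermediate result and chain many such kernels.
static void EncodeReLULayer(id<MTLDevice> device, id<MTLCommandBuffer> commandBuffer) {
    MPSImageDescriptor *desc =
        [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat16
                                                       width:224
                                                      height:224
                                             featureChannels:64];
    MPSImage *source = [[MPSImage alloc] initWithDevice:device imageDescriptor:desc];
    MPSImage *destination = [[MPSImage alloc] initWithDevice:device imageDescriptor:desc];

    // MPSCNNNeuronReLU computes f(x) = x for x >= 0 and a*x otherwise (a = 0 here).
    MPSCNNNeuronReLU *relu = [[MPSCNNNeuronReLU alloc] initWithDevice:device a:0.0f];
    [relu encodeToCommandBuffer:commandBuffer sourceImage:source destinationImage:destination];
}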

III. Application scenarios for Core ML + Vision

Core ML provides an underlying library of algorithms. It currently supports neural networks, tree ensembles, support vector machines, generalized linear models, feature engineering, and pipeline models; in principle, as long as our models are trained on these algorithm families, Core ML can support them.

As you may have guessed from its name, Vision lets you perform computer vision tasks. You might have used OpenCV in the past, but now iOS has its own API. Vision provides many image processing functions, such as face recognition, feature detection, barcode recognition, text detection, and classification of scenes in images and video, and Apple has deeply optimized these data-heavy operations, so performance is better.

There are several tasks that Vision can perform:

    1. Look for a face in a given image.

    2. Look for detailed features of the face, such as the position of the eyes and mouth, the shape of the head, and so on.

    3. Track the objects moving in the video, and determine the angle of the horizon.

    4. Align the contents of two images, and detect regions of an image that contain text.

    5. Detect and identify barcodes (a minimal sketch of this request follows the list).
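For example, a minimal sketch of the barcode task might look like the following; error handling and threading are trimmed, and the CGImage is assumed to come from elsewhere in the app.

#import <Vision/Vision.h>

// Detect barcodes in a CGImage and log their payloads.
static void DetectBarcodes(CGImageRef cgImage) {
    VNDetectBarcodesRequest *request = [[VNDetectBarcodesRequest alloc]
        initWithCompletionHandler:^(VNRequest *req, NSError *error) {
            for (VNBarcodeObservation *barcode in req.results) {
                NSLog(@"symbology: %@ payload: %@", barcode.symbology, barcode.payloadStringValue);
            }
        }];
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCGImage:cgImage options:@{}];
    NSError *error = nil;
    [handler performRequests:@[request] error:&error];
}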

You can use Vision to drive Core ML: when doing machine learning with Core ML, the Vision framework can pre-process the data for you. For example, you can use Vision to detect the position and size of a face, crop the video frame to that region, and then run the neural network on just that part of the face image.

When Core ML is used directly for machine learning, the input image must be in exactly the format and size the model specifies, and most of the data we obtain does not meet that requirement. If the Vision framework is used, it adjusts the image size, color, and so on, making it easy to convert image data into what the model requires.
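A rough sketch of the face-crop preprocessing described above might look like this; the classification step on the cropped face is omitted, and the completion-block shape is just one possible design.

#import <Vision/Vision.h>
#import <CoreImage/CoreImage.h>

// Find the first face in the image and hand back a crop of that region,
// ready to be fed to a classification model.
static void CropToFirstFace(CIImage *image, void (^completion)(CIImage *faceImage)) {
    VNDetectFaceRectanglesRequest *request = [[VNDetectFaceRectanglesRequest alloc]
        initWithCompletionHandler:^(VNRequest *req, NSError *error) {
            VNFaceObservation *face = req.results.firstObject;
            if (!face) { completion(nil); return; }

            // Vision bounding boxes are normalized (0..1); scale them to pixels.
            CGSize size = image.extent.size;
            CGRect faceRect = CGRectMake(face.boundingBox.origin.x * size.width,
                                         face.boundingBox.origin.y * size.height,
                                         face.boundingBox.size.width * size.width,
                                         face.boundingBox.size.height * size.height);
            completion([image imageByCroppingToRect:faceRect]);
        }];
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCIImage:image options:@{}];
    NSError *error = nil;
    [handler performRequests:@[request] error:&error];
}

When a Core ML model is wrapped in a VNCoreMLRequest (as in section IV below), Vision also scales and converts the pixel data to the input size and format the model declares, which is exactly the convenience this paragraph refers to.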

With these capabilities from Apple, combined with our own products, we should be able to build many interesting features; there is plenty of room for imagination, for example:

    1. With a well-trained model, low-resolution images can be converted to high-resolution ones, which saves a lot of bandwidth and can noticeably improve the user experience.

    2. Face detection, for example focusing on a friend's face before interacting with that friend.

    3. Learning the user's navigation path inside the app to predict their behavior, for example predicting the user's habits and sending their feed at just the right time.

In short, there are many scenarios where this can be applied.

IV. Using Core ML in image recognition practice

Requires Xcode 9 beta 1 or later and an iOS 11 environment; you can download the demo project to follow along.

The project lets the user pick a picture from the photo library and choose either object classification or digit recognition inside a rectangular region.

1. Using Core ML directly for image classification

a. Integrate the Core ML model into your app

For example, the Inceptionv3 model can be downloaded from Apple's "Machine Learning" page. The models Apple currently offers are all for detecting objects in images: trees, animals, people, and so on. If you have your own model trained with a supported machine learning tool such as Caffe, Keras, or scikit-learn, the "Converting Trained Models to Core ML" documentation describes how to convert it to the Core ML format.

After downloading Inceptionv3.mlmodel, drag it from the Finder into the Project Navigator:

b. Xcode compiles the model into three classes, Inceptionv3Input, Inceptionv3Output, and Inceptionv3, and also generates an Inceptionv3.mlmodelc directory, which holds the trained model data that is actually used at run time.

Then a few lines of code are enough to implement a simple object classification recognizer, which greatly lowers the barrier to entry for "artificial intelligence":

- (NSString *)PredictionWithResnet50:(CVPixelBufferRef)buffer {
    NSError *modelLoadError = nil;
    NSURL *modelURL = [NSURL fileURLWithPath:[[NSBundle mainBundle] pathForResource:@"Resnet50"
                                                                             ofType:@"mlmodelc"]];
    Resnet50 *resnet50 = [[Resnet50 alloc] initWithContentsOfURL:modelURL error:&modelLoadError];

    NSError *predictionError = nil;
    Resnet50Output *resnet50Output = [resnet50 predictionFromImage:buffer error:&predictionError];
    if (predictionError) {
        return predictionError.description;
    } else {
        // resnet50Output.classLabelProbs holds the probability for every label;
        // sort it if you want the full ranking instead of just the top label.
        return [NSString stringWithFormat:@"Recognition result: %@, match rate: %.2f",
                resnet50Output.classLabel,
                [resnet50Output.classLabelProbs[resnet50Output.classLabel] floatValue]];
    }
}
2. Using Vision to recognize digits inside a rectangle

The approach above runs the Core ML model directly to get a prediction. Beyond that, we can wrap any image-analysis Core ML model in a Vision model. Adding the model is the same as before; we just need to wrap the relevant requests with Vision:

- (void)predictMnistClassifier:(UIImage *)uiImage {
    CIImage *ciImage = [CIImage imageWithCGImage:uiImage.CGImage];
    // cgImagePropertyOrientation: is the demo's helper that maps UIImageOrientation
    // to CGImagePropertyOrientation.
    CGImagePropertyOrientation orientation = [self cgImagePropertyOrientation:uiImage];
    self.inputImage = [ciImage imageByApplyingOrientation:orientation];

    VNDetectRectanglesRequest *rectanglesRequest = [[VNDetectRectanglesRequest alloc]
        initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            [self handleRectangles:request error:error];
        }];
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCGImage:uiImage.CGImage
                                                                        orientation:orientation
                                                                            options:@{}];
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        NSError *error = nil;
        [handler performRequests:@[rectanglesRequest] error:&error];
    });
}

- (void)handleRectangles:(VNRequest *)request error:(NSError *)error {
    VNRectangleObservation *detectedRectangle = request.results.firstObject;
    CGSize imageSize = self.inputImage.extent.size;

    // scaledCGRect:toSize: and scaledCGPoint:toSize: are the demo's helpers that map
    // Vision's normalized coordinates to pixel coordinates.
    CGRect boundingBox = [self scaledCGRect:detectedRectangle.boundingBox toSize:imageSize];
    if (!CGRectContainsRect(self.inputImage.extent, boundingBox)) {
        NSLog(@"Invalid detected rectangle");
        return;
    }
    CGPoint topLeft = [self scaledCGPoint:detectedRectangle.topLeft toSize:imageSize];
    CGPoint topRight = [self scaledCGPoint:detectedRectangle.topRight toSize:imageSize];
    CGPoint bottomLeft = [self scaledCGPoint:detectedRectangle.bottomLeft toSize:imageSize];
    CGPoint bottomRight = [self scaledCGPoint:detectedRectangle.bottomRight toSize:imageSize];

    // Crop to the detected rectangle, straighten it, then adjust contrast and invert
    // so the digits resemble MNIST-style input.
    CIImage *cropImage = [self.inputImage imageByCroppingToRect:boundingBox];
    NSDictionary *param = @{ @"inputTopLeft"     : [CIVector vectorWithCGPoint:topLeft],
                             @"inputTopRight"    : [CIVector vectorWithCGPoint:topRight],
                             @"inputBottomLeft"  : [CIVector vectorWithCGPoint:bottomLeft],
                             @"inputBottomRight" : [CIVector vectorWithCGPoint:bottomRight] };
    CIImage *filterImage = [cropImage imageByApplyingFilter:@"CIPerspectiveCorrection"
                                        withInputParameters:param];
    // The contrast value was lost in the original listing; 32 is a placeholder.
    filterImage = [filterImage imageByApplyingFilter:@"CIColorControls"
                                 withInputParameters:@{ kCIInputSaturationKey : @(0),
                                                        kCIInputContrastKey   : @(32) }];
    filterImage = [filterImage imageByApplyingFilter:@"CIColorInvert" withInputParameters:nil];

    UIImage *correctedImage = [UIImage imageWithCIImage:filterImage];
    dispatch_async(dispatch_get_main_queue(), ^{
        self.imageView.image = correctedImage;
    });

    // Wrap the MNIST Core ML model in a Vision request and classify the corrected image.
    VNImageRequestHandler *vnImageRequestHandler = [[VNImageRequestHandler alloc]
        initWithCIImage:filterImage options:@{}];
    MnistClassifier *model = [MnistClassifier new];
    VNCoreMLModel *vnCoreModel = [VNCoreMLModel modelForMLModel:model.model error:nil];
    VNCoreMLRequest *classificationRequest = [[VNCoreMLRequest alloc] initWithModel:vnCoreModel
        completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            VNClassificationObservation *best = request.results.firstObject;
            NSString *result = [NSString stringWithFormat:@"Recognition result: %@, match rate: %.2f",
                                best.identifier, best.confidence];
            dispatch_async(dispatch_get_main_queue(), ^{
                self.resultLabel.text = result;
            });
        }];
    NSError *imageError = nil;
    [vnImageRequestHandler performRequests:@[classificationRequest] error:&imageError];
}
3. Running results

V. Some thoughts

1. Can the model be downloaded at run time?

The models Apple provides each take up tens of megabytes, and in practice growing the installation package by tens of megabytes is basically out of the question. After some analysis, Xcode does two things for an added model: it generates the corresponding classes and it adds the compiled model data to the bundle, and we can do both ourselves.

a. First, write the required classes ourselves; see the GoogLeNetPlaces.h and GoogLeNetPlaces.m files in the demo project.

b. Add the required model directory, GoogLeNetPlaces.mlmodelc, to the project as a resource; in a real app it could just as well be downloaded.

c. When creating the GoogLeNetPlaces instance, pass in the path to the model file:

- (nullable instancetype)initWithContentsOfURL:(NSURL *)url error:(NSError * _Nullable * _Nullable)error {
    self = [super init];
    if (!self) { return nil; }
    _model = [MLModel modelWithContentsOfURL:url error:error];
    if (_model == nil) { return nil; }
    return self;
}

The MLModel is created through this interface:

+ (nullable instancetype)modelWithContentsOfURL:(NSURL *)url                                          error:(NSError **)error;

With that done, we only need to supply the path to the model data. This way the Places205-GoogLeNet model does not have to be added to the project at build time to do object prediction, and other models can presumably be handled the same way.
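For illustration, loading a model directory that was downloaded to the Documents folder might look like the sketch below; it assumes the hand-written GoogLeNetPlaces class from the demo project, and the download location is only an example.

// Load a model directory fetched at run time instead of one bundled with the app.
static GoogLeNetPlaces *LoadDownloadedModel(void) {
    NSString *documents = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                              NSUserDomainMask, YES).firstObject;
    NSURL *modelURL = [NSURL fileURLWithPath:
        [documents stringByAppendingPathComponent:@"GoogLeNetPlaces.mlmodelc"]];

    NSError *error = nil;
    GoogLeNetPlaces *places = [[GoogLeNetPlaces alloc] initWithContentsOfURL:modelURL error:&error];
    if (!places) { NSLog(@"model load failed: %@", error); }
    return places;
}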

2. What if you need to update the model data of an app that is already live?

This can also be done:

a. If only the model's parameters and data change and its interface stays the same (that is, the generated class files do not change), then updating the model is just a matter of replacing the model data, which covers tweaking a model that is already in production.

b. If the model's interface changes, or you want to switch to a different model for prediction, this can still be done with the help of something like OCS: ship the generated class code as an OCS script, and deliver both the interface code and the model file as downloaded plugins. That makes it possible to modify model parameters and even swap in another model for prediction.

3. Is it thread safe?

The current documentation does not say whether it is thread safe. In my own experiment, 100 threads running predictions in parallel produced no exceptions, but we will have to wait for the official release to be sure.
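For reference, the kind of smoke test described here can be sketched roughly as follows, reusing the PredictionWithResnet50: method from section IV; finishing without a crash is of course evidence, not proof, of thread safety.

// Fire 100 predictions concurrently against the same model as a crude stress test.
- (void)stressTestPrediction:(CVPixelBufferRef)buffer {
    dispatch_apply(100, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t i) {
        NSString *result = [self PredictionWithResnet50:buffer];
        NSLog(@"iteration %zu -> %@", i, result);
    });
}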

VI. Some problems encountered
    1. The prediction accuracy of the current models still looks relatively low, and many cases are not recognized; hopefully the official release improves the accuracy.

    2. The Xcode 9 beta does not support adding a resources directory. If you want to add resources to the project, you have to open the project in Xcode 8 first, add them there, and then reopen it with the Xcode 9 beta. This looks like a bug in the Xcode 9 beta that should be fixed in the official release.

    3. Installing the Xcode 9 beta caused the Xcode 8 simulator to stop working properly.

    4. Training is not possible on the device; you have to train offline with other toolkits and then convert the models to the Core ML format.

    5. Core ML does not support every kind of layer, and you cannot extend it with your own kernels. The .mlmodel file format is also not very flexible when you use a tool such as TensorFlow to build a general computation-graph model.

    6. The Core ML conversion tool supports only a limited number of training tools, at specific versions. For example, if you train a model in TensorFlow, you cannot use the tool and must write your own conversion script.

    7. You cannot see the output of Core ML's intermediate layers; you only get the prediction of the last layer. When a prediction goes wrong, it is hard to tell whether the problem lies in the model or in the framework.

    8. If you want full control over each layer's output, or to decide whether to run on the CPU or the GPU, you have to use Metal Performance Shaders or the Accelerate framework to run your model.

In summary, the current version of Core ML still has many issues, and hopefully the release version will address them.
