Facebook's AI, Rosetta, to understand image and video context with machine learning
Facebook has been working alongside other Internet companies to tackle the problem of inappropriate or harmful content, using shared technology that helps stamp out extremist images and videos.
Now, the company has gone public with its artificial intelligence (AI) system, Rosetta, which uses machine learning to understand text in context within images and videos, in a bid to identify inappropriate or harmful content and keep the community safe.
Understanding text that appears in images is also important for improving the user experience, for example through more relevant photo search or by making that text available to screen readers for visually impaired users.
According to Facebook, Rosetta was built and deployed as a large-scale machine learning system that extracts text from more than a billion public Facebook and Instagram images and video frames daily, in real time, and feeds it into a text recognition model trained on classifiers to understand the context of the text and the image together.
Facebook says the system became necessary given the sheer volume of photos shared each day on Facebook and Instagram and the number of languages supported on the platforms. Understanding text in images is also quite different from traditional optical character recognition (OCR), which recognizes characters but not the context of the given image.
Rosetta's text detection model is based on Faster R-CNN, with the ResNet convolutional body replaced by a ShuffleNet-based architecture for greater efficiency. The whole detection system, comprising feature encoding, a region proposal network (RPN), and classifiers, is trained jointly in a supervised, end-to-end manner.
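The two-stage flow described above, detecting regions that contain text and then passing them to a recognition model, can be sketched roughly as follows. This is an illustrative outline only, not Facebook's implementation: `TextRegion`, `extract_text`, `fake_detect`, and `fake_recognize` are hypothetical names, and the stand-in functions replace what would be trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class TextRegion:
    box: Box
    text: str


def extract_text(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> List[TextRegion]:
    """Stage 1 proposes boxes likely to contain text; stage 2 transcribes each box."""
    regions = []
    for box in detect(image):
        text = recognize(image, box)
        if text:  # drop boxes where nothing was transcribed
            regions.append(TextRegion(box=box, text=text))
    return regions


# Hypothetical stand-ins so the sketch runs without a trained model.
def fake_detect(image: object) -> List[Box]:
    return [(0, 0, 40, 10), (0, 20, 40, 30)]


def fake_recognize(image: object, box: Box) -> str:
    return {(0, 0, 40, 10): "SALE", (0, 20, 40, 30): ""}[box]
```

Calling `extract_text(None, fake_detect, fake_recognize)` keeps only the box for which the recognizer returned text. In a real system the extracted strings would then go to downstream classifiers along with the image.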
However, the naive approach of applying image-based text extraction to every single video frame isn't scalable: given the massive volume of videos uploaded in different languages, it would lead to wasted computational resources.
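To see why per-frame extraction is costly, consider a simple alternative in which only one frame per fixed time interval is processed. The sketch below is an assumption for illustration; the article does not say how Facebook selects frames, and `sample_frame_indices` and its fixed-interval strategy are hypothetical.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, every_seconds: float = 1.0) -> List[int]:
    """Return the indices of frames to process, one per `every_seconds` of video."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Number of frames to skip between samples; never less than one.
    stride = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, stride))
```

For a 10-second clip at 30 frames per second (300 frames), the naive approach would run text extraction 300 times, while sampling one frame per second runs it 10 times.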
Facebook aims to support a global language platform by continuing to invest in extending the text recognition model to the wide range of languages used on the platform.