We’re (data) hungry: the importance of Big Data for the subsea industry
Subsea industries, such as Marine Renewables or Oil & Gas, have been paving the way for safer and smarter inspection operations underwater in order to improve efficiency while reducing their carbon footprint. If you’re part of this journey, you may already be aware that it will require planned subsea operations to be performed by unmanned systems. You’ve probably heard about simulation, remote control, and intelligent vehicles. You’ve certainly been told you need data. Indeed, if you’re on board, I’m sure you feel data hungry today.
Well, this news isn’t exactly groundbreaking. Big players, such as Google and Facebook, provide us with their own knowledge, Artificial Intelligence (AI) services, and tools, but not their data. And here’s the reason why: in contrast to older machine learning algorithms, the performance of new AI approaches, such as deep learning, is strongly correlated with the amount of training data available. So, needless to say, the holy grail for a highly performant AI is labeled data.
Besides being hungry, I’m also feeling generous today. By that I mean I’d like to share with you some of the key ideas we’ve been working on @Abyssal to acquire labeled data and make the most out of it for autonomous operations.
Artificial data
In an ideal scenario, we would get labeled data with no effort at all. In reality, though, things get a bit more complicated. Crowdsourcing can lighten the burden of annotating hours of video, but it isn’t free. In fact, in industries such as ours, it can become prohibitively expensive given the required expertise or data privacy issues. Thankfully, we have an easier (and free!) alternative: artificial data. The Abyssal Simulator allows us to create virtual scenarios driven by real-world information. As we can simulate endless and diverse scenarios, we’re able to generate unlimited training data artificially. Among other outputs, we can obtain video, RGB images, depth maps, and segmentation masks. Though a simulator is usually designed to train ROV pilots, ours can train deep models too.
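To make the idea concrete, here is a toy sketch of why simulated labels come for free: the renderer already knows which object produced each pixel, so segmentation masks and bounding boxes fall straight out of the scene description, with no human in the loop. The object ID and the tiny 2-D “image” below are made up for illustration; a real simulator would emit full-resolution buffers.

```python
PIPELINE_ID = 1  # hypothetical object ID assigned by the simulator

# Each cell holds the ID of the object rendered at that pixel (0 = background).
id_buffer = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def segmentation_mask(id_buffer, object_id):
    """Binary mask: 1 where the object appears, 0 elsewhere."""
    return [[1 if pix == object_id else 0 for pix in row] for row in id_buffer]

def bounding_box(mask):
    """Tight (row_min, col_min, row_max, col_max) box around the mask."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

mask = segmentation_mask(id_buffer, PIPELINE_ID)
print(bounding_box(mask))  # (1, 1, 2, 4)
```

Every rendered frame thus arrives with its detection and segmentation labels already attached, which is exactly what makes simulated data so cheap.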
Active learning
Sometimes we are willing to annotate some data, as long as we do it in a smart way. One “smart” method we can explore is active learning: whenever new data becomes available, we select only the most relevant samples to annotate. In practice, we reduce effort and save time in the annotation process by using a learnt model (that is, a model previously trained on the available labeled data) to pick samples, instead of sampling at random. Since it has already learnt from the labeled samples, we can use this model to predict the labels of the remaining unlabeled ones. When the model is confident about which label to predict, we assume the sample is an easy one, and thus not worth an expert’s time. When the model finds a sample ambiguous, we assume it is a difficult one or, at least, significantly different from previous samples, so we pick it as a relevant sample to annotate next.
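The selection step above can be sketched as uncertainty sampling, one common active-learning strategy. The probabilities below are made-up stand-ins for the outputs of whatever model was trained on the labeled pool so far.

```python
import math

def entropy(probs):
    """Shannon entropy of a probability vector; higher means more ambiguous."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_samples_to_annotate(predictions, budget):
    """Rank unlabeled samples by predictive entropy and return the
    indices of the `budget` most ambiguous ones."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical predicted probabilities (fish vs. no fish) for 4 unlabeled frames.
predictions = [
    [0.99, 0.01],  # confident: easy sample, skip it
    [0.55, 0.45],  # ambiguous: worth an expert's time
    [0.90, 0.10],
    [0.48, 0.52],  # most ambiguous of all
]
print(pick_samples_to_annotate(predictions, budget=2))  # [3, 1]
```

The expert only ever sees the frames the model is unsure about, which is where an annotation buys the most new information.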
Transfer learning and domain adaptation
As soon as we get some labeled data, we aim to make the most of it so as to avoid the need to label more. The first obvious choice is to rely on transfer learning: repurposing a model that was previously trained on one task for a second, related task. Let’s say we need to detect fish in ROV videos; we can probably benefit from a model that detects corals. In fact, most models used for subsea-related tasks could help. What about a model that detects fish in tanks? Handy as well. Transfer learning, or more specifically a subtype called domain adaptation, can also aid us here: it attempts to take a model that works in a given source domain (the tank) and make it also work in a new target domain (the open sea). And guess what? These techniques are amazingly convenient for turning models trained on our artificial data into models that work on real data.
[Figure: domain adaptation from a simulated source domain to a real target domain]
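A bare-bones sketch of the warm-start idea behind transfer learning: instead of training a detector from scratch, we initialize it with the weights of a related model and fine-tune on the few labeled samples we have for the new task. The “pretrained coral weights”, the two features, and the data below are all made up for illustration; in practice this would be a deep network, not a two-weight logistic model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(weights, samples, labels, lr=0.5, epochs=100):
    """Plain logistic-regression gradient descent, starting from `weights`
    instead of a random initialization."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            grad = pred - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return w

# Pretend these weights come from a coral detector trained on plenty of data.
pretrained_w = [0.8, -0.4]

# Only a few labeled fish samples are available for the new task.
fish_x = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
fish_y = [1, 1, 0, 0]

w = fine_tune(pretrained_w, fish_x, fish_y)
preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in fish_x]
print([round(p) for p in preds])  # [1, 1, 0, 0]: matches the fish labels
```

The useful structure learnt on the source task survives the warm start, so far fewer labeled target samples are needed to reach a working model.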
Weakly- and semi-supervised models
Still aiming to make the most of our labeled data, we can seek to manipulate our deep learning model so that it uses the same labeled data as before, but learns more than would be expected. Confused? Well, I’m talking about weak supervision. Going back to our fish detection example, we can use weakly-supervised learning to teach our model to locate a fish within an image while only training it with binary labels, that is, whether there is a fish or not in the image.
[Figure: binary image-level labels as model inputs at training time; fish localization as model output at testing time]
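Here is a toy, multiple-instance-style sketch of that weakly-supervised setup: each “image” is a list of patch features, the model scores every patch with one shared weight, and the image-level prediction is the maximum patch score. Training only ever sees binary image labels, yet at test time the highest-scoring patch localizes the object. All numbers below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(images, labels, lr=0.5, epochs=200):
    """Fit the shared patch weight using only image-level binary labels."""
    w = 0.1
    for _ in range(epochs):
        for patches, y in zip(images, labels):
            # Image score = max over patch scores; the gradient flows
            # only through the highest-scoring patch.
            best = max(range(len(patches)), key=lambda i: w * patches[i])
            pred = sigmoid(w * patches[best])
            w -= lr * (pred - y) * patches[best]
    return w

def localize(w, patches):
    """Return the patch index the trained model finds most fish-like."""
    return max(range(len(patches)), key=lambda i: w * patches[i])

# Patch features: higher values where a fish-like texture appears.
images = [[0.1, 0.9, 0.2], [0.2, 0.1, 0.8], [0.1, 0.2, 0.1]]
labels = [1, 1, 0]  # image-level only: fish present or not

w = train(images, labels)
print(localize(w, [0.1, 0.7, 0.2]))  # 1: the fish-like patch
```

Even though no patch was ever labeled, the per-patch scores the model learnt as a by-product are enough to point at the fish, which is the essence of weak supervision.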
Along these lines, we can also explore semi-supervised methods and use the tons of unlabeled data we have to improve the performance of our models. For instance, unlabeled data can serve the purpose of training an autoencoder that may capture the most relevant properties of the data and help later when training with labeled samples. Generative Adversarial Networks are also promising for this purpose, as they can internally encode useful representations during their unsupervised training.
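A minimal sketch of the autoencoder route: a tiny linear autoencoder is trained on unlabeled 2-D points that lie along one direction, and its 1-D code then serves as the feature for the handful of labeled samples. The data, dimensions, and learning rates are made up for illustration; a real model would be deep and the data high-dimensional.

```python
def encode(e, x):
    return e[0] * x[0] + e[1] * x[1]

def train_autoencoder(xs, lr=0.01, epochs=500):
    """Minimize ||x - d * (e . x)||^2 over the unlabeled set by
    gradient descent on encoder e and decoder d."""
    e, d = [0.1, 0.1], [0.1, 0.1]
    for _ in range(epochs):
        for x in xs:
            c = encode(e, x)
            resid = [x[0] - d[0] * c, x[1] - d[1] * c]
            # Gradient steps for decoder d, then encoder e.
            d = [d[i] + lr * 2 * resid[i] * c for i in range(2)]
            g = d[0] * resid[0] + d[1] * resid[1]
            e = [e[i] + lr * 2 * g * x[i] for i in range(2)]
    return e, d

# Plenty of unlabeled points along the direction [1, 2].
unlabeled = [[t, 2 * t] for t in (-2.0, -1.0, 1.0, 2.0)]
e, d = train_autoencoder(unlabeled)

# Two labeled points (one per class) are now easy to separate in code space:
codes = {t: encode(e, [t, 2 * t]) for t in (-1.5, 1.5)}
print(codes[1.5] * codes[-1.5] < 0)  # True: the 1-D code separates the classes
```

Because the unlabeled data already taught the encoder the direction along which the samples vary, the labeled stage only has to place a threshold on a single number.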
Wrapping up, there’s no doubt that labeled data is crucial for AI technology to succeed in the subsea domain. Only then can the subsea industry move one step forward, towards remote and autonomous operations. In this post I’ve covered some of the key solutions adopted by Abyssal to address this (data) hunger:
- artificial data
- active learning
- transfer learning
- domain adaptation
- weakly-supervised models
- semi-supervised models
Hopefully I’ve managed to pique your interest to the level that you’re no longer data hungry, but rather data starving. But don’t worry – over the coming weeks and months, we’ll dive deeper into the nitty-gritty of each of these solutions, and how Abyssal approaches them, in dedicated blog posts. You definitely don’t want to miss that.
Sea you next time,