We’re (data) hungry: the importance of Big Data for the subsea industry

Subsea industries, such as Marine Renewables or Oil & Gas, have been paving the way for safer and smarter underwater inspection operations, improving efficiency while reducing their carbon footprint. If you’re part of this journey, you may already be aware that it will require planned subsea operations to be performed by unmanned systems. You’ve probably heard about simulation, remote control, and intelligent vehicles. You’ve certainly been told you need data. Indeed, if you are taking part in this journey, I’m sure you feel data hungry today.

Well, this news isn’t exactly groundbreaking. Big players, such as Google and Facebook, provide us with their own knowledge, Artificial Intelligence (AI) services, and tools, but not their data. And here’s the reason why: in contrast to older machine learning algorithms, the performance of new AI approaches, such as deep learning, is strongly correlated with the amount of training data available. So needless to say, the holy grail for a highly-performant AI is labeled data.

Deep learning vs. traditional machine learning: performance as a function of the amount of training data

Besides being hungry, I’m also feeling generous today. By that I mean I’d like to share some of the key ideas we’ve been working on at Abyssal to acquire labeled data and make the most of it for autonomous operations.




Artificial data

In an ideal scenario, we would get labeled data with no effort at all. But in reality, well, things get a bit more complicated. Crowdsourcing can lighten the burden of annotating hours of video, but it’s not free. Actually, in industries such as ours, it can become prohibitively expensive given the required expertise or data privacy issues. Thankfully, we have an easier (and free!) alternative: artificial data. Abyssal Simulator allows us to create virtual scenarios driven by real-world information. As we can simulate endless and diverse scenarios, we’re able to artificially generate unlimited training data. Among others, we can obtain video, RGB images, depth maps, and segmentation masks. Though a simulator is usually designed to train ROV pilots, ours can actually train deep models too.
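To make the idea concrete, here’s a toy sketch of why simulated data is “free”: the renderer that draws the scene already knows, pixel by pixel, where every object is, so the segmentation mask comes along at no extra annotation cost. The numpy “renderer” below is purely illustrative, a stand-in for a real engine such as the Abyssal Simulator.

```python
import numpy as np

def render_scene(size=64, seed=0):
    """Toy stand-in for a simulator render pass: draws a bright 'pipeline'
    object on a murky seabed and returns the RGB image together with the
    ground-truth segmentation mask the renderer knows for free."""
    rng = np.random.default_rng(seed)
    image = rng.normal(0.2, 0.05, (size, size, 3))   # murky background
    mask = np.zeros((size, size), dtype=np.uint8)    # 0 = background
    row = int(rng.integers(10, size - 10))           # random pipeline position
    image[row - 2:row + 2, :, :] = 0.8               # bright horizontal pipe
    mask[row - 2:row + 2, :] = 1                     # 1 = pipeline pixels
    return image.clip(0, 1), mask

image, mask = render_scene()
print(image.shape, mask.shape)  # a labeled pair, at zero annotation cost
```

Every call with a new seed yields a new labeled pair, which is exactly how a simulator lets us generate unlimited training data.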



Artificial training data obtained from a virtual underwater scenario generated by the Abyssal Simulator



Active learning

Sometimes we are willing to annotate some data, as long as we do it in a smart way. One such method is active learning: smartly selecting the most relevant samples to annotate whenever new data becomes available. In practice, we reduce effort and save time in the annotation process by using a learnt model, that is, a model previously trained on the available labeled data, to pick samples instead of relying on random sampling. Since this model has already learnt from previously labeled samples, we can use it to predict the labels of the remaining unlabeled ones. When the model is confident about which label to predict, we assume the sample is an easy one, and thus not relevant for an expert to annotate. Whenever the model finds a given sample ambiguous, we assume it is a difficult one or, at least, significantly different from previous ones. Therefore, we pick it as a relevant sample to annotate next.
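As a rough illustration of the loop (not our production pipeline), here is a minimal pool-based active learning sketch using least-confidence sampling with scikit-learn; the dataset, model, and batch sizes are all placeholder assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Pool-based active learning: start with a few labels, then repeatedly
# ask an expert to annotate the samples the current model is least sure about.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=20, replace=False))  # initial annotations

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    proba = model.predict_proba(X[unlabeled])
    # Least-confidence criterion: lowest top-class probability = most ambiguous
    uncertainty = 1.0 - proba.max(axis=1)
    query = [unlabeled[i] for i in np.argsort(uncertainty)[-10:]]
    labeled.extend(query)  # in practice, an expert labels these next

print(len(labeled))  # 20 initial + 5 rounds x 10 queries = 70 labels
```

The only change versus random sampling is that one `argsort` over the model’s uncertainty, yet it directs the expert’s time toward the samples the model actually needs.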


While random sampling (top) gives experts random samples to annotate next, active learning uses a learnt model to smartly choose them




Transfer learning and domain adaptation

As soon as we get some labeled data, we aim to make the most of it so as to avoid the need to label more. The first obvious choice is transfer learning: repurposing a model that was previously trained on one task for a second, related task. Say we need to detect fish in ROV videos; we can probably benefit from a model that detects corals. In fact, most models used for subsea-related tasks could help. What about a model that detects fish in tanks? Handy as well. Transfer learning, or more specifically a subtype called domain adaptation, can also aid us here, as the method attempts to take a model which works in a given source domain (the tank) and make it also work on a new target domain (the subsea environment). And guess what? These techniques are amazingly convenient for turning models trained on artificial data into models that work on real data.
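A minimal sketch of the warm-start flavour of transfer learning, in plain numpy: a logistic regression is trained on a plentiful (hypothetical) source domain, and its weights are then fine-tuned on just a handful of target-domain samples. The domains, features, and sizes are toy assumptions, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w=None, epochs=200, lr=0.1):
    """Plain logistic regression by gradient descent; `w` lets us warm-start
    from weights learned on another (source) domain."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Source domain: plenty of labeled "fish in tanks"-style data (toy stand-in)
w_true = np.array([1.5, -2.0, 1.0])
Xs = rng.normal(0, 1, (1000, 3))
ys = (Xs @ w_true > 0).astype(float)
w_source = train_logreg(Xs, ys)

# Target domain: same underlying task, but only a handful of subsea labels
Xt = rng.normal(0.3, 1.2, (8, 3))                     # slightly shifted inputs
yt = (Xt @ w_true > 0).astype(float)
w_transfer = train_logreg(Xt, yt, w=w_source.copy())  # warm start = transfer
w_scratch = train_logreg(Xt, yt)                      # no-transfer baseline

# Evaluate both on held-out target-domain data
Xe = rng.normal(0.3, 1.2, (500, 3))
ye = (Xe @ w_true > 0).astype(float)
acc = lambda w: ((Xe @ w > 0) == ye).mean()
print(f"transfer: {acc(w_transfer):.2f}  scratch: {acc(w_scratch):.2f}")
```

The warm-started model arrives at the target task already knowing most of what it needs, which is precisely why eight labeled samples can be enough to adapt it.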


Transfer learning and domain adaptation: from a simulated source domain to a real target domain



Weakly- and semi-supervised models

Still aiming to make the most of our labeled data, we can seek to manipulate our deep learning model so that it uses the same labeled data as before, but learns more than would be expected. Confused? Well, I’m talking about weak supervision. Going back to our fish detection example, we can use weakly-supervised learning to teach our model to locate a fish within an image while only training it with binary labels, that is, whether there is a fish or not in the image.



Weak supervision: at training time the model receives only image-level labels; at testing time it outputs a localization


Along these lines, we can also explore semi-supervised methods and use the tons of unlabeled data we have to improve the performance of our models. For instance, unlabeled data can serve the purpose of training an autoencoder that may capture the most relevant properties of the data and help later when training with labeled samples. Generative Adversarial Networks are also promising for this purpose, as they can internally encode useful representations during their unsupervised training.
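As a small illustration of the autoencoder idea (with toy data, and scikit-learn standing in for a deep model), we can pretrain on the unlabeled pool and then train a classifier on the learned codes of the few labeled samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

# Lots of unlabeled data, only a few labeled samples
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_unlab, X_lab, y_lab = X[:1950], X[1950:], y[1950:]   # 50 labeled samples

# Step 1: unsupervised pretraining. An autoencoder (input -> 8 -> input)
# learns a compact code from the unlabeled pool by reconstructing its input.
ae = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                  max_iter=500, random_state=0).fit(X_unlab, X_unlab)

def encode(X):
    """Hidden-layer activations (the learned code) of the trained autoencoder."""
    return np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

# Step 2: supervised training on the codes of the small labeled set
clf = LogisticRegression(max_iter=1000).fit(encode(X_lab), y_lab)
print(clf.score(encode(X_lab), y_lab))
```

The 1,950 unlabeled samples never receive a label, yet they shape the representation that the 50 labeled samples are classified in.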




Wrapping up

Wrapping up, there’s no doubt that labeled data is crucial for AI technology to succeed in the subsea domain. Only then can the subsea industry move one step forward, towards remote and autonomous operations. In this post I’ve covered some of the key solutions adopted by Abyssal to address this (data) hunger:

    • artificial data
    • active learning
    • transfer learning
    • domain adaptation
    • weakly-supervised models
    • semi-supervised models



Hopefully I’ve managed to pique your interest to the point that you’re no longer data hungry, but rather data starving. But don’t worry – over the coming weeks and months we’ll dive deeper into the nitty-gritty of each of these solutions, and how Abyssal approaches them, in dedicated blog posts. You definitely don’t want to miss that.


Sea you next time,

Filipa Castro
AI Researcher at Abyssal

Filipa Castro holds an MSc in Biomedical Engineering from the Faculty of Engineering of the University of Porto. During her studies, she worked as a research intern in Sheffield (UK) at the Center for Computational Imaging & Simulation Technologies in Biomedicine, working on computational fluid dynamics for predicting the rupture of aneurysms. As her interest in computer vision and artificial intelligence grew, she decided to do her master’s thesis at the Delft University of Technology, where she built a dataset and an action recognition system for sports. She is currently a computer vision and artificial intelligence researcher at Abyssal, working on image and video recognition with the ambitious goal of automating subsea operations.
