At the 2021 Workshop “The Pandemic, Technology and Language Revitalization,” linguist Claire Bowern (Yale University) gave a presentation, Technological Issues in Remote Revitalization and Documentation, that surveyed the relative merits of a range of hardware and software tools for language documentation. Below are the video and transcript of her presentation.
Claire Bowern: Hi, Nicholas. How are you?
Nicholas Welch: Hi, Claire. I’m well. How are you? It’s good to meet you.
Claire Bowern: And you, yeah. I guess we’ve corresponded over email, but yeah, we haven’t actually seen each other in person.
Nicholas Welch: That’s right, that’s right. Yeah. All right. So… Okay. I think people are back now. All right, welcome back, everybody, for our second talk of the day. We have Claire Bowern, who is a professor in the Department of Linguistics at Yale University. She’s worked for many years with communities and speakers of Indigenous Australian languages on documentation and revitalization projects, and she’s widely recognized for her research on community-centered linguistics and computational methods in documentation. She’s the author of the foundational textbook Linguistic Fieldwork: A Practical Guide, as well as dozens of papers on language change, on documentary fieldwork (especially the ethical and methodological issues involved in fieldwork), and on language revitalization. She’s one of the huge names in language documentation and revitalization, and I’m very glad that she can be with us today.
Claire Bowern: Well, thank you, Nicholas. I don’t know about that. That’s an introduction that I’m sure I won’t be able to live up to. I cannot share my screen. I have a horrible feeling I may need to log in and log out again, but while I do that… Because the share button is grayed out, while I do that, I’m just going to share in the chat the paper that this talk is based on, with a number of co-authors. And so yeah, so hopefully this will not take very long, but I got a message that I need to restart the connection in order to share.
Nicholas Welch: All right. Thank you very much for sharing the paper.
Claire Bowern: Okay, it looks like that worked, and right, and I have video back, and I can now share my screen. Excellent. Yeah, I apologize for that. Okay. Hopefully, everyone can now see… I cannot see you, but hopefully you can all… Oh, there we go. So can everyone hear me okay?
Nicholas Welch: Sound is good.
Claire Bowern: Okay, great, fantastic. Okay, in that case I will get started. So, I’m talking here from Quinnipiac land in New Haven, Connecticut, and thank you very much for inviting me to talk about technology and recording issues in remote revitalization and documentation. It’s very much an honour to be working with you and learning about what the other panelists are working on as well, and hopefully this talk will mesh in with that as well. So, I’m mostly an Australianist. I’m from Australia, originally. I work on Australian languages, and I do language documentation and work on language change and things like that. I work particularly with Bardi, the Bardi community in the far northwest of Australia, and also with Yan-nhaŋu and Yolŋu people in the northwest and also Kullilli people on language reclamation in western Queensland as well, and that work involves a number of different types of language documentation, language revitalization, language reclamation work, so for instance, Yolŋu is a language or a group of languages with a fairly large number of speakers, by Australian standards. Bardi is a language where we’re working with elders and honoring the knowledge of those elders, but also preparing for a time when most of the members of the Bardi community will not be fluent speakers of the language. And for Kullilli, the situation is quite different, again, where the last fluent speakers of the language worked with linguists in the 1940s and 1950s, and so we’re now working with those recordings and those, the written materials and so on, to produce materials for contemporary Kullilli people to bring the language back across a large area.
Today, however, what I wanted to do is talk about joint work with a number of my colleagues at Yale on differences in different types of recording technology, so different devices like cell phones, solid state recorders, things like that, and also different types of remote recording software like Zoom, and Skype, and Webex and so on. This is very much joint work with student and faculty colleagues here, and so I’m presenting on behalf of all of us. I put the link to the paper in the chat, and I’m very happy to take comments on that as well. The paper itself is currently under review. And so what we were thinking of was trying to figure out how much the recording technology matters for remote recording and doing fieldwork and language reclamation at a distance, so how does the remote recording alter what’s recorded? Now, we know, of course, that not all digital recording is identical. There’s… We think about the differences between digital recording and analog recording, but of course there are many different types of digital recorder and many different ways in which things happen to the digital signal once that signal is recorded. So, for instance, we have solid state recorders like this one, and like, you know, the Zooms and the Edirols and so on, we have phones, we have tablets, we have recordings on our computers, and we can use many different types of videoconferencing software. So, for instance, Zoom and Skype are the ones that I’m most familiar with, as you saw when I wasn’t quite sure how to do stuff with Webex and didn’t realize I needed to log in and out and give permissions and so on. But we also have things like WhatsApp for sharing sound and even TikTok and things like that for sharing videos as well. And while digital recording as a general thing is pretty familiar to people working on language revitalization, reclamation, and language documentation, videoconferencing recordings haven’t been as widespread until this year, although it was great to see the work that Marie-Odile has been doing long-term with these sorts of things. However, for the most part, I’d say we’ve been thinking about workshops and in-person recordings as being the main way we do recording, but with the pandemic and being unable to travel and not wanting to spread COVID further and so on, a lot of fieldworkers and a lot of documentation programs have suddenly moved online, and we’ve moved to working with recording software without necessarily testing how those recordings are different from the solid state recorders that we’re more familiar with. And so that was the aim of this paper.
Okay, so let me just briefly talk a little bit about how digital recording works at a very, like at a very non-technical level, because, you know, I’m not a sound engineer and I’m not a phonetician, and just to think about the different ways in which digital recordings might differ from each other. So when we make a digital recording, you know, as I’m talking now and I’m being digitally recorded, I’m making sound waves through the air. Those are going to a microphone, which then has technology which encodes the sound patterns and the pressure from the air in binary data, digital data, so ultimately, sets of ones and zeros. And that recording or that digital imprint allows us to read the recording with other software to recreate the sound, basically, so that, you know, that’s what we’re doing. We’re recording the sounds that we’re making and then we’re playing them back and, you know, cutting them up and doing things to files and so on. But to do that with high fidelity, to make a recording that matches the original [a lot 10:25], that takes a lot of storage on the computer. Right? There’s a lot of information in a sound that our brains are incredibly good at decoding, but to represent that digitally, we need a fair amount of space, and so making large numbers of recordings takes up a lot of space, and if we want to transmit those recordings over the internet, then we also need to think about bandwidth issues and things like that as well. So most digital recording, particularly that involving the internet, involves some sort of compression, so that’s taking predictable bits of the signal, or taking bits of the signal that we might not need to understand speech, and compressing that together, basically either deleting it recoverably or non-recoverably. So that’s one way in which digital sounds can differ quite a lot. Some types of recordings like our solid state recorders don’t use compression, or they use what’s called lossless compression, so compression where we can always recover what has been compressed, whereas Zoom and Skype and Facebook Messenger and so on mostly use lossy compression, and they use proprietary compression algorithms. That is, the companies develop them themselves, and we don’t have access to that, so we don’t know exactly what they’re doing to the signal. There are also things like filters that [much 11:50] software uses, and actually microphones themselves sometimes use, to emphasize the speech portions of the signal and de-emphasize things like background noise, car horns, parts of the speech signal that we don’t want to or don’t need to transmit over the internet. And so that saves on bandwidth and it makes it easier for humans to hear and decode the speech, but it also affects the sound signal in ways that might be relevant if we’re making recordings for long-term language documentation. And finally, we have things like sampling, which I think is probably better understood because it’s something we need to consider with things like solid state recorders as well. So sampling is basically how much information is recorded in the first place from the signal. And so we can record at very high sampling rates and get a very detailed, accurate picture at the expense of more storage space, and so on too.
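To make the storage arithmetic behind this concrete, here is a minimal Python sketch of the trade-off between sampling detail and file size for uncompressed audio. The specific figures (44.1 kHz, 16-bit, mono) are common defaults for field recordings, not numbers from the talk.

```python
def pcm_size_bytes(duration_s, sample_rate_hz=44_100, bit_depth=16, channels=1):
    """Storage needed for uncompressed PCM audio (e.g. a WAV file), ignoring headers."""
    return duration_s * sample_rate_hz * (bit_depth // 8) * channels

# One hour of mono speech at CD-quality settings is roughly 318 MB uncompressed:
one_hour_cd = pcm_size_bytes(60 * 60)
# Dropping to a 16 kHz sampling rate keeps most of the speech band but shrinks the file:
one_hour_16k = pcm_size_bytes(60 * 60, sample_rate_hz=16_000)

print(f"44.1 kHz / 16-bit mono, 1 hour: {one_hour_cd / 1e6:.0f} MB")
print(f"16 kHz / 16-bit mono, 1 hour: {one_hour_16k / 1e6:.0f} MB")
```

Lossy codecs of the kind used by videoconferencing tools typically go well below even the 16 kHz figure, which is why parts of the signal are discarded non-recoverably.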
Okay, so our aims in this project were to test what differences the recording technology introduces, both the devices, so differences between cell phones and solid state recorders, and computers and so on, and also between different types of software, Zoom and Skype and so on. We wanted to develop suggestions for remote recordings so that languages are recorded in the best way possible because these recordings, we’re going to be using them for a long time, they’re important to communities, and I think it’s important to do right by the elders who we’re working with to make sure that we do our part to make the best possible recordings too. So our methods were to look at, do these recordings in two phases. We wanted to make sure that what we were comparing with the devices and with the software was as close to identical as possible in all conditions, and so to make it as close to identical as possible, we tried to record at the same time and use the same audio files as much as possible. So we could have said the same words over and over again with different recording devices, but even that would introduce differences. So, for instance, in one of our recordings, a fire engine went past the street, so we have different amounts of background noise. In another case, the air conditioning system turned off in the middle of the recording. So even things like that are going to make things sound a little different, and we wanted to make sure that everything we recorded was as identical as possible. And so what we did was, in the first phase, we had a person, me and two of my colleagues, in front of a computer, actually in front of two computers, two phones, an iPad and a solid state recorder, and we recorded simultaneously to all those devices. So we got exactly the same speech event in the recording. And then in the second phase, what we did was, we took the signal from the solid state recorder like this one, and played it into the computer as though the solid state recorder were a microphone. And so that way, we could make sure that we were transmitting exactly the same digital recording through Skype, through Zoom, through Facebook Messenger, and so on, and so that allowed us to make as directly comparable as possible sets of recordings.
So here’s what it looked like. So we had someone sitting in the chair there. We had the solid state recorder right in front of them, and then we had a couple of computers with an internal and an external microphone, a couple of phones. We had an iPhone and an Android phone and an iPad, and so that was our device setup. So we didn’t test every potential device. There are many, many different types of phones and so on, but this gave us a range of items.
And then for the remote software, we decided to pick three commonly used types of recording software and then compare those with podcasting software and with Audacity, which records directly to the computer. So we compared different configurations of Zoom, Skype, and Facebook Messenger, and then Cleanfeed, which is the podcasting software.
In terms of methods, we recorded three speakers of English saying sentences with different words in them. So, you know, we say “bat” again, we say “cup” again, we say “microphone” again, etc. We had 40 — sorry, 94 — different words in the placeholder, testing common contrasts, vowel contrasts, stressed and unstressed vowels, different types of fricatives and things like that. We had to work on English… Well, we wanted to work on languages that we were native speakers of, and due to the university’s restrictions on guests, we weren’t allowed to invite visitors to campus. Many of our students are participating remotely in classes, and so the three people who we could easily get to campus at the same time under social distancing protocols and so on, I think at that point, it may have been that only faculty were able to come to campus, even. And so that is why we did these recordings. I should say that I would expect different… Well, I would expect special issues to arise with other types of consonants in particular, so in doing this test not in a pandemic, we would want to test, for instance, different types of glottal contrasts or different types of consonant clusters. We did some testing of nasality, retroflexion and so on.
So we made these recordings. Then we aligned them at the segment level so we could pick out the duration of individual vowels and consonants and make those measurements, and then we tested common sound characteristics using Praat, so we looked at how the vowel space measurements differed when we had different types of recording, and we measured the duration of individual vowels and consonants to see how they differed. And so these are all things that phoneticians care a lot about, but they’re also good measures for thinking about how the perception of the sound that’s recorded might change over different types of recordings, so how speakers might sound different under different recording conditions. We also looked at the harmonics-to-noise ratio and the signal-to-noise ratio, which are basically measures of background noise, and so those would give us a measure of how clear the recordings are. Okay, and these were all analyzed statistically, but I’m not going to talk about that today.
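As a rough illustration of the measurements described here, the sketch below uses the parselmouth Python interface to Praat to pull a vowel duration, midpoint formants, and a harmonics-to-noise value from one file. The talk only says the analysis was done in Praat; parselmouth, the file name, and the boundary times are assumptions for the example.

```python
import parselmouth
from parselmouth.praat import call

# Hypothetical recording from one device; the path is a placeholder.
snd = parselmouth.Sound("speaker1_h4n.wav")

# Vowel duration: given segment boundaries (e.g. from forced alignment),
# duration is just the difference between the end and start times.
vowel_start, vowel_end = 1.234, 1.362          # assumed boundaries, in seconds
duration_ms = (vowel_end - vowel_start) * 1000

# Vowel quality: first and second formants at the vowel midpoint.
formants = snd.to_formant_burg()
midpoint = (vowel_start + vowel_end) / 2
f1 = formants.get_value_at_time(1, midpoint)
f2 = formants.get_value_at_time(2, midpoint)

# Harmonics-to-noise ratio over the whole file, as a rough clarity measure.
harmonicity = snd.to_harmonicity_cc()
mean_hnr = call(harmonicity, "Get mean", 0, 0)

print(f"duration {duration_ms:.0f} ms, F1 {f1:.0f} Hz, F2 {f2:.0f} Hz, HNR {mean_hnr:.1f} dB")
```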
So for our results, I’m going to summarize the results and then talk in a little bit more detail about some specific things. So overall, we did find differences. We found differences both among the devices and among the software programs. The software programs differed more, and more importantly, more substantially, than the devices. So one of the take-homes from this is that the difference between an iPhone and a solid state recorder like a Zoom H4 Handy Recorder, for example, is less important than the difference between a solid state recorder and recording over Zoom or recording over Facebook Messenger. Okay? So the differences between devices when recording in person were smaller, and also less significant for the results, than the difference between in-person and remote recording.
We found substantial differences in levels of background noise and levels of filtering, even when the recording devices were otherwise pretty similarly set up. So you saw with our setup there, we had an internal microphone for the computer which was set up pretty much like I’m talking now, so it was, what, about a foot, a little bit less than a foot away. We had an external microphone, which is a headset microphone, and then we had other devices close to the speaker but not directly in front of them. It turned out that that made quite a difference to the signal-to-noise ratio, but not in a way that we might expect necessarily. So, for instance, if I remember right, the iPhone was pretty good at [shielded 20:28] with background noise, so it made pretty good recordings even though the iPhone was further away than the computer with the external microphone. The solid state recorder had a more sensitive microphone, and so it picked up more of the speech, but it also picked up more of the background noise. Okay, so these are the sorts of things that differ amongst recording devices, and I guess we’re already used to taking these things into consideration, but making a recording with all of these things simultaneously emphasized just how different these different devices can be. These sorts of background noise measurements affect the clarity of the recording, so they can make transcription harder if you’re working from the recordings later on; they also affect measurements if you’re making comparisons; and they affect things in more or less systematic ways, put it that way. So different recording devices can do different things to the signal, such that if you’re recording one person with one type of device and another person with another type of device, you may get things that end up being differences between the speakers which are not due to the speaker differences, they’re due to the device differences, and so sorry if that was kind of confusing. But I guess the overall take-home from that is, try to be as consistent as possible with devices across different speakers, and also with the different types of software as well. So having said all of that, most of the meaningful contrasts, most of the things that really matter for understanding speech, were recovered under all of our conditions. They were recovered in different ways, but they were recovered. And so to see this, I want to show you some examples with vowel spaces and some examples with stressed and unstressed vowels in English, so these are two sets of recordings. So on the left here, these are the vowel durations, so how long the vowels are in stressed and unstressed syllables. So this is like if we have a word like… What’s a good example? Like “polish,” so the “po” in “polish” is going to be stressed, the “ish” in “polish” is unstressed, and so we’re measuring how long the vowels are in each of those types of syllables, types of vowels. And so on the left we have the different types of recording devices, the Zoom H4n, the solid state recorder, a couple of cell phones, iPad, and so on, and then on the right, we have the different types of software as compared to the Zoom H4n recording, the solid state recorder.
And so you can see that the measurements are different in basically all cases, so the unstressed vowels here in the kind of blue-y color here have an average of 80 milliseconds, but they vary quite a bit, and the stressed vowels have an average of about 120 milliseconds, but in each case, we have a clear distribution of unstressed vowels and a clear distribution of stressed vowels, and those are clearly separated. Right? And they’re clearly separated for each of the devices and clearly separated for each of the software programs, even though the exact measurements differ by some margin in each case. Okay, so that translates in practice to, we would still be able to use any of these recordings to recover stressed versus unstressed vowels, but if we care about the measurements exactly, so say we’re doing phonetics or say we’re thinking about how different dialects might have different degrees of vowel length or different degrees of lengthening under stress or shortening under non-stressed vowels, then those sorts of measurements are going to be different. So for comparative projects, it matters what sort of recordings are made, but for simply thinking about what the sounds are, then we’re able to recover pretty much everything.
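For readers who want to reproduce this kind of comparison with their own data, here is a hedged sketch of tabulating vowel durations by device and stress and checking that the two distributions stay separated. The CSV file and column names are invented for the example; they are not from the paper.

```python
import pandas as pd

# Assumed layout: one row per measured vowel, with columns
# device, stress ("stressed" / "unstressed"), and duration_ms.
df = pd.read_csv("vowel_durations.csv")

# Mean, spread, and count per device and stress condition.
summary = (
    df.groupby(["device", "stress"])["duration_ms"]
      .agg(["mean", "std", "count"])
      .round(1)
)
print(summary)

# A quick check of separation per device: mean stressed minus mean unstressed duration.
pivot = df.pivot_table(index="device", columns="stress", values="duration_ms", aggfunc="mean")
pivot["stress_effect_ms"] = pivot["stressed"] - pivot["unstressed"]
print(pivot)
```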
Another example is in vowel spaces, so these are different English vowels for the three of us who made the recordings, and the different colors for each of the vowels are the different types of recorder, again. So “CB” is me here. You can see that the [i] vowels, or the, such as the [other 25:05] vowels, the [i] vowels in “bit” or in “pit” or something like that, they’re all pretty close to one another, so if we were comparing these vowels, they’re all in a pretty close cluster, whereas the [u] vowels here are much more spread out, so one of the… iPhone, for instance, thinks that my [u] vowel is much fronter, much more like an [i]. I guess this is probably not going to come over across Webex because we’re doing the remote digital stuff as well, but this is something like an [u] and this is more something like an [ʉ]. Okay? So these would likely have substantial perceptual effects on the vowel. You can also see that different vowels and different speakers end up with different measurements and different degrees of measurement in each of the cases, so for instance, with my recordings, the [u] vowels vary quite a lot between the recording, different recording devices, but for my colleague Natalie, Natalie’s vowels are pretty clumped, Natalie’s back vowels are pretty clumped. And so one of the things that worries us a bit about these recordings are the inconsistencies in results that we get for different speakers under the same sorts of recording conditions. Now, again, there will be circumstances where that doesn’t matter that much, but there are circumstances where that matters quite a lot. Let me briefly compare this with the different types of software program now, and you can see that the vowel spaces, again, we have differences in individual vowels and in individual vowel recordings. The vowel space is kind of elongated. That’s not just the way the pictures are put onto the slide. It’s actually that there’s a kind of attenuation in the first formant that raises some of the vowels and lowers some of them, which is also, I guess, interesting from an engineering point of view, but also, if we want to make vowel space recordings of different languages, then there’s something we need to know about and consider for the types of recordings we do.
And so with that, now you’ve seen the results and some of the specific differences between the recordings. Let me talk a bit more about the implications. Oh, I should also say that if you’re interested in the results, the paper that is up on lingbuzz at the moment and that I put the link to in the chat has a lot of detail about a lot of different measurements, and so the details there are available, and I’m very happy to talk about them either in the question period or you’re welcome to email me about them.
But let’s talk a little bit more about the implications. So, first of all, it might be difficult to directly compare recordings made in person with those that are made over Zoom or over some sort of recording software. That is, the, just like people sound a little different when they’re talking over the phone versus talking in person but our brains are able to compensate for that, it may well be… It is the case that talking over Zoom versus recording solid state, with a solid state recorder, does make a difference such that the measurements of their speech are somewhat different. So the recording program does matter. The device also matters to some extent, but less so. Most of the device differences were not significant. And so this has implications for research or language documentation that relies on detailed comparison of speech across lots of different people, either different dialects, different areas, or different individuals and like people who are recorded in different ways. Okay? We also found that the quality of the recording varies a lot, and some types of recordings, some devices, some software programs produce better quality recording than others. And, perhaps ironically, there’s a trade-off here with the higher amounts of filtering. So the videoconferencing that’s optimized for speech, like Zoom, in some ways produces better recordings in that it filters out more background noise, but in practice, that has effects on the phonetic recordings and the measurements that we can make, so this is something I think to consider for individual projects, whether it’s likely that you, if in the future you might want to do phonetic measurements, then having more filters is a problem. On the other hand, having less background noise is also good, and so that’s some… I think all of these things are things to think about and to discuss as part of the project. There’s not a single, “Yes, you must do this,” because, of course, all projects are different. Okay?
So given all of what I’ve said, the recording setup for what we tested did not have a huge impact on our ability to identify contrasts, so in terms of transcription, the recording setup matters, the programs, the software matters, but none of what we looked at really stops us from being able to recover the contrast that we were looking for. Okay? So that said, the general setup such as the placement of the recorder, having a quiet environment and so on, is still really important even with electronic filters and so on. So those are all going to make good recordings, help us make good recordings.
So in conclusion, let me just kind of sum up with some general recommendations, and then I’ll take some questions. So I have sort of general recommendations and then some specific things about devices. So, where possible, use the same setup when you’re making recordings, so try and do the same thing each time for recording speakers, for making comparisons of recordings and making long-distance recordings, so find a setup that works and use that all the time rather than, say, using Zoom one time, Skype another time, and so on. We should also all be aware that the pandemic recordings might have differences from the earlier in-person recordings, even when they were made with the same speakers, so in terms of measurements or in terms of just combining materials, then we should be aware that these things might differ quite a lot, and you should test this explicitly if it might be important for your project. So rather than either taking things on trust or just assuming that you’ll be able to combine recordings made through a solid state recorder versus recordings made over Zoom, actually test the setup and test the output to make sure it’s comparable enough for what you need.
In terms of devices, there are a couple of recommendations… Actually, these recommendations are pretty similar to what we’re already doing, I think. So recording in person with a device is better than recording over Zoom, so if you have the option to record in a quiet area like a sound booth or a homemade sound booth, which is an awesome setup, with a solid state recorder, that’s going to be better than recording over Zoom. If that’s not possible, recording with a cell phone or recording to a computer is also pretty similar to the solid state recorders, but it’s not identical. You should avoid compressed audio formats, if possible. So, for instance, the voice memo recorder on an iPad has two options. There’s a compressed option and an uncompressed option. Using the uncompressed option is preferable if possible. And I guess just overall, if the choice is between having someone record themselves on their phone and upload it versus recording to a computer over Zoom, from our tests, it’s better to record by phone, that is, have someone use a voice memo recorder or other recording software on their phone and then upload that rather than recording remotely. That seems to produce more reliable recordings.
For software recommendations, we… I didn’t talk very much about this, but Skype and Zoom were pretty similar to each other. They both had some noise handling and filtering issues. This was somewhat alarming to us for Zoom, because of course Zoom has original audio options, and it has options to turn these things off, but when we turned them off, they didn’t turn off. We still found artifacts in the recording, which I think is somewhat problematic. Facebook Messenger produced consistently pretty weird results, and so there are things happening to the audio as it’s being transmitted over Facebook that make it clearly not as good an option as Zoom or Skype or programs like that. The podcast software Cleanfeed produced the clearest recordings and was the software most similar to the solid state recorder. It’s audio-only, so that’s an option for interactive audio, but of course, there’s a big advantage to having video as well, and so that’s another trade-off to think about for your project.
We didn’t test internet connections. The testing we did was with reasonably reliable, reasonably high-speed connections, but I expect that the connection would play a very big role in the quality of recording as well, which is another argument for recording locally and uploading.
So I’m out of time, so let me just briefly conclude with a summary of what I’ve said so far: all devices are not the same, so thinking about devices is important, but the devices matter less than the software. If you have a choice between recording locally and uploading versus recording over the web, record locally and upload; that produces better recordings. If you can’t do that and you just need audio, then something like Cleanfeed is a good option. Skype and Zoom were pretty equivalent, and Facebook Messenger was not a good option for recording. So with that, thank you very much for coming to this talk, and I’ll be very happy to take any questions.
Nicholas Welch: Thank you very much, Claire, for that really excellent talk. It was fascinating to me to see not only the similarities, but the differences, and particularly the differences among speakers when using the recording methods. I would not have expected that at all. We do have one question that was posed during your talk by Marie-Odile, which was, did you notice electronic interference between devices being close to each other?
Claire Bowern: We didn’t. We expected to see that, and I remember the advice from early digital recordings and so on that you should always turn off your cell phone, because the pinging of the tower is going to turn up on the recording, or you should use batteries with your solid state recorder rather than plugging it into the mains electricity, because you can get humming from the mains. In fact… This is a total digression, but I saw there was a paper, an engineering paper, that was identifying the places where language recordings were made through the specific patterns of interference on the recordings. These were reel-to-reel tapes from the 1960s in Kansas, I think… I want to say it was an Osage project, but I’m not totally sure about that. And so in that case, that was a great example of being able to use the interference to locate recordings. But yeah, so we expected to see things like that, and we were a little worried about that, but in practice, it didn’t seem to be an issue. I guess the interference from these devices and the frequency with which they ping cell phone towers and so on is relatively less. We had many more issues with the sound of the air conditioning system in our building and the fire engines that went past and things like that.
Marie-Odile Junker: Thank you.
Claire Bowern: Thanks.
Nicholas Welch: We have about five minutes remaining for questions, if anyone wants to send me a question mark.
Claire Bowern: I see there’s a comment in the chat about the length of the paper. I agree about that. One of the reasons the paper is so long is that we have a lot of supplementary materials with all the different things we tested, but the key findings are in section 3 of the paper, if I remember right, so there’s an overall summary, somewhat like what I did with the slideshow here, and then there’s more supporting detail for the phoneticians, but the main part is in section 3. And I’m happy to upload these slides to my website. I’ll do that after the recording, if that’s helpful to share.
Nicholas Welch: I have a question too, if I may. So, as I said, I was really fascinated and kind of astonished by the differences from speaker to speaker in the results of vowel spaces for different devices, and I was wondering if you have any idea why that is. Do different speakers… Is it because different speakers have different pitch ranges and different devices have different abilities to pick up those ranges?
Claire Bowern: Yeah, I think there’s a couple of things going on here. So one is… Yeah, one is the relationship between vowel formants and pitch, probably, and so the recordings are maybe picking up slightly different things. We also found that there was quite a difference in the boundary identification for the forced alignment. So we used an automatic forced alignment program called P2FA, and what that does is it takes the transcripts that we have, so the written form, we say “mug” again, and it takes that bit of the audio and says, “Okay, so here’s the bit that says ‘we’; here’s the bit that says ‘say,’” etc. And so it finds the boundaries between the consonants and the vowels and lines up the segments. And sometimes the computer is pretty good at that, and sometimes it’s not very good at that. We did some minimal checking to make sure that it was okay, but there may be some differences between speakers in how easily those boundaries are identifiable. On the other hand, we would expect that to show differences between speakers, but maybe not between recordings. One thing that was really alarming to me was the difference in the boundaries between the consonants and vowels even when the recording was supposed to be identical. So we played the solid state recorder into my computer, so that’s exactly the same digital file in each case, and so the segments should have been exactly the same length for each recording because it is literally the same recording, but those boundaries weren’t the same. They differed in pretty unpredictable ways, and I presume that’s to do with the compression, either the compression at the level of the internet, like my service provider compressing signals to send to Juhyae’s computer, or something to do with the actual software program, like Zoom itself. And so I suspect that some of these differences are differences in the compression leading to different automatic measurements, and so on. But that’s just a guess at the moment.
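To see how one might quantify the boundary drift described here, the sketch below reads two force-aligned Praat TextGrids for the same utterance and reports how far corresponding segment boundaries have shifted. It assumes the third-party Python textgrid package and invents the file and tier names; P2FA’s actual output naming may differ.

```python
import textgrid

def phone_boundaries(path, tier_name="phone"):
    """Return (label, start, end) for each non-empty interval on the named tier."""
    tg = textgrid.TextGrid.fromFile(path)
    tier = tg.getFirst(tier_name)
    return [(iv.mark, iv.minTime, iv.maxTime) for iv in tier if iv.mark]

solid_state = phone_boundaries("h4n_aligned.TextGrid")    # recorded directly
over_zoom = phone_boundaries("zoom_aligned.TextGrid")     # same audio, sent through Zoom

# If the input really were identical, corresponding boundaries should match exactly;
# here we just report the start-time discrepancy for each segment pair.
for (label_a, start_a, _), (label_b, start_b, _) in zip(solid_state, over_zoom):
    shift_ms = abs(start_a - start_b) * 1000
    print(f"{label_a:>5} vs {label_b:>5}: boundary shift {shift_ms:.1f} ms")
```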
I see there is a question in the chat. I also have a question for participants. Could you let me know, maybe just type in the chat, what sort of remote software you use for recording? So are there software programs that we didn’t test here that would be really crucial to your work, that we should include in a future version of this? So just type it in the chat while I read [Lydia’s 42:55] question. So, have we tested ways to filter in editing software, and is the impact large? In a recording with a lot of noise, is it better to use the original audio or to try to filter it?
So, we didn’t test this. That’s a great question. We didn’t test it. In general, I think the device… The advice is to use the original recording, because when you filter a recording, say you filter the bandwidth, you’re still taking out… You’re taking out parts of the speech signal as well as the stuff that you don’t want, and so in general it’s better to have the noisier signal than to have the missing material. But yeah, I’d say it… There are circumstances where that would make more of a difference than others, I think, and also, if you’re using recordings for listening yourself without the phonetic measurements, it may be easier to clean them up. So I’m thinking for instance, we had a recording where there was a ceiling fan that was very predictable noise that was mostly above 5000Hz, so it was above the level of most of the speech recording, and so we removed that and that made it easier to hear. But that was for transcribing purposes, not for measurement purposes.
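As an illustration of that kind of listening-only cleanup, here is a small sketch that low-passes a recording just above the speech band to remove high-frequency fan noise. The 5 kHz cutoff, filter order, and file names are illustrative assumptions; as noted above, this is for transcription convenience, not for recordings you plan to measure.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

rate, audio = wavfile.read("noisy_recording.wav")   # hypothetical input file
audio = audio.astype(np.float64)

# 4th-order Butterworth low-pass at 5 kHz, applied forward and backward (zero phase),
# so steady noise above the cutoff (like a ceiling fan) is attenuated.
nyquist = rate / 2
b, a = butter(4, 5000 / nyquist, btype="low")
cleaned = filtfilt(b, a, audio, axis=0)

wavfile.write("cleaned_for_transcription.wav", rate, cleaned.astype(np.int16))
```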
Nicholas Welch: Any further questions? Okay, we have a question there from [Muhammad Zakaria 44:40].
Claire Bowern: Yeah. Oh, right, so Android versus iPhone. I don’t think we had any difference between Android versus iPhone. I also have an Android phone, and so the, yeah, the recordings, I forget which… I’m sorry, I’m just trying to see right now… Oh, I don’t have the app that we recorded, but the app is listed in the file, and that seemed to work. Yeah. Also, I’m very sorry to hear about what’s going on in Myanmar at the moment, and I hope everyone you’re working with is okay. All right. Yeah, WhatsApp is… Yeah, we did not test WhatsApp. [Hi, Jorge 45:28]. Yeah, we did not test WhatsApp because the… My understanding of how WhatsApp works is that it records locally and then uploads the file, and so we figured that if we were testing WhatsApp, we would probably be testing more about the device’s recording, like the device’s microphone and so on, rather than the… And so that would be similar enough to our device condition that we didn’t test that separately, but we could certainly do that.
Nicholas Welch: Thank you very much, Claire, and we are now out of time for questions. We have a five-minute break before our next speaker, Chris Harvey, so once again, see everyone in a few minutes.