
How To Run Complex Queries With SQL in Vector Databases


Vector search looks for similar vectors or data points in a data set based on their vector representations. Unlike purpose-built vector databases such as Pinecone, Milvus, Qdrant and Weaviate, MyScale is built on the open source, SQL-compatible ClickHouse database.

SQL is an effective tool for managing relational databases, and combining SQL with vectors provides a powerful approach to tackling complex AI-related questions. Users can execute traditional SQL and vector queries on structured data and vector embeddings to answer complex questions and analyze high-dimensional data in a unified, efficient manner.

Advanced SQL Techniques for Complex Queries

Simple SQL queries are commands that perform straightforward data retrieval, usually from only one table at a time. Complex SQL queries go beyond standard requests by retrieving data from several tables and limiting the result set with multiple conditions.

A complex query could include features such as:

  • Common table expressions
  • Subqueries
  • Joins across multiple tables using different join types

Common Table Expressions

A common table expression (CTE) is a name you give a subquery within your main query. The main reason to do this is to simplify your query, making it easier to read and debug. It can sometimes improve performance, which is another benefit, but it’s mostly about readability and simplification.

Consider a scenario in which you want to determine the average age of customers who bought a particular product. You have a table of customer data, including their name, age and the products they purchased.

Here’s an example query to perform this calculation using a CTE:
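
The sketch below assumes a customer_data table with name, age and product columns; the exact column names are assumptions.

    -- Average age of customers who bought the product 'widget'
    WITH product_customers AS (
        SELECT name, age
        FROM customer_data
        WHERE product = 'widget'
    )
    SELECT AVG(age) AS average_age
    FROM product_customers;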

This CTE — a temporary named result set (subquery) that can be referenced within a single query — is named product_customers. It’s created using a SELECT statement that retrieves the name and age columns from the customer_data table for customers who purchased the product 'widget'.

Moving the subquery to the top of the query and giving it a name makes it easier to understand what the query does. If your subquery selects a sample embedding vector, you could name your subquery something like target_vector_embed. When you refer to this in the main query, you’ll see this name and know what it refers to.

This is also helpful if you have a long query and need the same logic in several places. You can define it at the top of the query and refer to it multiple times throughout your main query.

So consider using a CTE whenever naming a subquery improves the readability of your query.

Subqueries

A subquery is a query embedded within another query. By nesting queries, you can place more precise restrictions on the data included in the result set.

Subqueries can be used in several places within a query, but it’s easiest to start with the FROM statement. Here’s an example of a basic subquery:
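
The sketch below reuses the customer_data table from earlier (column names are assumptions); the alias sub is referenced in the explanation that follows.

    SELECT sub.name, sub.age
    FROM (
        SELECT name, age
        FROM customer_data
        WHERE product = 'widget'
    ) sub;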

I’ll break down what happens when you run the above query:

First, the database runs the “inner query” — the part between the parentheses. If you run this independently, it produces a result set just like any other query. After the inner query runs, the outer query runs using the inner query’s results as its underlying table.

Subqueries must have names, which are added after the closing parenthesis (the same way you would add an alias to a regular table). This query uses the name sub.

Using Subqueries in Conditional Logic

You can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON or CASE). The following query returns all the entries from the same date as the specified entry in the data set:
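
The table and column names below are hypothetical; what matters is that the inner query returns a single value for the outer WHERE clause to compare against.

    SELECT *
    FROM orders
    WHERE order_date = (
        SELECT order_date
        FROM orders
        WHERE order_id = 1001  -- the specified entry
    );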

This query works because the result of the subquery is only one cell. Most conditional logic will work with subqueries containing one-cell results. However, IN is the only type of conditional logic that will work when the inner query contains multiple results:
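
A sketch using the same hypothetical orders table, where the inner query returns multiple values:

    SELECT *
    FROM orders
    WHERE customer_id IN (
        SELECT customer_id
        FROM orders
        WHERE product = 'widget'
    );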

Note that you should not include an alias when you write a subquery in a conditional statement. This is because the subquery is treated as an individual value (or set of values in the IN clause) rather than as a table.

Joining Tables

A join produces a new table by combining columns from one or more tables using values common to each. The different types of joins are:

  • INNER JOIN: Only matching rows are returned.
  • LEFT JOIN: Nonmatching rows from the left table and matching rows are returned.
  • RIGHT JOIN: Nonmatching rows from the right table and matching rows are returned.
  • FULL JOIN: Nonmatching rows from both tables and matching rows are returned.
  • CROSS JOIN: Produces the Cartesian product of whole tables, as “join keys” are not specified.
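
For illustration, here is a minimal INNER JOIN between two hypothetical tables, customers and orders, matched on a shared customer ID:

    SELECT c.name, o.product
    FROM customers c
    INNER JOIN orders o
        ON o.customer_id = c.id;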

Using Complex SQL and Vector Queries in MyScale

SQL vector database MyScale includes several features that help with complex SQL and vector queries.

Common Table Expressions

MyScale supports CTEs, substituting the result set defined in the WITH clause wherever it is referenced in the rest of the SELECT query. Named subqueries can be used in the current and child query contexts anywhere table objects are allowed.

Vector search is a search method that represents data as vectors. It is commonly used in applications such as image, video and text search. MyScale uses the distance() function to perform vector searches. It calculates the distance between a specified vector and all vector data in a specified column and returns the top candidates.

In some cases, if the specified vector is obtained from another table, or its dimension is large and inconvenient to write out, you can use a CTE or subquery.

Assume you have a vector table named photos that stores metadata information linked to your photo library’s images, with id, photo_id and photo_embed for the embedding vector.

The following example treats the result of a selection in CTE as a target vector to execute vector search:
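
The sketch below selects one photo’s embedding (id = 1 is an assumption) as the target vector and then searches the photos table for the closest matches:

    WITH (
        SELECT photo_embed FROM photos WHERE id = 1
    ) AS target_embed
    SELECT photo_id,
           distance(photo_embed, target_embed) AS dist
    FROM photos
    ORDER BY dist
    LIMIT 10;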

Joins and Subqueries

MyScale has limited support for joins, and using subqueries is the recommended workaround. In MyScale, vector search is based on the vector index of a table with a vector column. Although the distance() function appears in the SELECT clause, its value is calculated during the vector search on that table, not after the join, so a join may not return the expected results.

The following are possible workarounds:

  • You can use the distance()...WHERE...ORDER BY...LIMIT query pattern in subqueries that utilize vector indexes and get expected results on vector tables.
  • You can also use subqueries in the WHERE clause to rewrite the join.

Assume you have another table, photo_meta, that stores information about the photo library’s images with photo_id, photo_author, year and title. The following example retrieves relevant photos taken in 2023 from a collection of images:
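
A sketch of the join approach (the target embedding selection, id = 1, is an assumption, as above):

    WITH (
        SELECT photo_embed FROM photos WHERE id = 1
    ) AS target_embed
    SELECT p.photo_id,
           m.title,
           distance(p.photo_embed, target_embed) AS dist
    FROM photos p
    JOIN photo_meta m ON p.photo_id = m.photo_id
    WHERE m.year = 2023
    ORDER BY dist
    LIMIT 5;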

Here’s what happens when you run the above query:

First, MyScale executes a vector search on the photos table to get the required photo_id column and the distance() value for the top five relevant records.

Then, the join runs using the results from the vector table as its underlying table.

Because the vector search doesn’t consider the year photos were taken, the result may be incorrect. To get the correct result, rewrite the join query by using a subquery:
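
A sketch of the rewritten query, with the year filter pushed into a subquery in the WHERE clause so it is applied during the vector search:

    WITH (
        SELECT photo_embed FROM photos WHERE id = 1
    ) AS target_embed
    SELECT photo_id,
           distance(photo_embed, target_embed) AS dist
    FROM photos
    WHERE photo_id IN (
        SELECT photo_id FROM photo_meta WHERE year = 2023
    )
    ORDER BY dist
    LIMIT 5;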

Improve Data Analysis

Advanced SQL techniques like CTEs, subqueries and joins can help you perform complex data analyses and manipulations with greater precision and efficiency. MyScale combines SQL and vectors in a single platform, so you can execute traditional SQL and vector queries on structured and vector data to answer complex questions and analyze high-dimensional data in a unified, efficient manner.

If you are interested in learning more, please follow us on X (Twitter) or join our Discord community. Let’s build the future of data and AI together!

The post How To Run Complex Queries With SQL in Vector Databases appeared first on The New Stack.


Abstracts: May 6, 2024


Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Principal Researcher Michel Galley joins host Gretchen Huizinga to discuss “MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts,” which was accepted at the 2024 International Conference on Learning Representations (ICLR). MathVista, an open-source benchmark, combines new and existing data to measure how good models are at solving a variety of math problems that involve processing images as well as text, helping to gain insight into their reasoning capabilities.

Transcript

[MUSIC]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

My guest today is Dr. Michel Galley, a senior principal researcher at Microsoft Research. Dr. Galley is the coauthor of a paper called “MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts.” Michel, thanks for joining us on Abstracts today!

MICHEL GALLEY: Thank you for having me.

HUIZINGA: So I like to start with a distillation or sort of an elevator pitch of your research. Tell us in just a couple sentences what problem or issue your paper addresses and why we should care about it.

GALLEY: So this paper is about evaluating large foundation models. So it’s a very important part of researching large language models because it’s a good way to evaluate, kind of, the capabilities—what these models are good at and not good at. And a part of the focus of MathVista is to evaluate these large foundation models in a multimodal setup, so when the input to the model is actually not just text but also text and images. And then, an example of a task that such a model would perform is, like, the input is maybe a mathematical question, and then there’s some visual support to that question, let’s say, of an image of a graph, and then the model has to respond to something related to that. And why this is important … there has been a lot of work, of course, on large foundation model. Especially when it comes to reasoning tasks, like mathematical reasoning, a lot has focused more on written form.

HUIZINGA: Yeah …

GALLEY: So MathVista is one of the very first datasets that has input that is both images and text.

HUIZINGA: Yeah, yeah. Well, reading your paper, it seems like this is an area that hasn’t been studied systematically. In fact, you actually say that! And say that the field is largely unexplored. But quickly tell us what has been done in this field, and then tell us how your research addresses the proverbial gap in the literature.

GALLEY: Well, there has been a lot of work on vision and language in other problems, like not just about reasoning. Maybe let me just mention why reasoning is important. So one reason I think it’s very interesting to evaluate these large language models in terms of reasoning skill is that we evaluate their capabilities beyond just memorization. So as many of your listeners probably know, these large foundation models are trained on large amounts of text that is public data from various sources. So when you ask a question to a large foundation model, it could be the case, in many cases, that it just memorizes things it has seen in the data.

HUIZINGA: Sure.

GALLEY: So what makes it interesting in terms of reasoning, the answer oftentimes is not there in the data. So it needs to develop this ability to connect the dots between various pieces of information to come up with a new answer. So the focus of our paper is really on mathematical reasoning, but it goes also a bit beyond that because what is also represented in the data is also science question and so on.

HUIZINGA: Yeah …

GALLEY: So this reasoning part has largely focused, until MathVista, on text-only modalities.

HUIZINGA: Yeah …

GALLEY: So it’s one of our very first ones that combines text and images in terms of evaluating these large foundation models. So you ask about what was done before. So, yes, there has been a lot of work, text only, on reasoning, for example, the mathematical question that’s just based on text. And there has been a different stream of work that was much more focused on vision. A lot of work has been on tasks such as visual question answering …

HUIZINGA: Yeah …

GALLEY: … where basically, you have an image and the question is about answer a question about this image. So, yes, we’re trying to fuse the two lines of research here.

HUIZINGA: Right …

GALLEY: And that’s one of the first works that does that.

HUIZINGA: Yeah. Well, let’s talk about your methodology for a minute. Tell us how you went about conducting this research, and what methods did you use?

GALLEY: Yes, sure. So that’s a bit different from a typical, kind of, machine learning paper because the focus on this work is really on benchmarking on the dataset. So the methodology is more about how we collect the data, process it. So they have two components to doing that. One was to look at existing data that already combines vision and text. And there are existing datasets that are actually already fairly big but that were not focused on reasoning. So we use those existing datasets and look for instances in the data that actually include some mathematical or science reasoning. And so that part is leveraging existing datasets, but the important part is, like, we really want to carve out what was interesting piece in terms of reasoning. And we had different stages of processing the data to identify the subset that was reasoning-based. So one first step was basically to apply some automatic filter to determine whether or not a given example, let’s say something that is visual and text, is actually … involves some mathematical reasoning. So we have different strategy. For example, if the answer is numerical, it’s likely that it might be something mathematically related. But that’s just the first stage. And the second stage, we actually had humans, annotators, just certify that the selected data is actually of high quality. So we do have an example of, “Oh, this is mathematical, and that’s either mathematical or scientific,” and so on. And that’s one part of the effort. The other part is that we realized while we collected the data, there are certain types of mathematical reasoning or related to mathematical reasoning that were not represented in the data. So we created three new datasets as part of MathVista. So when I said dataset, it’s more like, think of MathVista as like an aggregate of different types of data, and we added three of them, three new types of data. One is what you call PaperQA, which is basically data that is collected from scientific papers on arXiv, and that had questions asking about that paper and that included some visual components from the paper, typically a plot or a figure.

HUIZINGA: Yeah …

GALLEY: And then we had IQTest, which is basically, I mean, it’s vaguely related mathematically, but basically it also, kind of, tried to see maybe more abstractive thinking about maybe some input that is both text and visual. And the final is about FunctionQA, that is basically algebraic reasoning and function plots and so on.

HUIZINGA: OK …

GALLEY: The important part was actually to identify among vast amounts of data what is actually very interesting in terms of mathematical reasoning.

HUIZINGA: Yeah …

GALLEY: So that part, I think, was quite a big part of doing that work—finding existing data but also creating new data.

HUIZINGA: Yeah, yeah. Well, my favorite part of a research paper is where it says, “and what we found was … ,” so talk a little bit about your results. What did you find?

GALLEY: So we evaluated a wide variety of models, including GPT-4, Claude 2, GPT-4V, multimodal Bard, and LLaVA, and we categorized them into three categories. So one is text only. So, basically, you take a model that is by default just text, and we give it the text part of the question and ask it to answer the question. Of course, that’s, kind of, a bit of a, it’s a difficult task because oftentimes [LAUGHTER] we crucially build these questions so that you have to rely on the vision part. But that’s for, you know, scientific investigation to know how well they can do, and so that’s one category of model. A different category is still text only but that is given the detection from the image. So on the image, we do OCR. So we convert those words from images to text. It’s kind of an extension of the text-based model, except that what was images is translated into text, and then the input to the model is word only, and that’s a different category of model. And the third one is basically truly multimodal model. And what we found, I mean, not surprisingly, it’s, kind of, the one that was doing most poorly is the one that is text only. The second is text plus OCR. And then finally, the one that does best is the multimodal like GPT-4V. But while the ordering between these three categories makes sense, it was a bit surprising that maybe the gap between multimodal and text plus OCR was not bigger. Well, it’s big, but maybe not as big as we were expecting. So, for example, the best detection from the images model achieved like 35 percent accuracy while GPT-4V was 50 percent. So it’s a substantial gap but not huge.

HUIZINGA: Right. Just to clarify, you’re saying OCR. What does that stand for?

GALLEY: [Optical] character recognition.

HUIZINGA: Gotcha.

GALLEY: So, basically, it’s the task of taking text, sometimes typed, but sometimes written, and convert this into the actual text like you would have in a text file.

HUIZINGA: Right. Michel, does any of this have to do with the difficulty of the math problems that you present these models with? I mean, it seems to me, similar to humans, that the easier the problem, the easier it would be for the machine. So at what level of math are we talking for these tests?

GALLEY: What’s nice about MathVista is there’s continuum [of] different difficulties. So the spectrum is quite broad, going from elementary school to more advanced concepts such as calculus. So it’s quite broad. So in the paper, we do have this, kind of, broken down by level. So the number I gave you, like 50 percent, is an aggregate over all the difficulties. But …

HUIZINGA: Gotcha.

GALLEY: But the goal there was really, kind of, to compare different models, but we do have a fair amount of analysis in the appendix. Actually, we have 100 pages of appendices of plenty of analysis and so on. So if people, I mean …

HUIZINGA: I saw that. I saw the length of the paper, and I’m going, what? [LAUGHS] That’s a LONG paper! Well, research in the lab is one thing, I always like to say, but understanding real-world impact is important, too. So where’s this work going to make the most difference, and who does it help most at this point?

GALLEY: Well, I think perhaps that’s the main point of this kind of line of work in terms of reasoning is that when looking at this difficult problem that are mathematical, actually it’s a way to, kind of, abstract away maybe more complex capabilities, and I think while thinking just about mathematics might seem a bit narrow, I don’t think that really is. It’s more about seeing whether this model has the ability to do, kind of, multistep kind of processing of your input and think maybe somewhat intelligently about a given problem. So we focus mostly on math. There is some science, but we would be very interested, especially in future work, to, kind of, go beyond that.

HUIZINGA: OK, well, let me press in a little bit there because … just say I’m a regular person using a GPT model. Is your work more addressed upstream from that to the research community to say, how do we get these models to be better so that downstream people like me can be more confident of the models?

GALLEY: Yes, I would say at the moment, I mean, this line of work is perhaps more geared towards somewhat more research community, but I think it could be some seed for researchers to think about some applications perhaps that also requires some kind of step-by-step reasoning but perhaps not going beyond math.

HUIZINGA: Yeah. Michel, if there was one thing you wanted our listeners to take away from this research, kind of golden nugget, what would it be?

GALLEY: Well, I would say it’s the challenging part of these datasets. I think that’s what makes MathVista stand out compared to other datasets. By now, there are a few other vision and language datasets, and of course, many that are more text-based. And we’ve seen, for example, some recent papers showing that actually MathVista remains one of the most challenging ones. So I think it’s probably going to stay around for a while because of the difficulty it represents. So it’s open source of available datasets that everybody can use, and I very much encourage people to use it.

HUIZINGA: Is it on GitHub?

GALLEY: Yes, it’s on GitHub.

HUIZINGA: So what’s next on the research agenda for helping LLMs get better at math, Michel? What are the big challenges in the field yet? I mean, you’ve alluded to many of them already, sort of, but what’s next on your research agenda?

GALLEY: Well, I would say what we found so far is these models are very good at processing the textual part of problems it’s given, to the model, but you have the equivalent in images actually harder somehow. So I think a lot more work needs to be done in terms of vision capabilities, in terms of reasoning over images, because the capabilities you will see in text are actually quite advanced, whereas the equivalent in images doesn’t seem that good. I mean, a fair disclaimer: my background is more on the text side, [LAUGHTER] so some of my colleagues on the paper are more on the vision side, so maybe if a listener maybe run into some of our coauthors at the conference, they might want to talk to these vision people because that’s less of my background. [LAUGHS]

HUIZINGA: Well, and if you think about Venn diagrams, you know, you’ve got people that are doing text, people that are doing vision, and then the people that are trying to do both to see how the worlds collide.

[MUSIC]

Well, Michel Galley, thanks for joining us today. And to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts (opens in new tab), or you can find it on arXiv. You can also read it on the website for the International Conference on Learning Representations, or ICLR. And if you happen to be at the ICLR conference this week, you can hear more about it there. See you next time on Abstracts!

[MUSIC FADES]


The post Abstracts: May 6, 2024 appeared first on Microsoft Research.


Tutorial: Install VS Code on a cloud provider VM and set up remote access


DevSecOps teams can sometimes find they need to run an instance of Visual Studio Code (VS Code) remotely for team members to share when they don't have enough local resources. However, installing, running, and using VS Code on a remote virtual machine (VM) via a cloud provider can be a complex process full of pitfalls and false starts. This tutorial covers how to automate the installation of VS Code on a VM running on a cloud provider.

This approach involves two separate GitLab projects, each with its own pipeline. The first uses Terraform to instantiate a virtual machine running Debian Linux in GCP. The second installs VS Code on the newly instantiated VM. Lastly, we provide a procedure for setting up your local Mac laptop to connect to and use the VS Code instance installed on the remote VM.

Create a Debian Linux distribution VM on GCP

Here are the steps to create a Debian Linux distribution VM on GCP.

Prerequisites

  1. A GCP account. If you don't have one, please create one.
  2. A GitLab account on gitlab.com

Note: This installation uses:

  • Debian 5.10.205-2 (2023-12-31) x86_64 GNU/Linux, a.k.a Debian 11

Create a service account and download its key

Before you create the first GitLab project, you need to create a service account in GCP and then generate and download a key. You will need this key so that your GitLab pipelines can communicate with GCP and the GitLab API.

  1. To authenticate GCP with GitLab, sign in to your GCP account and create a GCP service account with the following roles:
  • Compute Network Admin
  • Compute Admin
  • Service Account User
  • Service Account Admin
  • Security Admin
  2. Download the JSON file with the service account key you created in the previous step.

  3. On your computer, encode the JSON file to base64 (replace /path/to/sa-key.json with the path where your key is located):

    base64 -i /path/to/sa-key.json | tr -d \\n
    

NOTE: Save the output of this command. You will use it later as the value for the BASE64_GOOGLE_CREDENTIALS environment variable.

Configure your GitLab project

Next, you need to create and configure the first GitLab project.

  1. Create a group in your GitLab workspace and name it gcpvmlinuxvscode.

  2. Inside your newly created group, clone the following project:

    git@gitlab.com:tech-marketing/sandbox/gcpvmlinuxvscode/gcpvmlnxsetup.git
    
  3. Drill into your newly cloned project, gcpvmlnxsetup, and set up the following CI/CD variables to configure it:

    1. On the left sidebar, select Settings > CI/CD.
    2. Expand Variables.
    3. Set the variable BASE64_GOOGLE_CREDENTIALS to the base64 encoded JSON file you created in the previous section.
    4. Set the variable TF_VAR_gcp_project to your GCP project ID.
    5. Set the variable TF_VAR_gcp_region to your GCP region ID, e.g. us-east1, which is also its default value.
    6. Set the variable TF_VAR_gcp_zone to your GCP zone ID, e.g. us-east1-d, which is also its default value.
    7. Set the variable TF_VAR_machine_type to the GCP machine type ID, e.g. e2-standard-2, which is also its default value.
    8. Set the variable TF_VAR_gcp_vmname to the GCP vm name you want to give the VM, e.g. my-test-vm, which is also its default value.

Note: We have followed a minimalist approach to set up this VM. If you would like to customize the VM further, please refer to the Google Terraform provider and the Google Compute Instance Terraform provider documentation for additional resource options.

Provision your VM

After configuring your project, manually trigger the provisioning of your VM as follows:

  1. On the left sidebar, go to Build > Pipelines.
  2. Next to Play, select the dropdown list icon.
  3. Select Deploy to manually trigger the deployment job.

When the pipeline finishes successfully, you can see your new VM on GCP.

Remove the VM

Important note: Only run the cleanup job when you no longer need the GCP VM and/or the VS Code instance you installed on it.

A manual cleanup job is included in your pipeline by default. To remove all created resources:

  1. On the left sidebar, select Build > Pipelines and select the most recent pipeline.
  2. For the destroy job, select Play.

Install and set up VS Code on a GCP VM

Perform the steps in this section only after you have successfully finished the previous sections above. In this section, you will create the second GitLab project that will install VS Code and its dependencies on the running VM on GCP.

Prerequisites

  1. A provisioned GCP VM. We covered this in the previous sections.

Note: This installation uses:

  • VS Code Version 1.85.2

Configure your project

Note: Since you will be using the ssh command multiple times on your laptop, we strongly suggest that you make a backup copy of your laptop local directory $HOME/.ssh before continuing.

Next, you need to create and configure the second GitLab project.

  1. Head over to your GitLab group gcpvmlinuxvscode, which you created at the beginning of this post.

  2. Inside the group gcpvmlinuxvscode, clone the following project:

    git@gitlab.com:tech-marketing/sandbox/gcpvmlinuxvscode/vscvmsetup.git
    
  3. Drill into your newly cloned project, vscvmsetup, and set up the following CI/CD variables to configure it:

    1. On the left sidebar, select Settings > CI/CD.
    2. Expand Variables.
    3. Set the variable BASE64_GOOGLE_CREDENTIALS to the base64-encoded JSON key you used in the first project, gcpvmlnxsetup. You can copy this value from the variable with the same name in that project.
    4. Set the variable gcp_project to your GCP project ID.
    5. Set the variable gcp_vmname to the name of the VM you created in the first project, e.g. my-test-vm.
    6. Set the variable gcp_zone to your GCP zone ID, e.g. us-east1-d.
    7. Set the variable vm_pwd to the password that you will use to ssh to the VM.
    8. Set the variable gcp_vm_username to the first portion (before the "@" sign) of the email associated with your GCP account, which should be your GitLab email.

Run the project pipeline

After configuring the second GitLab project, manually trigger the provisioning of VS Code and its dependencies to the GCP VM as follows:

  1. On the left sidebar, select Build > Pipelines and click the Run pipeline button. On the next screen, click Run pipeline again.

    The pipeline will:

    • install xauth on the virtual machine. This is needed for effective X11 communication between your local desktop and the VM
    • install git on the VM
    • install Visual Studio Code on the VM.
  2. At this point, you can wait until the pipeline successfully completes. If you don't want to wait, you can continue to do the first step of the next section. However, you must ensure the pipeline has successfully completed before you can perform Step 2 of the next section.

Connect to your VM from your local Mac laptop

Now that you have an instance of VS Code running on a Linux VM on GCP, you need to configure your Mac laptop to be able to act as a client to the remote VM. Follow these steps:

  1. To connect to the remote VS Code from your Mac, you must first install XQuartz on your Mac. You can execute the following command on your Mac to install it:
brew install xquartz

Or, you can follow the instructions from the following tutorial from the University of North Dakota.

After the pipeline for project vscvmsetup successfully executes to completion (pipeline you manually executed in the previous section), you can connect to the remote VS Code as follows:

  1. Launch XQuartz on your Mac (it should be located in your Applications folder). Launching it should open an xterm on your Mac. If it does not, select Applications > Terminal from the XQuartz top menu.
  2. On the xterm, enter the following command:
gcloud compute ssh --zone "[GCP zone]" "[name of your VM]" --project "[GCP project]" --ssh-flag="-Y"

Where:

  • [name of your VM] is the name of the VM you created in the first project, gcpvmlnxsetup. Its value should be the same as the gcp_vmname variable.
  • [GCP zone] is the zone where the VM is running. Its value should be the same as the gcp_zone variable.
  • [GCP project] is the name of your GCP project. Its value should be the same as the gcp_project variable.

Note: If you have not installed the Google Cloud CLI, please do so by following the Google documentation.

  1. If you have not used SSH on your Mac before, you may not have a .ssh directory in your HOME directory. If this is the case, you will be asked whether you would like to continue with the creation of this directory. Answer Y.

  2. Next, you will be asked to enter the same password twice to generate a public/private key. Enter the same password you used when defining the variable vm_pwd in the required configuration above.

  3. Once the SSH key is done propagating, you will need to enter the password again two times to log in to the VM.

  4. You should now be logged in to the VM.

Create a personal access token

The assumption here is that you already have a GitLab project that you want to open and work on in the remote VS Code. To do this, you will need to clone your GitLab project onto the VM, using a personal access token (PAT).

  1. Head over to your GitLab project (the one that you'd like to open from the remote VS Code).
  2. From your GitLab project, create a PAT, name it pat-gcpvm and ensure that it has the following scopes: read_repository, write_repository, read_registry, write_registry, and ai_features
  3. Save the generated PAT somewhere safe; you will need it later.

Clone the repository

  1. On your local Mac, from the xterm where you are logged on to the remote VM, enter the following command:
git clone https://[your GitLab username]:[personal_access_token]@gitlab.com/[GitLab project name].git 

Where:

  • [your GitLab username] is your GitLab handle.
  • [personal_access_token] is the PAT you created in the previous section.
  • [GitLab project name] is the name of the project you want to open in the remote VS Code.

Launch Visual Studio Code

  1. From the xterm where you are logged in to the VM, enter the following command:
code

Wait for a few seconds and Visual Studio Code will appear on your Mac screen.

  1. From the VS Code menu, select File > Open Folder...
  2. In the file chooser, select the top-level directory of the GitLab project you cloned in the previous section.

That's it! You're ready to start working on your cloned GitLab project using the VS Code that you installed on a remote Linux-based VM.

Troubleshooting

While using the remotely installed VS Code from your local Mac, you may encounter a few issues. In this section, we provide guidance on how to mitigate them.

Keyboard keys not mapped correctly

If, while running VS Code, you are having issues with your keyboard keys not being mapped correctly, e.g. letter e is backspace, letter r is tab, letter s is clear line, etc., do the following:

  1. In VS Code, select File > Preferences > Settings.
  2. Search for "keyboard". If you are having issues with the letter e, search for "board" instead. Click on the "Keyboard" entry under "Application".
  3. Ensure that the Keyboard Dispatch is set to "keyCode."
  4. Restart VS Code.
  5. If you need further help, this is a good resource for keyboard problems.

Error loading webview: Error

If while running VS Code, you get a message saying:

"Error loading webview: Error: Could not register service worker: InvalidStateError: Failed to register a ServiceWorker: The document is in an invalid state."

  1. Exit VS Code and then enter this command from the xterm window:

killall code

You may need to execute this command two or three times in a row to kill all VS Code processes.

  2. Ensure that all VS Code-related processes are gone by entering the following command from the xterm window:

ps -ef | grep code

  3. Once all the VS Code-related processes are gone, restart VS Code by entering the following command from the xterm window:

code

Some useful commands to debug SSH

Here are some useful commands to run on the VM that can help you debug SSH issues:

  1. To get the status, location and latest event of sshd:

sudo systemctl status ssh

  2. To see the log of sshd:

journalctl -b -a -u ssh

  3. To restart the SSH daemon:

sudo systemctl restart ssh.service

Or

sudo systemctl restart ssh

  4. To start a root shell:

sudo -s

Get started

This article described how to:

  • instantiate a Linux-based VM on GCP
  • install VS Code and dependencies on the remote VM
  • clone an existing GitLab project of yours in the remote VM
  • open your remotely cloned project from the remotely installed VS Code

As a result, you can use your laptop as a thin client that accesses a remote server, where all the work takes place.

The automation to get all these parts in place was done by GitLab. Sign up for a free 30-day GitLab Ultimate trial to get started today!


Platform Engineering: What is it? And how it applies to DevOps Engineers and SREs


In the ever-evolving landscape of software development and operations, the need for streamlined processes, robust infrastructure, and efficient deployment practices has become paramount. DevOps Engineer and Site Reliability Engineer (SRE) roles have emerged as two prominent methodologies to address these challenges. However, bridging the gap between development and operations requires a specialized approach. The latest […]

The article Platform Engineering: What is it? And how it applies to DevOps Engineers and SREs appeared first on Build5Nines.


Ep. 4 - From Portal to Code


Episode 0004 - From Portal to Code

Following the conversation on Deployment Stamps, Carl and Brandon dive into the concepts of Infrastructure as Code (IaC) and its applications in cloud computing. The hosts discuss the benefits of using IaC, including version control and transparency in infrastructure deployment, making it easier to understand what has been built and why. They also highlight the importance of governance, such as naming conventions, in IaC, as this can make it easier to find resources later on. Carl and Brandon cover both first and third-party IaC frameworks, as well as pros and cons of each.

Show links:

Visit us at:





Download audio: https://traffic.libsyn.com/secure/dfd84fca-7055-4605-98ce-3b0f8cbd6e15/cc-0004.mp3?dest-id=4423448

The Humility to Listen, A Sprint Planning Turnaround Story | Mike Lyons


Mike Lyons: The Humility to Listen, A Sprint Planning Turnaround Story

Read the full Show Notes and search through the world’s largest audio library on Scrum directly on the Scrum Master Toolbox Podcast website: http://bit.ly/SMTP_ShowNotes.

When Mike took on one of his early projects, a seemingly small requirement during sprint planning sparked an epiphany about Agile workflows. Mike's story unfolds as he learns that dictating tasks isn't always the best approach and realizes the importance of listening to and empowering those who do the work. What crucial lessons did Mike learn about intellectual humility and creating space for his team to excel? Discover how these insights transformed his approach to sprint planning and why the Toyota Production System might hold secrets to doing your best work.

 

Recovering from failure, or difficult moments, is a critical skill for Scrum Masters. Not only for ourselves, but also because the teams and stakeholders we work with will face these moments! We need inspiring stories to help them, and ourselves! The Bungsu Story is an inspiring story by Marcus Hammarberg which shows how a coach can help organizations recover even from the most disastrous situations! Learn how Marcus helped The Bungsu, a hospital in Indonesia, recover from near-bankruptcy, twice! Using Lean and Agile methods to rebuild an organization and a team! An inspiring story you need to know about! Buy the book on Amazon: The Bungsu Story - How Lean and Kanban Saved a Small Hospital in Indonesia. Twice. and Can Help You Reshape Work in Your Company.

 

About Mike Lyons

After reading the Agile Manifesto in 2006, Mike focused on making teams and organizations more adaptive and efficient. Despite facing failures and mistakes, these experiences provided him with valuable lessons that enhanced his ability to achieve tangible results with Agile.

You can link with Mike Lyons on LinkedIn.





Download audio: https://traffic.libsyn.com/secure/scrummastertoolbox/20240506_Mike_Lyons_M.mp3?dest-id=246429