
Posts

Microsoft Fabric

Complete Analytics Platform: With its new software-as-a-service offering, Microsoft has essentially bundled every tool in its analytics portfolio under a new name, Fabric :). The claim is that Fabric can serve every data stakeholder, from a developer working with a data lake to a sales associate working on a self-serve Power BI report. Microsoft has implemented a tenant-centric architecture in Fabric, like Office 365; in the optimal design an organization will have one Fabric tenant for the entire organization, just as it has one Office 365 tenant. Lake-centric and open: All the data and apps built on Fabric-provided solutions are stored in a single lake, and Fabric automatically calculates the lineage for objects stored in that lake. It uses the Delta format, with Parquet data storage, for all objects. Advantage: Table storage is shared across the Fabric workspace. Suppose you have a data issue in a Synapse data warehouse query: just run a fix on the data set using Synapse data engineering python not...
Recent posts

5 things you should never ask ChatGPT

1. Never ask for financial advice. Although ChatGPT's response recommends not taking its word for any financial decision, it will still give you some alluring details and artifacts for the question asked. For instance, try asking about the future of Dollarama stock. The information you see in the reply is not up to date, as ChatGPT only has access to data up to 2021. 2. Never ask about your own information. Asking questions about your personal identity might land you in trouble. This ML model will try to chain together whatever is available about you on the web. ChatGPT, being the smartest of them all, might also try to learn more and dig deeper. 3. Ask only ethical questions. To build a better future with AI, asking constructive and probing questions is critically important. ChatGPT has a massive user base that spans almost all age groups, and it is the moral duty of each one of us to groom and nourish ChatGPT with quality content. 4. Never ask for medication. ChatG...

import in Python

Python allows you to reuse code through imports. What you import can be constant values, general-purpose functions, formulas, and more. For instance, if you are writing code to implement the circle circumference formula, instead of hardcoding pi, import the math module from the Python Standard Library. This library has a massive list of facilities on offer. Learn about - ' from '
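As a minimal sketch of the circumference example described above (the function name `circumference` is made up for illustration and does not come from the original post), importing `math` gives access to `math.pi` instead of a hardcoded constant:

```python
import math


def circumference(radius: float) -> float:
    """Return the circumference of a circle, 2 * pi * r."""
    # math.pi comes from the standard library, so nothing is hardcoded here.
    return 2 * math.pi * radius


if __name__ == "__main__":
    print(circumference(1.0))
```

Note that everything imported this way is reached through the module name, as in `math.pi`.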

from in Python

When using import, you can also import just a part of a module; the keyword "from" is used for that purpose. The math module, for instance, has a lot of features, but suppose we only care about the floor function. In this case we imported floor directly (on line 17) from math using the keyword 'from'. The 'math.' prefix is not required when calling floor if you imported floor directly.
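The difference between the two import styles can be sketched as follows. This is an illustrative snippet rather than the code from the original post, and the variable names are assumptions:

```python
import math             # whole module: names are reached as math.<name>
from math import floor  # single name: floor can be called directly

value = 3.7

via_module = math.floor(value)  # the module prefix is required here
direct = floor(value)           # no prefix needed after "from math import floor"

print(via_module, direct)
```

Both calls refer to the same function; the `from` form only changes how the name is written at the call site.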

ACLs in Azure Data Lake

Access Control Lists (ACLs) in Azure are an extremely powerful toolset for provisioning granular levels of access in Azure Data Lake. Role-Based Access Control (RBAC) is the best option for setting up broader access levels; however, with ACLs you can reach the finest possible grain, down to a single file inside a blob container. Think of a scenario where you want to add more than one user to a folder inside a blob container and each of them sees only their own data: this is possible with ACLs. Prerequisites: an Azure subscription; a storage account with hierarchical namespace enabled; Reader access on the storage object via RBAC. How to set up ACLs in Azure Data Lake: Like any other offering, Microsoft has a broad spectrum of tools and ways to set up ACLs, ranging from the Azure Portal to writing Python code. All the steps involved are described in detail in the Microsoft documentation, so there is no need to rephrase them in this article. Instead, let's walk through some of the challenges one can c...

Hierarchies in Oracle.

This article explores the functionality and features offered by the CONNECT BY clause in Oracle through a hands-on exercise. Prerequisites: Oracle 9g or later installed, and any Oracle SQL client. We have used Oracle's sample schema for this article; you can download it from the link below. Execute this SQL in your Oracle client and you should be all set with data and schema. https://download.oracle.com/oll/tutorials/DBXETutorial/html/module2/les02_load_data_sql.htm Let's get started with the CONNECT BY clause in Oracle. This is an Oracle clause that arranges eligible rows in a hierarchical fashion; it is generally used to write a query whose result exposes the hierarchical relationships in a table. Here is the basic syntax: [ START WITH condition ] CONNECT BY [ NOCYCLE ] condition. START WITH is an optional clause that specifies the starting point of the hierarchy. CONNECT BY describes the relationship between a child and parent r...

How to work with XML files in Databricks using Python

This article walks you through the basic steps of accessing and reading XML files placed in the FileStore using Python code in a Databricks Community Edition notebook. We will also explore a few important functions available in the Spark-XML Maven library. Think of this article as a stepping stone in Databricks Community Edition. The features and functionality described here can be scaled to the enterprise level using the enterprise editions of Databricks to design reusable file processing frameworks. Requirements: We will be using the Spark-XML package from Maven. Spark 3.0 or above is required on your cluster for working with XML files. This article uses Databricks Community Edition; refer to this video tutorial from Pragmatic Works for getting started with Databricks Community Edition. Create your first cluster in seconds: The next step is to install the Spark-XML library on your cluster. The cluster needs to be in a running state to install this li...