| SESSION | FEB-MAR 2025 |
| PROGRAM | MASTER OF BUSINESS ADMINISTRATION (MBA) |
| SEMESTER | III |
| COURSE CODE & NAME | DADS301 PROGRAMMING IN DATA SCIENCE |
Assignment Set – 1
Q1. (a) Describe a list. How are lists created in R? Explain with an example. Also, explain how the lists are accessed and modified.
(b) Discuss the function used to create new columns in R. Explain with an example for creation of a new column based on the values of existing columns in R. 5+5
Ans 1.
(a) Lists in R
In R, a list is a versatile data structure that can hold multiple elements of different data types. Unlike atomic vectors, which contain elements of a single type, lists allow a combination of numbers, strings, logical values, vectors, and even other lists. Lists are used extensively in data analysis and statistical modeling in R because of their flexibility in storing and handling heterogeneous data.
Creation of Lists in R
Lists are created in R using the list() function. This function can accept different types of objects as arguments and bundle them together in a single list object. Each item in the list can
Its Half solved only
Buy Complete assignment from us
Price – 190/ assignment
MUJ Manipal University Complete SolvedAssignments MARCH 2025
buy cheap assignment help online from us easily
we are here to help you with the best and cheap help
Contact No – 8791514139 (WhatsApp)
OR
Mail us- [email protected]
Our website – www.assignmentsupport.in
Q2. (a) Illustrate the following functions with example in R.
- spread()
- gather()
- filter()
(b) Articulate the summary() method with an example in R. 5+5
Ans 2.
Demonstration of the spread() Function in R
The spread() function transforms data from long format to wide format in R. It is part of the tidyr package and takes a key column and a value column: each distinct value in the key column becomes a new column, filled with the matching entries from the value column. For instance, consider a dataset where each row shows a student’s score in a particular subject. If we want one column per subject, with the scores as its values, we use spread(). In current versions of tidyr, spread() is superseded by pivot_wider(), although it still works.
Q3. (a) Criticize with an example the syntax of various looping constructs in R – For, While and Repeat statements.
(b) Examine with an example of random variable that follows Poisson distribution. How can this be simulated using R? 5+5
Ans 3.
(a) Criticize with an Example the Syntax of Various Looping Constructs in R – For, While and Repeat Statements
Syntax and Example of for Loop in R
The for loop is commonly used in R to iterate over a sequence of elements. Its syntax is straightforward but can become inefficient with large datasets compared to vectorized operations.
Example:
for (i in 1:5) {
  print(paste("Value is", i))
}
While for loops are easy to understand, they can be slow for computations over large vectors. R is optimized for vectorized operations, so using loops excessively may affect performance.
Assignment Set – 2
Q4. (a) Discuss dictionaries in Python. Implement where a dictionary can be used.
(b) Illustrate how strings are converted into iterables in Python? Give a suitable example. 5+5
Ans 4.
(a) Understanding Dictionaries in Python
Dictionaries in Python are data structures that store values as key-value pairs. Unlike lists or tuples, where values are accessed by position, dictionaries are accessed through unique keys. Keys must be hashable, which in practice means immutable: strings, numbers, and tuples of hashable items can be keys, but lists cannot. A dictionary literal is written in curly braces {} with each entry in the form key: value.
Up to Python 3.6, dictionaries were not guaranteed to preserve order. From Python 3.7 onwards, they
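As a minimal sketch of the points above, the following snippet creates a dictionary, reads and updates it by key, and shows that a list is rejected as a key. The inventory data and the variable names (stock, prices) are hypothetical and chosen only for illustration.

```python
# Hypothetical inventory data, used only for illustration.
stock = {"pen": 120, "notebook": 45}

# Access a value through its key rather than through a position.
pens = stock["pen"]

# Add a new key and update an existing one.
stock["eraser"] = 30
stock["notebook"] += 5

# get() returns a default instead of raising KeyError for a missing key.
missing = stock.get("stapler", 0)

# A tuple of hashable items is a valid key; a list is not.
prices = {("pen", "blue"): 10.0}
try:
    prices[["pen", "red"]] = 12.0
except TypeError as exc:
    list_key_error = type(exc).__name__

print(pens, stock["notebook"], missing, list_key_error)
```

A dictionary fits cases like this one, where each item is looked up by a meaningful name rather than by its position in a sequence.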
Q5. (a) Summarize “waffle charts”. When are they used? Explain with an example in Python.
(b) Demonstrate how the summation of elements can be performed row-wise, column-wise and as a whole on a 2D array in Python. Explain with example. 5+5
Ans 5.
(a) Summarize Waffle Charts and Their Usage with Python Example
Concept of Waffle Charts
Waffle charts are a specialized form of data visualization used to depict part-to-whole relationships. Unlike pie charts or stacked bars, waffle charts use a grid layout filled with colored squares or “tiles,” where each tile represents a fixed unit of the total. Typically, 100 tiles are used to reflect percentages. Each category is assigned a color, and the number of colored tiles reflects its proportion. Waffle charts are especially effective in dashboards or
Q6. (a) Contrast the difference between loc and iloc attributes with example.
(b) Operate the map() method. Explain with example. 5+5
Ans 6.
(a) Contrast the Difference Between loc and iloc Attributes with Example
Understanding loc and iloc in Pandas
In pandas, loc and iloc are the two main indexers for selecting data from a DataFrame. They look similar but differ in how they address rows and columns. loc is label-based: it selects rows and columns by their explicit labels, and a label slice includes its end point. iloc is integer-position-based: it selects by zero-based position, and a slice excludes its end point, as in ordinary Python slicing. This difference matters most when working with datasets that have a non-numeric or custom index.
Examples Demonstrating the Difference
Consider the following code:
| SESSION | FEB-MARCH 2025 |
| PROGRAM | MASTER OF BUSINESS ADMINISTRATION (MBA) |
| SEMESTER | III |
| COURSE CODE & NAME | DADS302 EXPLORATORY DATA ANALYSIS |
Assignment Set – 1
Q1. Explain various measures of dispersion in detail using specific examples. 10
Ans 1.
Measures of Dispersion
Measures of dispersion refer to statistical techniques used to describe the spread or variability within a data set. While measures of central tendency like mean and median give a single value summary, dispersion measures how much data points differ from this central value. Understanding dispersion is essential in statistical analysis, particularly in evaluating risk, consistency, and variability of datasets.
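To make the idea concrete, here is a small sketch that computes a few common dispersion measures with Python's standard statistics module. It uses the five test scores quoted in the range example of this answer; the choice of variance and standard deviation as the additional measures is an assumption, since the full answer is not shown here.

```python
import statistics

# The five test scores quoted in the range example of this answer.
scores = [45, 50, 60, 70, 80]

# Range: largest value minus smallest value.
value_range = max(scores) - min(scores)

# Population variance divides by n; sample variance divides by n - 1.
pop_variance = statistics.pvariance(scores)
sample_variance = statistics.variance(scores)

# Standard deviation is the square root of the variance,
# so it is expressed in the same units as the scores.
sample_std = statistics.stdev(scores)

print(value_range, pop_variance, sample_variance, round(sample_std, 2))
```

The range reacts only to the two extreme values, whereas variance and standard deviation use every observation's distance from the mean.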
Range as a Basic Measure
The range is the simplest measure of dispersion, calculated by subtracting the smallest value from the largest. For example, in a dataset of test scores: 45, 50, 60, 70, 80, the range is 80 –
Q2. What is Data Science? Discuss the role of Data Science in various Domains. 2+8
Ans 2.
Data Science is an interdisciplinary field that combines scientific methods, statistical analysis, algorithms, and machine learning techniques to extract meaningful insights from structured and unstructured data. It is often referred to as the future of decision-making, as it empowers organizations to make data-driven decisions rather than relying on intuition. At its core, Data Science integrates knowledge from mathematics, statistics, computer science, and domain-specific expertise to uncover patterns and solve complex problems.
The lifecycle of a Data Science project typically includes data collection, data cleaning,
Q3. Discuss various techniques used for Data Visualization. 10
Ans 3.
Importance of Data Visualization
Data visualization is the graphical representation of information and data using visual elements like charts, graphs, and maps. It is a core component of exploratory data analysis, as it allows users to identify trends, patterns, and outliers quickly and intuitively. Visualization helps simplify complex datasets by transforming them into an easily understandable visual format. It is widely used in business intelligence, reporting dashboards, academic research,
Assignment Set – 2
Q4. What is feature selection? Discuss any two feature selection techniques used to get optimal feature combinations. 2+4+4
Ans 4.
Feature Selection
Feature selection is a critical step in the data preprocessing phase of machine learning and statistical modeling. It involves selecting a subset of the most relevant features (variables or predictors) from the original dataset that contribute the most to the prediction or classification outcome. The main goal is to improve model performance by eliminating irrelevant or redundant features, thereby reducing complexity, overfitting, and training time.
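One simple way to picture "keeping the most relevant features" is a correlation filter, sketched below. This is an illustrative assumption rather than one of the two techniques the answer goes on to discuss: the feature names, the data, and the 0.5 cut-off are all hypothetical, and the helper assumes no feature is constant.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical dataset: three candidate features and one target.
features = {
    "ad_spend": [1, 2, 3, 4, 5],
    "store_id": [7, 3, 5, 3, 7],
    "discount": [5, 3, 4, 1, 2],
}
target = [2, 4, 6, 8, 10]

# Keep features whose absolute correlation with the target
# reaches the (assumed) threshold; drop the rest.
threshold = 0.5
scores = {name: pearson(values, target) for name, values in features.items()}
selected = [name for name, r in scores.items() if abs(r) >= threshold]

print(selected)
```

Here store_id is uncorrelated with the target and is dropped, while the other two columns are kept unchanged, which is what distinguishes selection from transforming the features.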
Feature selection differs from dimensionality reduction, as it selects existing features without transforming them. It enhances model interpretability by retaining the most significant
Q5. Discuss in detail the concept of Factor Analysis
Ans 5.
Factor Analysis
Factor analysis is a statistical technique used to identify underlying relationships among a large set of observed variables. The main goal is to reduce data complexity by grouping related variables into latent factors that represent common themes or constructs. These hidden factors help explain the patterns of correlations within the dataset. Factor analysis is often used in social sciences, psychology, market research, and behavioral studies where abstract
Q6. Differentiate between Principal Component Analysis and Linear Discriminant Analysis. 10
Ans 6.
Dimensionality Reduction Techniques
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two widely used dimensionality reduction techniques in machine learning and statistics. Although they serve the common purpose of reducing the number of features in a dataset, they differ significantly in their objectives, methodology, and applications. Understanding these differences is crucial for selecting the appropriate technique for a given problem.
Dimensionality reduction is often necessary in datasets with high numbers of features, as it helps reduce noise, prevent overfitting, improve model performance, and simplify
| SESSION | FEB-MARCH 2025 |
| PROGRAM | MASTER OF BUSINESS ADMINISTRATION (MBA) |
| SEMESTER | III |
| COURSE CODE & NAME | DADS303 INTRODUCTION TO MACHINE LEARNING |
Assignment Set – 1
Q1. Discuss the relevance of Machine Learning in Business 10
Ans 1.
Machine Learning in Business Context
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data and improve their performance without being explicitly programmed. In the business environment, ML algorithms analyze historical and real-time data to uncover patterns, make predictions, and automate decision-making. With the explosive growth of data from digital platforms, businesses are increasingly adopting machine learning to gain competitive advantage, reduce operational costs, and enhance customer experience.
The relevance of machine learning in business stems from its ability to convert raw data into
Q2. What do you mean by Regularization? Briefly discuss various methods to do Regularization in Regression. 10
Ans 2.
Concept of Regularization
Regularization is a crucial technique in machine learning that helps to prevent overfitting in predictive models, especially in regression tasks. Overfitting happens when a model learns both the true patterns and the noise in the training data, which leads to poor performance when applied to new, unseen data. Regularization improves a model’s generalization ability by adding a constraint to the learning process, effectively penalizing large or overly complex coefficients in the regression equation.
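The penalizing effect can be sketched with a one-predictor ridge (L2) regression, where the slope has a simple closed form. Choosing ridge as the illustration is an assumption, and the data, the function name ridge_slope, and the penalty value lam are hypothetical; the intercept is left unpenalized.

```python
def ridge_slope(xs, ys, lam):
    """Slope minimizing sum((y - a - b*x)^2) + lam * b^2 for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty lam is added to the denominator, shrinking the slope.
    return sxy / (sxx + lam)

# Hypothetical training data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_slope = ridge_slope(xs, ys, lam=0.0)      # no penalty: ordinary least squares
shrunk_slope = ridge_slope(xs, ys, lam=10.0)  # penalty pulls the slope toward zero

print(round(ols_slope, 3), round(shrunk_slope, 3))
```

With lam set to 0 the formula reduces to the ordinary least squares slope; increasing lam shrinks the coefficient, trading a little bias for lower variance.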
In simpler terms, regularization reduces the complexity of the model by discouraging it from
Q3. Briefly discuss Binary Logistic Regression.
Ans 3.
Binary Logistic Regression
Binary Logistic Regression is a type of classification algorithm used when the dependent variable has only two possible outcomes. Unlike linear regression, which predicts continuous values, binary logistic regression predicts the probability that a given input belongs to one of two distinct classes. These classes are typically coded as 0 and 1, such as “No” and “Yes”, “Failure” and “Success”, or “Negative” and “Positive”.
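The step from a linear score to a probability, and then to a 0/1 class, can be sketched as follows. The coefficients (intercept and slope) and the 0.5 threshold are hypothetical values picked for illustration, not fitted from any data.

```python
import math

def predict_probability(x, intercept, slope):
    """Probability of class 1 given a single predictor x."""
    z = intercept + slope * x          # linear score (log-odds)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

def classify(x, intercept, slope, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(x, intercept, slope) >= threshold else 0

# Hypothetical coefficients, assumed rather than estimated.
intercept, slope = -4.0, 1.0

p_low = predict_probability(2, intercept, slope)
p_high = predict_probability(6, intercept, slope)
labels = [classify(x, intercept, slope) for x in (2, 4, 6)]

print(round(p_low, 3), round(p_high, 3), labels)
```

Because the logistic function maps any real number into the interval (0, 1), the output can be read as a probability, which a linear regression line cannot guarantee.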
This method is widely used in fields like marketing, finance, healthcare, and social sciences to make decisions or predictions based on historical data. For example, it can be used to
Assignment Set – 2
Q4. Explain K-Means Clustering algorithm
Ans 4.
K-Means Clustering
K-Means is a popular unsupervised machine learning algorithm used for clustering tasks. Clustering involves grouping data points in such a way that points in the same group, or cluster, are more similar to each other than to those in other clusters. K-Means is widely used in market segmentation, pattern recognition, image compression, and customer behavior
Q5. Briefly explain ‘Splitting Criteria’, ‘Merging Criteria’ and ‘Stopping criteria’ in Decision Tree. 10
Ans 5.
Decision Trees
Decision trees are supervised learning models used for both classification and regression problems. They mimic human decision-making by breaking down data into branches based on specific conditions. The structure of a decision tree consists of a root node, internal nodes for decision points, branches as outcomes, and leaf nodes representing final predictions. The quality and performance of a decision tree depend heavily on how it splits, merges, and
Q6. What is Support Vector Machine? What are the various steps in using Support Vector Machine? 10
Ans 6.
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression tasks, though it is more commonly applied to classification problems. The main goal of SVM is to find the optimal boundary, known as a hyperplane, that best separates different classes in the dataset. SVM is effective in high-dimensional spaces and is especially useful when there is a clear margin of separation between classes. It is known for its robustness and ability to handle both linear and non-linear data through the
| SESSION | FEB-MARCH 2025 |
| PROGRAM | MASTER OF BUSINESS ADMINISTRATION (MBA) |
| SEMESTER | III |
| COURSE CODE & NAME | DADS304 VISUALIZATION |
Assignment Set – 1
Q1. Compare and contrast bar charts and line charts in terms of their use cases, strengths, and limitations. 10
Ans 1.
Understanding Bar Charts
Bar charts are one of the most commonly used data visualization techniques to represent categorical data. In a bar chart, data is displayed using rectangular bars where the length of the bar is proportional to the value it represents. Bar charts can be displayed vertically or horizontally and are effective for comparing quantities across discrete categories.
For instance, a bar chart can be used to compare sales figures for different product categories
Q2. What are the basic statistical commands used in R? Explain with examples. 10
Ans 2.
Statistical Commands in R
R is a powerful statistical programming language designed specifically for data analysis, computation, and graphical representation. It comes with a rich set of built-in functions and packages that simplify the process of performing statistical analysis. Basic statistical commands in R help users understand and summarize data efficiently, making it easier to explore and visualize. These commands cover descriptive statistics, probability distributions, correlation, hypothesis testing, and more. Here we discuss some of the most essential
Q3. Define a dashboard in the context of data visualization. Discuss its importance with suitable example. 10
Ans 3.
Definition and Purpose of a Dashboard
A dashboard, in the context of data visualization, is a visual display that consolidates and organizes key metrics, data points, and performance indicators on a single screen. It is designed to provide users with an at-a-glance view of real-time data insights for monitoring, decision-making, and analysis. Dashboards transform raw data into actionable insights through graphs, charts, tables, and other visual elements, allowing users to track trends,
Assignment Set – 2
Q4. What are the benefits of grouping data in Tableau when analyzing large datasets? 10
Ans 4.
Understanding Grouping in Tableau
Grouping in Tableau refers to the process of combining multiple dimension values into a single category or group. It is a powerful data preparation and visualization feature that helps simplify analysis by categorizing large volumes of granular data into meaningful clusters. This is especially helpful when working with massive datasets that contain hundreds or thousands of unique values. Grouping makes it easier to analyze patterns, highlight
Q5. Define sorting in Tableau. What are the different ways you can sort data in a visualization? 3+7
Ans 5.
Definition and Purpose of Sorting in Tableau
Sorting in Tableau refers to the process of arranging data in a specific order to make it easier to analyze and interpret. Sorting can be done in ascending or descending order, based on various fields such as dimensions, measures, or calculated values. It enhances visual clarity by organizing data meaningfully and highlighting patterns, trends, or outliers that may not be obvious in unsorted views.
The primary goal of sorting is to improve readability and help users focus on the most significant values in the visualization. For instance, sorting products by highest sales or
Q6. Explain the concepts of merging and appending data sources in Power BI. 10
Ans 6.
Data Integration in Power BI
Power BI is a powerful business intelligence tool that allows users to connect, transform, and visualize data from various sources. Often, in real-world scenarios, data is not stored in a single table or database but is scattered across multiple files or systems. To perform comprehensive analysis, Power BI provides data integration techniques such as merging and appending, which help combine multiple datasets into a single unified view.
Merging and appending are essential for data modeling in Power BI, especially when dealing